akvo / akvo-lumen

Make sense of your data
https://akvo.org/akvo-lumen
GNU Affero General Public License v3.0

Random derive id column in merge-dataset transformation with data-groups #3145

Closed tangrammer closed 3 years ago

tangrammer commented 3 years ago

Context

While working on the data migration I found these lines:

https://github.com/akvo/akvo-lumen/blob/master/backend/src/akvo/lumen/lib/transformation/merge_datasets.clj#L171-L180

        columns-by-group (->> (set (get source "mergeColumns"))
                              (map-indexed (fn [i column-name]
                                             (let [dg (engine/datagroup-by-column data-groups column-name)
                                                   column (first (filter #(= (get % "columnName") column-name) (:columns dg)))]
                                               (-> column
                                                   (assoc "sourceColumnName" (get column "columnName")
                                                          "columnName" (engine/derivation-column-name
                                                                        (+ (engine/next-column-index target-dataset-columns) i)))
                                                   reset-column-values))))
                              (group-by #(get % "groupId")))

Because of the set call in (set (get source "mergeColumns")), we lose the payload order, so the derived column ids are assigned according to the set's arbitrary iteration order instead of the order in which the merge columns were sent.
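A minimal sketch of the effect, not the actual Lumen code: derived-names is a hypothetical stand-in for the map-indexed step above, next-index stands in for the result of engine/next-column-index, and the "d" prefix is an assumption about what engine/derivation-column-name returns.

```clojure
(defn derived-names
  "Pairs each merge column with a derived name built from its position.
  Hypothetical helper for illustration only."
  [next-index merge-columns]
  (map-indexed (fn [i column-name]
                 [column-name (str "d" (+ next-index i))])
               merge-columns))

;; Merge columns in the order the client sent them (made-up names).
(def payload ["c3" "c1" "c2"])

;; Vector input: the index i follows the payload order.
(derived-names 1 payload)
;; => (["c3" "d1"] ["c1" "d2"] ["c2" "d3"])

;; Set input: the index i follows the set's iteration order, which is
;; unspecified and does not have to match the payload order.
(derived-names 1 (set payload))
```

The same columns come back in both cases, but only the vector version guarantees which column gets which derived id.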

Solution or next step

Remove the set call, which seems to be unnecessary.
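If the set call was there to drop duplicate column names (an assumption, nothing in this issue confirms it), clojure.core/distinct would be one option: it removes duplicates while keeping first-occurrence order. A small sketch with a hypothetical index-columns helper and made-up column names:

```clojure
(defn index-columns
  "Deduplicates merge columns while keeping payload order, then pairs each
  one with its offset from next-index. Hypothetical helper, not Lumen code."
  [next-index merge-columns]
  (->> merge-columns
       distinct                       ; keeps the first occurrence of each name
       (map-indexed (fn [i column-name]
                      [column-name (+ next-index i)]))))

(index-columns 4 ["c3" "c1" "c3" "c2"])
;; => (["c3" 4] ["c1" 5] ["c2" 6])
```

Without duplicates in the payload, simply passing (get source "mergeColumns") straight to map-indexed gives the same stable result.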

tangrammer commented 3 years ago

BTW: this bug will make visualisations fail, because the derived columns they reference are going to change after an update.