William-N-Havard closed this issue 1 year ago
The merge overwrites previously generated set files without requiring an 'overwrite' argument and without any warning, which could easily cause problems. We could enforce that the output_set must not already exist. Is it possible for a set to be made up of several different merges (it sounds to me like each should have its own set)? In that case, rerunning a merge would require removing the previously generated set first (or we could add a 'replace-set' argument to the merging function to perform the removal prior to merging).
What do you think?
Yes, I think the best option would be to raise an error if the set already exists, so that the user first deletes it and then re-runs the merge. Another problem with set merging is that the resulting set can become outdated if new files are added to one of the sets used for the merge.
yeah, tracking down outdated sets will be kind of hard to do
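The guard proposed above (raise an error if the output set already exists, with an optional 'replace-set' behaviour) could look roughly like the sketch below. The function name `prepare_output_set`, its parameters, and the one-directory-per-set layout are assumptions made for illustration; this is not the project's actual merging code.

```python
import shutil
from pathlib import Path


def prepare_output_set(annotations_root, output_set, replace_set=False):
    """Hypothetical guard: refuse to merge into a set directory that already exists.

    Assumes each set lives in its own directory under annotations_root.
    """
    target = Path(annotations_root) / output_set
    if target.exists():
        if not replace_set:
            raise FileExistsError(
                f"set '{output_set}' already exists at {target}; "
                "remove it first or pass replace_set=True"
            )
        # Explicit opt-in: remove the previous merge output before re-merging.
        shutil.rmtree(target)
    target.mkdir(parents=True)
    return target
```

With this shape, a second merge into the same set fails loudly by default, and the destructive path only runs when the caller asks for it.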
When merging several sets together using `merge_sets` several times (see below), duplicate lines are created in `annotations.csv`.
Duplicate lines in `annotations.csv` when running `merge_sets` twice
These lines should be dropped before (re)merging the sets and adding the resulting new annotation lines.
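As a minimal sketch of the fix suggested here, the index rows already recorded for the output set can be dropped before the freshly merged rows are appended. The helper `replace_set_rows` is hypothetical, and the sketch assumes each row of `annotations.csv` has a `set` column naming the set it belongs to; the real index may need a different key.

```python
def replace_set_rows(index_rows, new_rows, output_set):
    """Drop rows already recorded for output_set, then append the new rows.

    index_rows and new_rows are lists of dicts, one dict per line of the
    annotations index (assumed to carry a "set" key).
    """
    kept = [row for row in index_rows if row["set"] != output_set]
    return kept + list(new_rows)


# Running the same merge twice leaves the index unchanged after the first run.
index = [{"set": "vtc", "annotation_filename": "rec1.csv"}]
merged = [{"set": "merged", "annotation_filename": "rec1_merged.csv"}]
index = replace_set_rows(index, merged, "merged")
index = replace_set_rows(index, merged, "merged")
```

Filtering on the set name rather than de-duplicating whole rows also clears stale lines whose contents changed between two merges.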