Open seallard opened 9 months ago
This is a general bug affecting all bundles in housekeeper. The version name, which needs to be unique within a bundle, is based on a date, so if a second version of the bundle is created on the same day, there will be a clash.
We are seeing this for microSALT cases, since they are more commonly rerun on the same day.
A first thought: version numbers would be better, because we can easily identify which version is the most recent.
A related question: would we like to keep all the files for all the runs? For example, in microSALT we often correct the reference organism for some samples and rerun the analysis with the exact same data. In this case we are only interested in keeping the results of the corrected run and replacing the files in housekeeper, so there is no need to keep both versions. We might want to leave the decision to delete the previous version to the user, though.
Check with production. Alternatives:
Production (represented by @karlnyr): Use a version number.
Leave as is, would need to patch scout as well.
housekeeper add bundle
and add version
When storing available cases, make sure to patch the logic so that we do not duplicate data if, for whatever reason, we attempt to store the same analysis twice.
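One way to read the last point is as an idempotent store step. The sketch below is only an illustration under stated assumptions: `content_key` and `store_once` are hypothetical helpers, the in-memory `stored` dict stands in for the database, and none of this is housekeeper's actual API.

```python
import hashlib


def content_key(files: dict[str, bytes]) -> str:
    """Fingerprint an analysis by its file names and contents (hypothetical helper)."""
    digest = hashlib.sha256()
    for name in sorted(files):  # sort so the key does not depend on insertion order
        digest.update(name.encode())
        digest.update(b"\0")  # separator between the name and the content hash
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


def store_once(stored: dict[str, dict[str, bytes]], files: dict[str, bytes]) -> bool:
    """Store the analysis unless an identical one is already stored.

    Returns True if the analysis was stored, False if it was a duplicate.
    """
    key = content_key(files)
    if key in stored:
        return False
    stored[key] = files
    return True
```

With this approach a rerun that produces different results (for example after correcting a reference organism) gets a different key and is stored, while an accidental second store of the same analysis is a no-op.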
Closing due to inactivity. Reopen and answer the question below if you want this prioritised.
Concerning the proposed feature:
Bug
A bundle version consists of raw data (?) and data from an analysis of a case.
Currently, a version of a bundle is identified by a date in the paths to its files. So if you try to create a new version for a bundle on the same day as another version was created, nothing happens. The underlying assumption, that we only run analyses for a case on separate days, does not hold.
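The clash can be illustrated with a minimal sketch. This is not housekeeper's code: `date_key` and `add_version` are hypothetical names, the `versions` dict stands in for the stored versions of one bundle, and the timestamps in the usage note are arbitrary.

```python
from datetime import datetime


def date_key(created_at: datetime) -> str:
    """Hypothetical version identifier with day resolution."""
    return created_at.strftime("%Y-%m-%d")


def add_version(versions: dict[str, datetime], created_at: datetime) -> bool:
    """Add a version keyed by date; returns False when the key already exists."""
    key = date_key(created_at)
    if key in versions:
        # A second version on the same day maps to the same key and is dropped.
        return False
    versions[key] = created_at
    return True
```

Calling `add_version` twice with two different times on the same day stores only the first version, which matches the behaviour described above.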
Steps to reproduce
housekeeper add bundle sadicebear
housekeeper add version sadicebear
housekeeper add version sadicebear
Only one version exists after running these commands.
Suggested fix
Scheduled for technical refinement.
We could use a version number for the versions instead, with numbering that restarts for each bundle.
Or we could use a GUID (globally unique identifier) for versions.
Or any other naming pattern that uniquely identifies a version within a bundle.
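The first two alternatives can be sketched together. This is a toy in-memory model under stated assumptions, not housekeeper's schema or API: `VersionRegistry`, `add_version` and `latest` are hypothetical names chosen for illustration.

```python
import uuid
from collections import defaultdict


class VersionRegistry:
    """Toy in-memory stand-in for the bundle and version tables (hypothetical)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[dict]] = defaultdict(list)

    def add_version(self, bundle: str) -> dict:
        """Create a new version with a per-bundle number and a GUID."""
        versions = self._versions[bundle]
        version = {
            "number": len(versions) + 1,  # per-bundle counter, starts at 1
            "guid": str(uuid.uuid4()),    # unique across all bundles
        }
        versions.append(version)
        return version

    def latest(self, bundle: str) -> dict:
        """Return the most recent version, i.e. the one with the highest number."""
        return max(self._versions[bundle], key=lambda v: v["number"])
```

The version number makes "most recent" trivial to determine, which a GUID alone does not. Note that `len(versions) + 1` only works while versions are never deleted; a real implementation would more likely take the current maximum plus one, or rely on a database sequence.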
Notes
The bug was noticed by @beatrizsavinhas when restarting a microsalt analysis.
microSALT cases that have been analyzed before fail to store (and to report QC and upload) because the bundle already contains files from the previous analysis on the same day. This requires manual intervention: deleting the old bundle in housekeeper.