Open karlnyr opened 1 year ago
What behavior do we want?
Keep the original and include it in the database. I don't mind it being a force flag really - I believe that this situation only happens for manual stuff - so a force might be useful :)
Intuitively I would think that there should not be any files present in the housekeeper directories if they have not been added through the API.
Can we clarify the manual stuff this happens with? @karlnyr
Moving the description over from a duplicated issue. Description: housekeeper add file fails, stating that the file already exists. However, when the specific bundle is retrieved with housekeeper get bundle, it is shown to be empty. Looking at the bundle directory, the file is present in a version, so it should be listed for the bundle. It cannot be retrieved with housekeeper get file either.
The command below was run in the /home/proj/production/housekeeper-bundles/ADM1091A3/2018-06-05 directory:
for f in ; do housekeeper add file -t fastq -t H9GA6ADXX -b ADM1091A3 ./${f}; done
2023-06-13 09:47:51 hasta.scilifelab.se housekeeper.cli.core[37109] INFO Use root path /home/proj/production/housekeeper-bundles
2023-06-13 09:47:51 hasta.scilifelab.se housekeeper.cli.add[37109] INFO Running add file
2023-06-13 09:47:51 hasta.scilifelab.se housekeeper.store.api.handlers.read[37109] INFO Fetching bundle with name: ADM1091A3
Traceback (most recent call last):
File "/home/proj/production/bin/miniconda3/envs/P_main/bin/housekeeper", line 8, in
Though I agree that ideally we should avoid manually modifying the database, I have also found this issue and wondered if having a --force or --skip-hard-linking flag would be useful when doing manual work.
The problem arose in two situations: when manually processing old flow cells that are stored on disk but have files missing from the housekeeper bundle, and when filtering VCF files from balsamic cases that fail for having too many variants. In both cases it is often more straightforward to find the necessary input files already in the housekeeper bundle. I found a workaround by moving or generating the files in a different directory and then adding them to housekeeper.
Suggested solution: Before hard linking the file, check whether a file is already present at the target path. If so, use Path.samefile() to compare the two. If it returns True, skip the hard link and only add the file to the database. This might be a bit cumbersome but avoids the problem of overwriting anything already in the bundle directory.
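To make the suggestion concrete, here is a minimal sketch of the check using only the standard library. The function name link_into_bundle and its return convention are hypothetical and not part of housekeeper's actual code; the database step is left to the caller.

```python
import os
from pathlib import Path


def link_into_bundle(source: Path, target: Path) -> bool:
    """Hypothetical helper, not housekeeper's real implementation.

    Returns True if a new hard link was created at target, and False if
    target already refers to the same file as source (in which case only
    the database entry still needs to be added). Raises FileExistsError
    if a different file already occupies target, so nothing is overwritten.
    """
    if target.exists():
        # samefile() compares device and inode, so it is also True when
        # source and target are literally the same path.
        if target.samefile(source):
            return False
        raise FileExistsError(f"{target} exists and is not the same file as {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    os.link(source, target)  # create the hard link
    return True
```

With something like this, running the add command from inside the version directory would skip the linking step and go straight to registering the file, while a genuinely different file at the same path would still raise an error instead of being overwritten.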
I attempted to add a file to a bundle but the file already existed in the path. A user should be able to add the file if it already exists within the bundle path.
For example, file_1 on bundle_1, which has the root of /home/housekeeper-bundles and a version from June 2nd, 2023. When trying to add the file to the already included bundle, should it not just add the file link into the database?