DominikBuchner / BOLDigger-commandline

BOLDigger as a commandline tool
MIT License

boldigger delete input data? #2

Closed FabianRoger closed 3 years ago

FabianRoger commented 3 years ago

Hi,

it seems like boldigger-cline manipulates and deletes the input fasta file. After running the program, I cannot find the input file anymore, and it seems to have been replaced with input_done.fasta.

If so, I strongly suggest changing the code to make a temporary copy of the file and work on the copy, rather than modifying (or even just renaming) the raw data. It's not a problem in my case, as I can rather easily regenerate the file, but this might not always be the case.

Thanks for the useful tool, it seems to work well otherwise!

Fabian

DominikBuchner commented 3 years ago

It does not delete the input file, it only renames it / moves sequences. The reason is explained in the documentation: in case of a crash you don't need to manipulate the files yourself, since you can simply resume the download with the old files that the sequences were moved from. It only moves the sequences AFTER saving the result AND saving the already queried sequences to the new file.

FabianRoger commented 3 years ago

Thanks for the explanation!

However, after it is done, it replaces the original file with a (hopefully identical) file named original_file_done.fasta. The original file (under its original filename) is deleted after the program finishes running. I don't think this should happen.

If you have a pipeline that sends your fasta file to boldigger but also uses it for some other program, that program won't find the file after boldigger has run. Also, it's scary if the original file is gone and I have no means of checking whether the new file is identical.

It's only a suggestion but I think it's important to leave the original file untouched.
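
For anyone who wants to check whether the renamed file still holds the same sequences as the original, here is a minimal sketch. It assumes both files are plain-text FASTA and that a copy of the original was kept somewhere for comparison. The helper names `read_fasta` and `same_records` are made up for this example and are not part of boldigger.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def same_records(path_a, path_b):
    """True if both files hold the same records, ignoring record order and line wrapping."""
    return sorted(read_fasta(path_a)) == sorted(read_fasta(path_b))
```

Comparing parsed records rather than raw bytes means a file that was rewritten with different line wrapping or record order still counts as identical, which is probably what matters here.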

DominikBuchner commented 3 years ago

Yes, I understand the issue, but I don't see an easy solution to it from BOLDigger's side. What about adjusting the pipeline so that it just sends a copy of the file to boldigger, which can then be handled the way it works at the moment? The crashing issue happens a lot, depending on internet connection, file size, and sequence size, and is far more annoying in my opinion.
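
On the pipeline side, that workaround could look roughly like the sketch below. It only uses the Python standard library; the function name `stage_copy` and the idea of a separate scratch directory are assumptions for illustration, not something boldigger provides.

```python
import filecmp
import shutil
from pathlib import Path


def stage_copy(original, scratch_dir):
    """Copy `original` into `scratch_dir` and return the path of the copy."""
    original = Path(original)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    copy_path = scratch_dir / original.name
    # copy2 also preserves timestamps; it raises SameFileError if
    # scratch_dir is the directory the original already lives in.
    shutil.copy2(original, copy_path)
    # Byte-for-byte check before handing the copy over.
    if not filecmp.cmp(original, copy_path, shallow=False):
        raise IOError(f"copy of {original} is not identical")
    return copy_path
```

The pipeline would then point boldigger at the returned path, while every other program keeps reading the untouched original.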

FabianRoger commented 3 years ago

I understand. Why not just make a copy at the very beginning and work on that? If you resume, you can check whether the copy already exists and then continue where you stopped, right?
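
A rough sketch of what I mean, as a standalone function. This is not boldigger's actual code; the name `working_copy`, the `_work` suffix, and the resulting `_work_done` file name are assumptions that just mirror the `_done` renaming described above.

```python
import shutil
from pathlib import Path


def working_copy(original, suffix="_work"):
    """Return (path, status) for the file the tool should operate on.

    status is "fresh" on a first run, "resume" if an unfinished working
    copy is found, and "finished" if a completed run is found.
    The original file is only ever read, never renamed or modified.
    """
    original = Path(original)
    work = original.with_name(f"{original.stem}{suffix}{original.suffix}")
    done = original.with_name(f"{original.stem}{suffix}_done{original.suffix}")
    if done.exists():
        return done, "finished"
    if work.exists():
        # A previous run was interrupted: keep using its working copy.
        return work, "resume"
    shutil.copy2(original, work)
    return work, "fresh"
```

All renaming and moving of sequences would then happen on the working copy, so a crash can still be resumed and the raw data stays where it was.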

DominikBuchner commented 3 years ago

Yes, it is possible. I'll see when I have time to work on that.