TCP-Lab / x.FASTQ

Bash wrapper modules for the remote analysis of RNA-Seq data, with persistency features.
MIT License

Add support for checking download hashes #21

MrHedmad closed 7 months ago

MrHedmad commented 7 months ago

ENA provides MD5 checksums, which makes it possible to verify each file right after it is downloaded and to try the download again if the check fails.

The logic to spawn download processes and to check logfiles had to be overhauled.
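For readers who want a concrete picture of the verify-and-retry idea, here is a minimal sketch. It is not the code from this branch: the function names (`download_file`, `md5_matches`, `fetch_verified`), the use of `wget`, and the default of 3 attempts are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: all names and the retry count are hypothetical,
# not taken from x.FASTQ.

# Fetch URL $1 into file $2 (assumes wget is installed).
download_file() {
    wget --quiet --output-document="$2" "$1"
}

# Return 0 when the MD5 of file $1 equals the expected hex digest $2.
md5_matches() {
    local file="$1" expected="$2" actual
    [[ -f "$file" ]] || return 1
    actual="$(md5sum "$file" | cut -d' ' -f1)"
    [[ "$actual" == "$expected" ]]
}

# Download $1 to $2 and check it against MD5 $3.
# Retries up to $4 times (default 3); a failed attempt deletes the file.
fetch_verified() {
    local url="$1" target="$2" expected="$3" max_tries="${4:-3}" attempt
    for (( attempt = 1; attempt <= max_tries; attempt++ )); do
        if download_file "$url" "$target" && md5_matches "$target" "$expected"; then
            return 0
        fi
        echo "Attempt ${attempt}/${max_tries} failed for ${target}" >&2
        rm -f "$target"
    done
    return 1
}
```

A call would look like `fetch_verified "$url" sample_1.fastq.gz "$md5_from_ena"`, where both variables are placeholders for values taken from the ENA file report.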

Feat-FeAR commented 7 months ago

I refactored most of the module. Under the hood, it performs the same processes you @MrHedmad introduced in this branch, with (almost) the same logic, but with some differences in the code, the main ones being:

  1. I reimplemented the nohup command so that it can also be applied to functions;
  2. This allowed me to remove code duplication and made the whole code more readable and maintainable;
  3. I removed eval and replaced it with a safer bash -c;
  4. I introduced a flag (--no-checksum) to optionally skip the checksum verification (i.e., in the very, very unlikely case in which the ENA MD5 hash is wrong, corrupt, or not available, FASTQ files can now be downloaded anyway).
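As background for points 1 and 3: the stock `nohup` executes an external program, so it cannot run a Bash function directly. One common workaround, sketched below, is to export the function and call it through `nohup bash -c`, passing arguments as positional parameters instead of building a string for `eval`. This is only an assumption about how such a wrapper could look, not the implementation in the module; `run_detached` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: run_detached is a hypothetical helper,
# not the function used in x.FASTQ.

# Run a shell function immune to hangups, appending its output to a log.
# Usage: run_detached <logfile> <function_name> [args...]
run_detached() {
    local log="$1" fn="$2"
    shift 2
    # Make the function visible to the child bash started below.
    export -f "$fn"
    # '"$@"' expands to the function name plus its arguments as separate
    # words, so nothing is re-parsed the way an eval'd string would be.
    nohup bash -c '"$@"' _ "$fn" "$@" < /dev/null >> "$log" 2>&1 &
}
```

Any helper function called by the exported one has to be exported too, otherwise the child shell will not find it.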

Notably, getFASTQ now auto-detects incomplete downloads left over from previous runs and overwrites them! For example, when you kill a running download (or the internet connection goes down), the partial FASTQ file is not removed. With the old getFASTQ, if you forgot to remove the stumps, any subsequent run of getFASTQ skipped them as 'already there' files. Now the new getFASTQ checks their MD5 and, since the checksum fails, the stumps are removed and the files re-downloaded.
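The decision described above can be sketched as a small function. The name `plan_download`, its arguments, and the three printed keywords are hypothetical. The thread also does not say how existing files are treated when --no-checksum is given, so the sketch simply assumes they are trusted as-is.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: plan_download and its outputs are hypothetical.

# Decide what to do with a target file before (re)starting its download.
# Usage: plan_download <target> <expected_md5> [verify=true|false]
# Prints one of: download | skip | redownload
plan_download() {
    local target="$1" expected="$2" verify="${3:-true}" actual
    if [[ ! -e "$target" ]]; then
        echo "download"      # nothing on disk yet
        return
    fi
    if [[ "$verify" != true ]]; then
        echo "skip"          # assumption: without checksums, trust the file
        return
    fi
    actual="$(md5sum "$target" | cut -d' ' -f1)"
    if [[ "$actual" == "$expected" ]]; then
        echo "skip"          # complete file from a previous run
    else
        rm -f "$target"      # stale partial download: remove it
        echo "redownload"
    fi
}
```

A truncated file left by a killed run fails the MD5 comparison, so it is deleted and reported as `redownload` rather than being mistaken for a finished download.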

Amazing bonus unexpected feature!