GaetanBenoitDev / metaMDBG

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
MIT License

feature request: "resume" function #16

Open biorover opened 3 months ago

biorover commented 3 months ago

I don't know whether resuming is possible at earlier stages, since I'm not sure what is held in memory versus written to disk, but it should be relatively trivial once metaMDBG reaches the "Polishing contigs" and "Purging strain duplication" stages, right? I currently have a run on a very large dataset that had already used about 25,000 CPU hours when it failed at the "Purging strain duplication" stage because a disk filled up (from other processes)... so close!

GaetanBenoitDev commented 3 months ago

It's not possible yet, but creating checkpoints is a good idea. For now, you can run the final command manually:

metaMDBG derep outputDir/tmp//contigs_polished.fasta.gz outputDir//contigs.fasta.gz outputDir//tmp/

Replace "outputDir" with your output directory and add the number of threads (e.g. -t 16 or --threads 16, depending on your version).
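If it helps, here is a small Python sketch that assembles that same command and checks that the polished contigs are actually on disk before you rerun the last step. It is only an illustration: the helper names `build_derep_command` and `can_resume` are made up for this example, the thread flag is placed at the end of the command as an assumption, and the "file exists and is non-empty" test is a rough stand-in for a real checkpoint check.

```python
import shlex
from pathlib import Path, PurePosixPath


def build_derep_command(output_dir, threads=16, thread_flag="-t"):
    """Return the manual 'metaMDBG derep' command as an argument list.

    The three paths mirror the command quoted above. thread_flag should be
    "-t" or "--threads" depending on the installed version; appending it at
    the end of the command is an assumption of this sketch.
    """
    out = PurePosixPath(output_dir)
    tmp = out / "tmp"
    return [
        "metaMDBG", "derep",
        str(tmp / "contigs_polished.fasta.gz"),  # input: polished contigs
        str(out / "contigs.fasta.gz"),           # output: final contigs
        str(tmp) + "/",                          # temporary directory
        thread_flag, str(threads),
    ]


def can_resume(output_dir):
    """Hypothetical check: the polished contigs exist and are non-empty."""
    polished = Path(output_dir) / "tmp" / "contigs_polished.fasta.gz"
    return polished.is_file() and polished.stat().st_size > 0


if __name__ == "__main__":
    out_dir = "outputDir"  # replace with your own output directory
    if can_resume(out_dir):
        # Print the command so it can be inspected before running it.
        print(shlex.join(build_derep_command(out_dir)))
    else:
        print(f"No polished contigs found under {out_dir}/tmp")
```

The script only prints the command rather than executing it, so you can review the paths and the thread flag first and then paste the line into your shell.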

biorover commented 3 months ago

Thanks, Gaetan! That will work great for now, and I definitely support adding checkpoints (at least optionally) as a future feature.