bcgsc / tigmint

⛓ Correct misassemblies using linked AND long reads
https://bcgsc.github.io/tigmint/
GNU General Public License v3.0

pigz may be better replaced by bgzip #62

Closed mmokrejs closed 2 years ago

mmokrejs commented 2 years ago

pigz could be replaced by bgzip from the htslib package (http://www.htslib.org), which scales better:

https://github.com/samtools/samtools/issues/1318#issuecomment-703483014

lcoombe commented 2 years ago

Hi @mmokrejs,

Thanks for the suggestion. Currently, tigmint in the linked read mode actually doesn't use any command-line zipping/unzipping, and tigmint-long only does so in the tigmint_estimate_dist.py step, which is pretty fast already. It can be hard to tell which rules in the makefile will be executed, so if you're unsure, I always recommend adding -n to your Tigmint command to see which commands would actually be run.

On the two human datasets that I just tested, the tigmint_estimate_dist.py step takes <4min, so luckily it's a pretty quick step already. I see this step is always using gunzip -c currently, so thank you for pointing out that we should be using $(gzip) -dc there. However, in my two tests, I found that pigz -dc was faster than bgzip -dc.

If you have benchmarks that suggest the opposite, I'm happy to re-assess.
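For anyone wanting to produce such a benchmark, here is a minimal sketch of how the decompression comparison could be timed. It is not part of Tigmint; the function names `time_decompress` and `compare_tools` are hypothetical, and it assumes the tools to compare are on the PATH and write the decompressed stream to stdout when given the flags mentioned above (`gunzip -c`, `pigz -dc`, `bgzip -dc`). Tools that are not installed are skipped.

```python
import os
import shutil
import subprocess
import sys
import time


def time_decompress(tool_argv, gz_path):
    """Run tool_argv + [gz_path], discard stdout, and return elapsed seconds."""
    start = time.perf_counter()
    with open(os.devnull, "wb") as sink:
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(tool_argv + [gz_path], stdout=sink, check=True)
    return time.perf_counter() - start


def compare_tools(gz_path, tools=("gunzip -c", "pigz -dc", "bgzip -dc"), repeats=3):
    """Return {command: best wall-clock seconds} for each tool found on PATH."""
    results = {}
    for tool in tools:
        argv = tool.split()
        if shutil.which(argv[0]) is None:
            continue  # tool not installed; leave it out of the comparison
        results[tool] = min(time_decompress(argv, gz_path) for _ in range(repeats))
    return results


if __name__ == "__main__":
    # Usage (hypothetical script name): python bench_decompress.py reads.fa.gz
    for command, seconds in sorted(compare_tools(sys.argv[1]).items(), key=lambda kv: kv[1]):
        print(f"{command}\t{seconds:.2f}s")
```

Taking the best of a few repeats reduces the effect of a cold file cache on the first run; timings will still depend on the machine, the storage, and how the input file was compressed.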

Thanks, Lauren

mmokrejs commented 2 years ago

OK, maybe this does not really matter for decompression; the big differences show up when compressing the input.

I haven't checked whether you compress the results back somewhere, although I thought so.