mmokrejs closed this issue 2 years ago
Hi @mmokrejs,
Thanks for the suggestion. Currently, tigmint in linked-read mode doesn't use any command-line zipping/unzipping, and tigmint-long only does so in the tigmint_estimate_dist.py step, which is already pretty fast. It can be hard to tell which rules in the makefile will be executed, so I always recommend adding the -n option to your Tigmint command to see which commands will actually be run if you're unsure.
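To illustrate what a -n dry run reports, here is a minimal sketch using a toy makefile. The makefile, the file names, and the rule are hypothetical stand-ins and are not Tigmint's actual rules; the only assumption carried over from the thread is that the decompression tool is taken from a make variable named gzip.

```shell
# Toy stand-in for a pipeline makefile (hypothetical, NOT Tigmint's real rules).
demo_dir=$(mktemp -d)

# One rule that decompresses its input with whatever tool $(gzip) names.
# printf expands \t into the tab that make requires before a recipe line.
printf 'reads.fa: reads.fa.gz\n\t$(gzip) -dc $< > $@\n' > "$demo_dir/Makefile"

# A tiny compressed input so the rule has an existing prerequisite.
printf '>r1\nACGT\n' | gzip -c > "$demo_dir/reads.fa.gz"

# -n prints the commands that would run without executing any of them,
# so pigz does not even need to be installed for this dry run.
dry_run=$(cd "$demo_dir" && make -n gzip=pigz reads.fa)
echo "$dry_run"
```

Because nothing is executed, reads.fa is not created; the output only shows which command the variable override would produce.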
On the two human datasets that I just tested, the tigmint_estimate_dist.py step takes <4 min, so luckily it's a pretty quick step already. I see this step currently always uses gunzip -c, so thank you for pointing out that we should be using $(gzip) -dc there. However, in my two tests, I found that pigz -dc was faster than bgzip -dc.
If you have benchmarks that suggest the opposite, I'm happy to re-assess.
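For anyone who wants to repeat this kind of comparison, a rough sketch of a timing helper follows. The function name time_decompress is made up for illustration and is not part of Tigmint; it simply runs each tool with -dc, as in the commands above, and reports wall-clock seconds.

```shell
# Hypothetical helper (not part of Tigmint): time "<tool> -dc <file>" for each
# tool named on the command line, skipping tools that are not installed.
time_decompress() {
  local file=$1
  shift
  local tool start end
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: not installed, skipping"
      continue
    fi
    start=$(date +%s)
    # Discard the decompressed stream so disk writes do not skew the timing.
    "$tool" -dc "$file" > /dev/null
    end=$(date +%s)
    echo "$tool: $((end - start)) s"
  done
}
```

A call might look like time_decompress reads.fq.gz gzip pigz bgzip, where reads.fq.gz is a placeholder for your own input. The one-second resolution is only meaningful for large files, and repeated runs are advisable because file-system caching favours whichever tool runs later.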
Thanks, Lauren
OK, maybe for decompression this does not really matter; the big differences show up when compressing the input.
I haven't checked whether you compress the results back somewhere, although I thought so.
pigz may be replaced by bgzip from the htslib package (http://www.htslib.org), which scales better: https://github.com/samtools/samtools/issues/1318#issuecomment-703483014