Closed npb596 closed 4 months ago
Hi Nick,
The behavior you describe is likely a result of Genozip consuming too much memory (some of the "--best" methods are memory-hungry). Part of the memory consumption grows linearly with the number of threads, so as you observed, reducing the thread count helps. Another part grows linearly with the "vblock" size; reducing that with --vblock may also help, at the expense of the compression ratio (you can see the current vblock value with genozip or genocat --stats).
This is just a general comment; it is hard to say more without taking a closer look at your file.
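As a rough sketch of combining the two suggestions above, the helper below only prints a candidate command; it does not run genozip. The function name `genozip_cmd` is hypothetical, the thread count and vblock value are illustrative guesses, and the assumption that --vblock takes a size in megabytes should be checked against `genozip --help` before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) a genozip command that uses
# fewer threads and a smaller vblock than the defaults.
# Assumption: --vblock takes a size in MB; confirm with `genozip --help`.
genozip_cmd() {
    # $1 = number of threads, $2 = vblock size
    printf 'genozip -@ %s --best --vblock %s -t -Q AB0103-C.vcf.gz -o AB0103-C.vcf.genozip\n' \
        "$1" "$2"
}

# Example: 2 threads with a vblock of 16 (both values are illustrative).
genozip_cmd 2 16
```

If the printed command looks right, it can be run as-is, then adjusted by raising the thread count or vblock size step by step until memory becomes the limit again.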
-divon
Hello Divon,
Thanks for writing this software. I've used it for a while now and appreciate the ease of use and clear documentation.
I'm getting odd behavior when using --best to compress a VCF, and I'm not certain whether it's due to limitations of my hardware or something inherent to the software.
When running a simple command like this, the behavior seems largely dependent on the number of threads that I assign.
genozip -@ ${num_threads} --best -t -Q AB0103-C.vcf.gz -o AB0103-C.vcf.genozip
If I assign 1 thread, it appears to work well enough and informs me that it will take about 40-50 minutes (about what I expect compared to not using --best on these files). If I assign anywhere from 2-6 threads, it will basically tell me the job will take 0 seconds and hang on 0% until my computer freezes up. With anything from 7-12 threads, it will progress to about 1-4% and then the job is killed, with no reason given. I am running this on a new HP laptop with a Linux OS, 16GB RAM, and 12 cores. The gzipped VCF is itself about 3.3GB. I can provide more detailed specs if needed. For the time being, I can just run without --best (ironically, a tip comes up about using --best), but I am curious what is going on here.
Cheers, Nick