NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

Lower memory usage in kraken2krona #87

Closed: clami66 closed this 1 year ago

clami66 commented 1 year ago

As mentioned in #86, kraken2krona sometimes has to be run on a full node because it can otherwise run out of memory. Since it is a single-core job, reserving a whole node wastes resources and leads to longer queue times.

The problem seems to be that the filtering step, krakenuniq2krona.R, loads the entire sequences.krakenuniq file into memory, and from what I see this file can be several GB in size. So I have rewritten the script to read and process the file line by line instead (in Python, because I am more confident with it).
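A minimal sketch of the line-by-line approach, not the script from this PR. It assumes the filter keeps each read whose taxID (the third tab-separated column of the per-read sequences.krakenuniq output, with the read ID in column 2) appears in a set of taxIDs to keep, and that those taxIDs come one per line from a separate file. `load_taxids` and `stream_filter` are hypothetical names.

```python
from typing import Iterable, Set, TextIO


def load_taxids(path: str) -> Set[str]:
    """Read one taxID per line into a set (hypothetical input format)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def stream_filter(lines: Iterable[str], keep: Set[str], out: TextIO) -> int:
    """Write 'read_id<TAB>taxid' for every read whose taxID is in `keep`.

    Only the current line is held in memory, so peak usage depends on the
    size of `keep`, not on the size of the classification file.
    Assumed layout: column 2 = read ID, column 3 = taxID (tab-separated).
    Returns the number of reads written.
    """
    kept = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        read_id, taxid = fields[1], fields[2]
        if taxid in keep:
            out.write(f"{read_id}\t{taxid}\n")
            kept += 1
    return kept
```

Passing an open file object as `lines` (e.g. `stream_filter(open("sequences.krakenuniq"), keep, sys.stdout)`) is what keeps memory flat: Python file objects yield one line at a time, whereas reading the whole file into a table first, as the R script appears to do, scales memory with file size.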

I don't know whether the next command (ktImportTaxonomy) also uses a lot of RAM, and we can't do much about that if it does, but the filtering step should be the most memory-intensive one.
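One way to check which step dominates is to measure each command's peak resident memory. Snakemake's `benchmark:` directive already records a `max_rss` column per rule; outside Snakemake, a minimal Unix-only sketch using the standard `resource` module is below. `peak_child_rss` is a hypothetical helper, and the real ktImportTaxonomy command line would have to be taken from the workflow rule.

```python
import resource
import subprocess
from typing import List


def peak_child_rss(cmd: List[str]) -> int:
    """Run `cmd` to completion and return the peak resident set size
    recorded for waited-for child processes.

    Units: kibibytes on Linux, bytes on macOS. The value is the maximum
    over all children waited for so far in this process, so measure one
    step per fresh Python process to attribute it to a single command.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```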