DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system

Not enough RAM to classify reads with large custom database #686


PeterCx commented 1 year ago

I have built a custom database that contains the entire GTDB as well as unique MAGs I generated, roughly 67,000 genomes in total. The hash.k2d file is 305 GB in size. When I try to classify reads, kraken2 cannot even load the database:

kraken2 --db Custom_Database --threads 30 --gzip-compressed --output Output/Sample.output.txt --report Output/Sample.report.txt --report-zero-counts --paired Reads/Sample.R1.fastq Reads/Sample.R2.fastq
Loading database information..........Killed.

kraken2 --version
Kraken version 2.1.2

kraken2-inspect --skip-counts --db Custom_Database
Loading database information..........Killed.

grep MemTotal /proc/meminfo
MemTotal: 394816212 kB

This was the output from when I built the database:

Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map complete. [2.070s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 312273650832 bytes
Capacity estimation complete. [11m56.324s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 18 bits reserved for taxid.
Completed processing of 7746034 sequences, 202613488062 bp
Writing data to disk... complete.
Database files completed. [12h3m44.879s]
Database construction complete. [Total: 12h15m47.164s]
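
As a rough back-of-the-envelope comparison of the estimated hash table size against the MemTotal value above (just a sanity check, assuming bc is available on the machine):

echo "312273650832 / 2^30" | bc    # estimated hash table: ~290 GiB
echo "394816212 / 2^20" | bc       # MemTotal: ~376 GiB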

I am presuming I do not have enough RAM to classify. It is unlikely I will be able to use a server with more RAM.

Is there any other way I can make this work?

Any help is appreciated. Many thanks.

b-math commented 1 year ago

Hi.

Did you try the --memory-mapping option?
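
With --memory-mapping, kraken2 memory-maps the database files from disk instead of loading them into RAM, which avoids the out-of-memory kill at the cost of speed. For your command above, that would look roughly like this (paths and other options taken from your post):

kraken2 --db Custom_Database --memory-mapping --threads 30 --gzip-compressed --output Output/Sample.output.txt --report Output/Sample.report.txt --report-zero-counts --paired Reads/Sample.R1.fastq Reads/Sample.R2.fastq

Classification will likely be slower, especially if the .k2d files sit on network or spinning-disk storage, since lookups become random reads from disk.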

Best regards,
Barbara