fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

out of memory with --preload-size 80GB at 128GB #171

Open JochenSchaefergmxde opened 1 month ago

JochenSchaefergmxde commented 1 month ago

[3509557.600364] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-4362.scope,task=classifyExact,pid=889353,uid=1000
[3509557.600388] Out of memory: Killed process 889353 (classifyExact) total-vm:851435484kB, anon-rss:118022716kB, file-rss:5888kB, shmem-rss:0kB, UID:1000 pgtables:1620464kB oom_score_adj:0
[3509564.767599] oom_reaper: reaped process 889353 (classifyExact), now anon-rss:596kB, file-rss:3680kB, shmem-rss:0kB

Before the run, free reported 116GB free, and classifyExact was the only job running.
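For comparison with the figures from free, the anon-rss value in the kernel log (reported in kB, which the kernel counts as 1024 bytes) can be converted to GiB. This is a minimal sketch assuming standard grep, tr, and awk; the function name rss_gib is made up for illustration and is not part of KrakenUniq.

```shell
#!/bin/sh
# Hypothetical helper: extract the first anon-rss value (in kB) from an
# OOM-killer log line and print it in GiB with one decimal place.
rss_gib() {
    printf '%s\n' "$1" |
        grep -o 'anon-rss:[0-9]*kB' |
        head -n 1 |
        tr -dc '0-9\n' |
        LC_ALL=C awk '{ printf "%.1f\n", $1 / 1048576 }'
}

# Log line copied (shortened) from the report above.
line='Out of memory: Killed process 889353 (classifyExact) total-vm:851435484kB, anon-rss:118022716kB, file-rss:5888kB'
rss_gib "$line"   # prints 112.6
```

So the process held roughly 112.6 GiB of resident memory when it was killed, on a machine with 125Gi in total.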

salzberg commented 1 month ago

Not clear exactly what you were doing here, but the --preload-size option creates temporary files each time through the DB, and it might have run out of space because of those. Suggest you set --preload-size much smaller, perhaps 20GB, and try again. Then maybe 40GB and see if that works/runs faster.
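The suggestion above (start small, then increase) could be scripted roughly as follows. This is only a sketch: the paths are placeholders, the flags are the ones that appear in this thread, and the "20G"/"40G" spelling of the sizes is an assumption about what KrakenUniq accepts. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Placeholder paths: substitute your own database, reads, and output prefix.
DB=/path/to/db
READS=/path/to/reads.fastq.gz
OUT=/path/to/out

# DRY_RUN=1 only prints each command; set DRY_RUN=0 to actually run it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: build one krakenuniq call for a given preload size.
try_preload() {
    size=$1
    set -- krakenuniq --db "$DB" --threads 32 --preload-size "$size" \
        --output "${OUT}_${size}_O.txt" --report-file "${OUT}_${size}_R.txt" "$READS"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Try the smaller size first and stop at the first failure.
for size in 20G 40G; do
    try_preload "$size" || { echo "failed at --preload-size $size" >&2; break; }
done
```

Writing each run to its own output and report file avoids the "Overwriting" warning and makes it easy to compare runs afterwards.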

JochenSchaefergmxde commented 1 month ago

Next try: 50GB

(base) internet@linux:/mnt/sdc1/jp$ krakenuniq --db /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb --threads 32 --preload-size 50gb --exact --output /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_O.txt --report-file /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_R.txt /mnt/fastq/jp_TSAA1877_t2t_u_s.fastq.gz
Warning: Overwriting /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_R.txt.
/usr/local/bin/classifyExact -d /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb/database.kdb -i /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb/database.idx -t 32 -o /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_O.txt -x 50gb -r /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_R.txt -a /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb/taxDB -p 12
Database /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb/database.kdb
Loaded database with 47859577226 keys with k of 31 [val_len 4, key_len 8].
Reading taxonomy index from /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb/taxDB. Done.
Writing Kraken output to /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_O.txt
Processed 36361273 sequences (database chunk 11 of 11)
36361273 sequences (5476.23 Mbp) processed in 3866.776s (564.2 Kseq/m, 84.97 Mbp/m).
30813893 sequences classified (84.74%)
5547380 sequences unclassified (15.26%)
Writing report file to /mnt/sdc1/jp/2024_05_16_09_08_44_jp_kuniq_stanTSAA1877t2tu_R.txt ..
Reading genome sizes from /mnt/m2/kuniqdb/kuniq_standard_plus_eupath_minus_kdb/database.kdb.counts ... done
Setting values in the taxonomy tree ...(base)

[3521083.423618] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-4362.scope,task=classifyExact,pid=889644,uid=1000
[3521083.423671] Out of memory: Killed process 889644 (classifyExact) total-vm:842547888kB, anon-rss:120279440kB, file-rss:4608kB, shmem-rss:0kB, UID:1000 pgtables:1624760kB oom_score_adj:0
[3521090.043587] oom_reaper: reaped process 889644 (classifyExact), now anon-rss:1140kB, file-rss:2540kB, shmem-rss:0kB

(base) internet@linux:/mnt/sdc1/jp$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       7.3Gi       116Gi        80Ki       2.7Gi       118Gi
Swap:          136Gi       2.4Gi       134Gi

Now I will try your recommendation.

salzberg commented 1 month ago

Not sure (because I'm not looking at the code from where I am), but you used "50gb" and you might have to type it as "50G" or "50GB" to get krakenuniq to recognize the memory size.

JochenSchaefergmxde commented 1 month ago

Without --exact it works with --preload-size 50G. So it seems to be a problem with the exact switch. Perhaps it is because my $TMP is on an NTFS drive and write access is handled differently there. Having $TMP on NTFS is not the best idea, but I can't change it at the moment.
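To check what filesystem the temporary directory actually sits on and how much space it has left, something like the following could be used. This is a sketch for Linux only (it relies on the -T option of df); whether KrakenUniq writes its temporary files under $TMP is not confirmed in this thread, and the fallback to $TMPDIR and /tmp is an assumption.

```shell
#!/bin/sh
# Directory to inspect: $TMP if set, else $TMPDIR, else /tmp (assumption).
tmpdir=${TMP:-${TMPDIR:-/tmp}}

# Hypothetical helper: print filesystem type and available space (in kB)
# for the filesystem holding the given directory.
# -k forces 1024-byte blocks, -P keeps one line per filesystem,
# -T adds the filesystem type as the second column (Linux df).
fs_info() {
    df -kPT "$1" | awk 'NR == 2 { print "type=" $2, "avail_kb=" $5 }'
}

fs_info "$tmpdir"
```

An NTFS volume mounted through ntfs-3g typically shows up with type fuseblk rather than ntfs, so the mount point may need to be cross-checked against the output of mount.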

salzberg commented 1 month ago

Oh, I didn't notice that before - never use the "-exact" option please! We should get rid of it. It slows down KrakenUniq a lot, and we only added that feature because a reviewer insisted. But we never use it ourselves, haven't tested it very thoroughly, and we should just get rid of it. The approximate k-mer counting is extremely accurate and has never caused a problem.