GATB / gatb-core

Core library of the Genome Analysis Toolbox with de-Bruijn graph
https://gatb.inria.fr/software/gatb-core/

Memory reservation in dbgh5 #27

Closed by cguyomar 4 years ago

cguyomar commented 5 years ago

Hello all,

I encountered what might be a bug during a simple test of dbgh5 on my computer (Ubuntu 18.04, 8 GB of RAM, latest commit of gatb-core).

When working on a small test file (1000 reads), dbgh5 crashes with this message:

[DSK: Collecting stats on reads_r1       ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  50.0 %   mem: [  20,   20,   20] MB 
[DSK: Pass 1/1, Step 1: partitioning     ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  -1.0 %   mem: [  46,   46,   75] MB 
EXCEPTION: Pool allocation failed for 80 bytes (kmers alloc), mainbuffer is null?. Current usage is 96 and capacity is 5242881152
Pool allocation failed for 8 bytes (kmers alloc), mainbuffer is null?. Current usage is 120 and capacity is 5242881152
Pool allocation failed for 72 bytes (kmers alloc), mainbuffer is null?. Current usage is 200 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 208 and capacity is 5242881152
Pool allocation failed for 8 bytes (kmers alloc), mainbuffer is null?. Current usage is 232 and capacity is 5242881152
Pool allocation failed for 72 bytes (kmers alloc), mainbuffer is null?. Current usage is 312 and capacity is 5242881152
Pool allocation failed for 8 bytes (kmers alloc), mainbuffer is null?. Current usage is 328 and capacity is 5242881152
Pool allocation failed for 16 bytes (kmers alloc), mainbuffer is null?. Current usage is 352 and capacity is 5242881152

The problem can be solved by :

It seems to me that having 5GB of memory available should not be necessary for such a tiny example. Is this a bug or the expected behavior of gatb?

Regards,

Cervin
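
For readers following the thread, the capacity printed in the error lines above can be converted to more readable units. This is a minimal Python sketch, not part of gatb-core: the helper name `parse_pool_error` and the regular expression are illustrative assumptions based only on the message format shown in this report. It suggests that the pool capacity is 5000 MiB plus 1152 bytes, and that the allocation fails while usage is still under a few hundred bytes, so the pool was not actually full when the exception was raised.

```python
import re

# One of the error lines from the report above, copied verbatim.
LINE = ("Pool allocation failed for 80 bytes (kmers alloc), mainbuffer is null?. "
        "Current usage is 96 and capacity is 5242881152")

# Pattern inferred from the log format in this thread (an assumption,
# not taken from the gatb-core sources).
PATTERN = re.compile(
    r"Pool allocation failed for (\d+) bytes .*"
    r"Current usage is (\d+) and capacity is (\d+)"
)

def parse_pool_error(line):
    """Return (requested, usage, capacity) in bytes, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())

requested, usage, capacity = parse_pool_error(LINE)
MIB = 1024 * 1024
whole_mib, leftover_bytes = divmod(capacity, MIB)
print(f"requested={requested} B, usage={usage} B")
print(f"capacity={whole_mib} MiB + {leftover_bytes} B")  # capacity=5000 MiB + 1152 B
```

A capacity of roughly 5000 MiB would be consistent with a memory budget of about 5000 MB being reserved up front, although the thread itself does not state what the default budget is.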

rchikhi commented 5 years ago

Thanks for the bug report. Unfortunately I can't reproduce it. Can you reproduce it on a different file, or just this one? Regardless, can you please give me the input file? (e.g. https://transfer.sh)

I tried the following (on my machine and genocluster):

$ bin/dbgh5 -in ../test/db/microsnp.fa -kmer-size 12
[DSK: Collecting stats on microsnp       ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  -1.0 %   mem: [  16,   16,   16] MB
[DSK: nb solid kmers found : 0           ]  100  %   elapsed:   0 min 2  sec   remaining:   0 min 0  sec   cpu: 111.1 %   mem: [ 223,  223,  223] MB B

EXCEPTION: This dataset has no solid kmers

and

$ seqtk seq ../test/db/reads3.fa.gz  | head -n 2000 > 1000kreads.fa
$ bin/dbgh5 -in 1000kreads.fa
[DSK: Collecting stats on 1000kreads     ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu: 100.0 %   mem: [  28,   28,   28] MB
[DSK: Pass 1/1, Step 1: partitioning     ]  27.3 %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu: 175.0 %   mem: [  93,   93,   93] MB
[etc.. running fine]

cguyomar commented 5 years ago

I have exactly the same problem with the microsnp dataset:

$ bin/dbgh5 -in ../test/db/microsnp.fa -kmer-size 12
[DSK: Collecting stats on microsnp       ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  -1.0 %   mem: [  17,   17,   17] MB 
[DSK: Pass 1/1, Step 1: partitioning     ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  -1.0 %   mem: [  46,   46,   75] MB 
EXCEPTION: Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 16 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 32 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 48 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 64 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 80 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 96 and capacity is 5242881152
$ bin/dbgh5 -in ../test/db/microsnp.fa -kmer-size 12 -max-memory 1000
[DSK: Collecting stats on microsnp       ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:   0.0 %   mem: [  17,   17,   17] MB 
[DSK: nb solid kmers found : 0           ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu: 105.3 %   mem: [  73,   73,   75] MB B 

EXCEPTION: This dataset has no solid kmers

rchikhi commented 5 years ago

I see. Maybe the problem is limited to machines with 8GB of RAM. I'll keep this issue open and get back to it at some point.
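
Until a fix lands, one possible workaround is to cap `-max-memory` relative to the machine's physical RAM, which is what the `-max-memory 1000` run above did by hand. The following is a rough Python sketch under several assumptions: that `-max-memory` is expressed in MB, that the system is POSIX-like so `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`, and that a quarter of RAM with a 500 MB floor is a reasonable budget (both numbers are arbitrary). The helper names are hypothetical, and only flags that already appear in this thread are used.

```python
import os

def suggest_max_memory_mb(total_bytes, fraction=0.25, floor_mb=500):
    """Pick a -max-memory value (assumed to be in MB) from total RAM in bytes."""
    budget_mb = int(total_bytes * fraction) // (1024 * 1024)
    return max(floor_mb, budget_mb)

def physical_memory_bytes():
    """Total physical RAM; relies on POSIX sysconf names (Linux, most Unix)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def dbgh5_command(input_path, max_memory_mb, kmer_size=None):
    """Build the argument list, using only flags shown in this thread."""
    cmd = ["bin/dbgh5", "-in", input_path, "-max-memory", str(max_memory_mb)]
    if kmer_size is not None:
        cmd += ["-kmer-size", str(kmer_size)]
    return cmd

if __name__ == "__main__":
    mb = suggest_max_memory_mb(physical_memory_bytes())
    print(" ".join(dbgh5_command("../test/db/microsnp.fa", mb, kmer_size=12)))
```

On an 8 GiB machine this would propose `-max-memory 2048`. Whether that value avoids the crash is untested here; the only value confirmed in this thread is 1000.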

rchikhi commented 4 years ago

I recently committed a possibly related memory allocation fix: https://github.com/GATB/gatb-core/commit/e99c5d74146f0e7730b866a0b5a35d8c8780ceae