GATB / bcalm

compacted de Bruijn graph construction in low memory
MIT License

Error: libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: basic_string #55

Open karel-brinda opened 4 years ago

karel-brinda commented 4 years ago
BCALM 2, git commit c8ac60252fa0b2abf511f7363cff7c4342dac2ee                                                                                                               
setting storage type to hdf5                                                                                                                                               
[Approximating frequencies of minimizers ]  100  %   elapsed:   0 min 10 sec   remaining:   0 min 0  sec   cpu:  99.8 %   mem: [6721, 6721,    0] MB                       
[DSK: Collecting stats on hg38           ]  100  %   elapsed:   0 min 14 sec   remaining:   0 min 0  sec   cpu:  99.9 %   mem: [1049, 1107,    0] MB                       
[DSK: nb solid kmers found : 2503985560  ]  100  %   elapsed:   9 min 8  sec   remaining:   0 min 0  sec   cpu:  94.2 %   mem: [1584, 8561,    0] MB                       
bcalm_algo params, prefix:hg38/hg38.bc31.fa.unitigs.fa k:31 a:1 minsize:10 threads:1 mintype:1                                                                             
DSK used 1 passes and 3 partitions                                                                                                                                         
prior to queues allocation                      15:44:10     memory [current, maxRSS]: [1593,    0] MB                                                                     
Starting BCALM2                                 15:44:10     memory [current, maxRSS]: [1593,    0] MB                                                                     
[Iterating DSK partitions                ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec                                                                   
Iterated 887340388 kmers, among them 143140847 were doubled                          

In this superbucket (containing 242671 active minimizers),
                  sum of time spent in lambda's: 1475479.9 msecs
                                 longest lambda: 610.8 msecs                                                                                                               
         tot time of best scheduling of lambdas: 1475479.9 msecs                                                                                                           
                       best theoretical speedup: 2415.7x                                                                                                                   
Done with partition 0                           16:18:30     memory [current, maxRSS]: [25559,    0] MB                                                                    

Iterated 903788377 kmers, among them 123275495 were doubled                          
Loaded 40462618 doubled kmers for partition 1

In this superbucket (containing 59110 active minimizers),
                  sum of time spent in lambda's: 1558374.0 msecs                                                                                                           
                                 longest lambda: 1442.2 msecs                                                                                                              
         tot time of best scheduling of lambdas: 1558374.0 msecs                                                                                                           
                       best theoretical speedup: 1080.6x                                                                                                                   
Done with partition 1                           16:55:08     memory [current, maxRSS]: [15037,    0] MB                                                                    
[Iterating DSK partitions                ]  33.3 %   elapsed:  70 min 58 sec   remaining: 141 min 55 sec                                                                   
Iterated 712856795 kmers, among them 126149961 were doubled                                                                                                                
Loaded 70376914 doubled kmers for partition 2                                                                                                                              

In this superbucket (containing 198005 active minimizers),                                                                                                                 
                  sum of time spent in lambda's: 2675057.4 msecs                                                                                                           
                                 longest lambda: 1654.8 msecs                                                                                                              
         tot time of best scheduling of lambdas: 2675057.4 msecs                                                                                                           
                       best theoretical speedup: 1616.5x                                                                                                                   
Done with partition 2                           17:47:00     memory [current, maxRSS]: [21140,    0] MB                                                                    
[Iterating DSK partitions                ]  100  %   elapsed: 122 min 50 sec   remaining:   0 min 0  sec                                                                   
Number of sequences in glue: 431789853                                                                                                                                     
Number of pre-tips removed : 0                                                                                                                                             
Buckets compaction and gluing           : 7369.8 secs                                                                                                                      
Within that,                                                                                                                                                               
                                 creating buckets from superbuckets: 1659.4 secs                                                                                           
                      bucket compaction (wall-clock during threads): 5710.3 secs                                                                                           

                within all bucket compaction threads,                                                                                                                      
                       adding nodes to subgraphs: 1570.5 secs                                                                                                              
         subgraphs constructions and compactions: 1722.2 secs                                                                                                              
                  compacted nodes redistribution: 2416.1 secs                                                                                                              
Sum of CPU times for bucket compactions: 7368.2 secs                                 
Discrepancy between sum of fine-grained timings and total wallclock of buckets compactions step: 1.5 secs                                                                  
BCALM total wallclock (excl kmer counting): 7369.9 secs                                                                                                                    
Maximum number of kmers in a subgraph: 62422                                         
Performance of compaction step:                                                                                                                                            

                 Wallclock time spent in parallel section : 5710.3 secs                                                                                                    
             Best theoretical speedup in parallel section : 1676.8x                                                                                                        
Best theor. speedup in parallel section using 1 threads : 1.0x                                                                                                             
             Sum of longest bucket compaction for each sb : 3.7 secs                                                                                                       
                       Sum of best scheduling for each sb : 5708.9 secs                                                                                                    
Done with all compactions                       17:47:00     memory [current, maxRSS]: [21131,    0] MB                                                                    
bglue_algo params, prefix:hg38/hg38.bc31.fa.unitigs.fa k:31 threads:1                                                                                                      
Starting bglue with 1 threads                   17:47:02     memory [current, maxRSS]: [  88,    0] MB                                                                     
number of sequences to be glued: 431789853        17:47:02     memory [current, maxRSS]: [  88,    0] MB                                                                   
libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: basic_string                                                                               
/bin/bash: line 1: 18662 Abort trap: 6           bcalm -in "hg38/hg38.fna" -out "hg38/hg38.bc31.fa" -kmer-size "31" -nb-cores "1" -abundance-min 1    
rchikhi commented 4 years ago

Hi Karel, thanks for the detailed bug report. I tried on my machine and it worked, but it required around 40 GB of RAM. Did your machine have enough? If so, could you perhaps try on another machine (possibly Linux) to see if the bug occurs there? Best, Rayan
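For anyone wanting to check the "did your machine have enough" question before starting a multi-hour run, here is a minimal sketch. It assumes a POSIX system where Python exposes the `SC_PHYS_PAGES` and `SC_PAGE_SIZE` sysconf names. The helper names and the 40 GB threshold are illustrative only: the threshold is the figure quoted in this thread, not a documented BCALM requirement. The check looks at installed RAM, not at memory that is currently free.

```python
import os

def physical_ram_gb():
    """Return installed physical RAM in GB (10^9 bytes), or None if unknown."""
    # sysconf_names is missing on platforms without os.sysconf (e.g. Windows).
    names = getattr(os, "sysconf_names", {})
    if "SC_PHYS_PAGES" not in names or "SC_PAGE_SIZE" not in names:
        return None
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 1e9

def has_enough_ram(required_gb):
    """True/False if RAM can be determined, None if the platform does not say."""
    total = physical_ram_gb()
    if total is None:
        return None
    return total >= required_gb

# Assumption: 40 GB is the rough figure reported in this thread for hg38, k=31.
REQUIRED_GB = 40
print("installed RAM (GB):", physical_ram_gb())
print("enough for the run reported here:", has_enough_ram(REQUIRED_GB))
```

A `None` result only means the platform did not expose the information, so it should not be read as "not enough".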

karel-brinda commented 4 years ago

Then I probably ran out of memory. Unfortunately, the exception (std::out_of_range: basic_string) is neither informative nor intuitive in such a case.
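The log itself gives a hint about memory use: several lines carry a `memory [current, maxRSS]: [..., ...] MB` field, and in this run the maxRSS column is always 0 while the current column climbs to about 25 GB. A small sketch for pulling the peak out of a saved log follows. The function name is hypothetical, and the regular expression is an assumption based only on the line format visible in this report.

```python
import re

# Matches lines such as:
#   Done with partition 0   16:18:30   memory [current, maxRSS]: [25559,    0] MB
MEM_FIELD = re.compile(r"memory \[current, maxRSS\]: \[\s*(\d+),\s*(\d+)\] MB")

def peak_memory_mb(log_text):
    """Return (peak current, peak maxRSS) in MB over all matching log lines."""
    peak_current = 0
    peak_max_rss = 0
    for match in MEM_FIELD.finditer(log_text):
        peak_current = max(peak_current, int(match.group(1)))
        peak_max_rss = max(peak_max_rss, int(match.group(2)))
    return peak_current, peak_max_rss

# Lines copied from the log in this report.
sample_log = """\
Starting BCALM2                                 15:44:10     memory [current, maxRSS]: [1593,    0] MB
Done with partition 0                           16:18:30     memory [current, maxRSS]: [25559,    0] MB
Done with partition 1                           16:55:08     memory [current, maxRSS]: [15037,    0] MB
Done with partition 2                           17:47:00     memory [current, maxRSS]: [21140,    0] MB
Starting bglue with 1 threads                   17:47:02     memory [current, maxRSS]: [  88,    0] MB
"""

print(peak_memory_mb(sample_log))
```

On the lines above this reports a peak of 25559 MB in the current column, reached at the end of partition 0. The crash happened later, right after bglue started at 88 MB, so the log does not show how much memory the gluing step tried to use.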

rchikhi commented 4 years ago

yes, clearly not. Are you still stuck with this problem or..?

karel-brinda commented 4 years ago

I'm fine, I can use a cluster.

rchikhi commented 4 years ago

Okay, just let me know if you ever get this again. On the other hand, 40 GB to compact a human genome seems a bit high. I might have to revisit that later.
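To put "a bit high" into numbers, here is a rough back-of-the-envelope calculation using only figures from this thread: the 2503985560 solid k-mers reported by DSK, k = 31, the roughly 40 GB observed on the maintainer's machine, and the 25559 MB peak in the reporter's log. It assumes 1 MB = 10^6 bytes and is only a ratio, not a model of how BCALM actually allocates memory.

```python
import math

SOLID_KMERS = 2_503_985_560  # "nb solid kmers found" in the log above
K = 31                       # value of -kmer-size in the failing command

def bytes_per_kmer(total_mb, n_kmers=SOLID_KMERS):
    """Memory divided by number of solid k-mers, assuming 1 MB = 10^6 bytes."""
    return total_mb * 1e6 / n_kmers

# A k-mer packed at 2 bits per nucleotide needs ceil(2k / 8) bytes.
packed_bytes = math.ceil(2 * K / 8)

print("packed 31-mer size (bytes):", packed_bytes)
print("bytes per k-mer at ~40 GB:", round(bytes_per_kmer(40_000), 1))
print("bytes per k-mer at 25559 MB:", round(bytes_per_kmer(25_559), 1))
```

This works out to about 16 bytes per solid k-mer at 40 GB and about 10 bytes at the 25559 MB peak, compared with 8 bytes for a bare 2-bit-packed 31-mer.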