gmarcais / Jellyfish

A fast multi-threaded k-mer counter

out of memory? #198

Open biofcallejas opened 1 year ago

biofcallejas commented 1 year ago

Hi, I'm using jellyfish/2.3.0 with a ~160G FASTQ file (PacBio DNA-seq reads) as follows:

jellyfish count -C -m 21 -s 280G -t 10 file.fastq -o reads.jf

The first time I got this error:

terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::unbounded_array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
  what():  Failed to allocate 256000000000 bytes of memory

I increased the memory to 280G, but now I get this error:

terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::unbounded_array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
  what():  Failed to allocate 640000000000 bytes of memory

Am I missing something? Is this normal? Any advice?

LYC-vio commented 6 months ago

Hi, maybe you have set -s too large. -s is neither the estimated size of the input FASTQ/FASTA file nor the amount of RAM; it is the number of slots in the hash table, in other words an estimate of how many distinct k-mers you will encounter in the file. For reads from a human genome, -s 3G should be enough (in my experience), or you can check the memory requirement here
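
To make that concrete, here is a rough sketch. It converts the byte count from the second error message into gigabytes, then reruns the original command with the smaller -s 3G. The 3G value is only the human-genome suggestion from this reply, not a figure measured for this dataset, and the guard around the jellyfish call is an addition so the snippet does nothing if jellyfish or file.fastq is missing.

```shell
# Byte count copied from the second error message (the run with -s 280G).
requested_bytes=640000000000

# Convert to decimal gigabytes using shell integer arithmetic.
requested_gb=$((requested_bytes / 1000000000))
echo "Allocation requested with -s 280G: ${requested_gb} GB"

# Rerun with -s set to an estimate of the number of distinct k-mers.
# Assumption: 3G is the human-genome value suggested above; adjust it
# for your own genome. All other flags are unchanged from the question.
if command -v jellyfish >/dev/null 2>&1 && [ -f file.fastq ]; then
  jellyfish count -C -m 21 -s 3G -t 10 file.fastq -o reads.jf
fi
```

The conversion shows that the table requested with -s 280G was far larger than the 280G passed on the command line, which fits with -s being a slot count rather than a memory limit.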