DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Centrifuge-build out of memory for small index? (fasta ~40 GB) #134

Open Greblica opened 6 years ago

Greblica commented 6 years ago

Hi, I've been trying to build a RefSeq index that includes prokaryote, protozoa, plant, invertebrate and organelle genomes. I end up with approximately 40,000 sequences, and the FASTA file is around 40 GB. I used dustmasker to mask low-complexity regions (the same way the centrifuge-download script does).

I have 256 GB of memory and 60 cores. However, the process always gets killed. At the beginning of the calculation, I get:

...
Using parameters --bmax 1006632960 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed! Constructing with these parameters: --bmax 1006632960 --dcv 1024
...

So it seems to me that it should work, but it doesn't. I tried adjusting bmax a bit (I must admit I don't fully understand how it works), but the result is always the same.

Am I missing something? The memory requirements shouldn't be that high, should they? Thanks for any suggestions and help.

mourisl commented 6 years ago

How many threads are you using? Maybe you can try using fewer threads.
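For illustration, here is a minimal sketch of how a run with a lower thread count could be assembled. It assumes that `-p` sets the number of threads for centrifuge-build and that `--conversion-table`, `--taxonomy-tree` and `--name-table` point at the usual taxonomy inputs. The helper `build_command` and all file names are hypothetical placeholders, not taken from this thread, and the thread count of 8 is an arbitrary choice.

```python
import shlex


def build_command(fasta, index_base, threads, bmax=None, dcv=None):
    """Assemble a centrifuge-build command line as a list of arguments.

    Hypothetical helper: all file names below are placeholders.
    """
    # -p is assumed to control the number of build threads.
    cmd = ["centrifuge-build", "-p", str(threads)]
    # Only pass --bmax / --dcv when explicitly requested; otherwise the
    # tool picks its own values.
    if bmax is not None:
        cmd += ["--bmax", str(bmax)]
    if dcv is not None:
        cmd += ["--dcv", str(dcv)]
    cmd += [
        "--conversion-table", "seqid2taxid.map",   # placeholder path
        "--taxonomy-tree", "taxonomy/nodes.dmp",   # placeholder path
        "--name-table", "taxonomy/names.dmp",      # placeholder path
        fasta,
        index_base,
    ]
    return cmd


# Example: same build, but with 8 threads instead of roughly 50.
cmd = build_command("input-sequences.fna", "refseq_custom", threads=8)
print(shlex.join(cmd))
```

The script only prints the command. Passing `cmd` to `subprocess.run` would start the build, and watching peak memory while stepping the thread count down would show whether the thread count matters here.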

Greblica commented 6 years ago

About 50 (I tried different settings). Could you explain why that would help, or point me to a reference where I could read about it? Thanks a lot!

harisankarsadasivan commented 4 years ago

@mourisl @Greblica Any solutions?