DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0
235 stars 73 forks

Out of memory allocating the offs[] array for the Bowtie index. #220

Open jessicarowell opened 2 years ago

jessicarowell commented 2 years ago

I ran the command below on an r5a.xlarge AWS EC2 instance (4 vCPUs, 32 GB RAM) on two small FASTQ files (about 5 MB each); the full error is pasted below the command. I'm using the hpvc reference index from your website. I tried this with -p 4 and with -p 2 and get the same error either way. I also notice that the -p option does not appear in the command echoed in the error output, and I'm not 100% sure why.

I can't find what Bowtie2 is being used for inside Centrifuge. Can you explain it? And do you have any estimate of how much memory Bowtie2 needs (I assume it scales with input size)?

centrifuge -q -t --met-file classify/metrics.txt -x $HOME/ref/centrifuge/hpvc -1 R1_001.fastq.gz -2 R2_001.fastq.gz --report-file classify/c_report.tsv -S classify/c_result.out

Out of memory allocating the offs[] array for the Bowtie index. Please try again on a computer with more memory.
Time loading forward index: 00:01:36
Overall time: 00:03:31
Error: Encountered internal Centrifuge exception (#1)
Command: /usr/local/bin/centrifuge-class --wrapper basic-0 -q -t --met-file classify/metrics.txt -x /home/ec2-user/ref/centrifuge/hpvc --report-file classify/c_report.tsv -S classify/c_result.out -1 /tmp/11668.inpipe1 -2 /tmp/11668.inpipe2
(ERR): centrifuge-class exited with value 1

Thank you!

jessicarowell commented 2 years ago

Is centrifuge no longer actively maintained?

mourisl commented 2 years ago

Sorry, I may have missed the notification for this issue. Centrifuge uses the FM-index implementation from Bowtie2, so some of the error messages are inherited from it. Memory usage mostly depends on the size of the index files, and I think the HPVC database should be around 20 GB. Can you check the size of the HPVC files on your system and try running Centrifuge with more memory? Thanks.
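A quick way to do the check suggested above is to compare the total size of the index files against the memory currently available. This is a minimal sketch for Linux; the `IDX` prefix is an assumption matching the path in the original command, and the `.*.cf` glob assumes the usual Centrifuge index naming (e.g. `hpvc.1.cf`, `hpvc.2.cf`, `hpvc.3.cf`):

```shell
#!/bin/sh
# Sketch: compare Centrifuge index size with available RAM before running.
# IDX is an assumed path -- adjust to wherever your index lives.
IDX="$HOME/ref/centrifuge/hpvc"

# Total size (kB) of the index parts; falls back to 0 if none are found.
idx_kb=$(du -ck "$IDX".*.cf 2>/dev/null | awk 'END {print $1}')
idx_kb=${idx_kb:-0}

# Available memory (kB) as reported by the kernel.
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)

echo "index: ${idx_kb} kB, available RAM: ${avail_kb} kB"
if [ "$idx_kb" -gt "$avail_kb" ]; then
    echo "Warning: index is larger than available RAM; expect an out-of-memory error." >&2
fi
```

Since the whole index is loaded into memory, the run is only likely to succeed when available RAM comfortably exceeds the index size.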

jessicarowell commented 2 years ago

I see. Thank you! My expanded hpvc database is 74 GB. I was able to successfully run Centrifuge on a set of paired-end reads (each gzipped FASTQ about 2 GB) from a metagenomics sample on an EC2 instance with 128 GB RAM. I haven't tried anything smaller than that yet.

I appreciate your time! In August I had created another issue about the metrics file; I'm still very interested in being able to output that file so if you have any time to address that one it would be really great! Thank you again. The software is cool and I also appreciate your paper explaining it; it's very helpful.

~ Jessica


mourisl commented 2 years ago

Then you need more than 74 GB of memory to run Centrifuge with that index, so a 128 GB allocation is a good choice.

I tried to fix the metrics file issue a few years back but could not find the bug. I'll check it again.