bowmanjeffs / paprica

paprica - PAthway PRediction by phylogenetIC plAcement

Error: HMM banded truncated alignment mxes need 1392.25 Mb > 1028.00 Mb limit #92

Closed: Ql-cy closed this issue 2 years ago

Ql-cy commented 2 years ago

Hi Jeff, I tried to run paprica-run.sh, but it failed with this error: Error: HMM banded truncated alignment mxes need 1392.25 Mb > 1028.00 Mb limit. It does not seem to be related to the memory available for the run. I tried adding the mxsize option, both as ./paprica-run.sh S0029 archaea --mxsize=3000 and as ./paprica-run.sh S0029 archaea --mxsize 3000, but neither worked. I would really appreciate any suggestions.

bowmanjeffs commented 2 years ago

This is generally caused by low quality reads. Did you QC and denoise your reads?

Ql-cy commented 2 years ago


The 16S rRNA reads were extracted from metagenomic data, so they may be of low quality. Thanks for pointing that out!

bowmanjeffs commented 2 years ago

Great. Just one bad read can cause this, so a straightforward screen for quality before you run paprica should solve it.
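A straightforward screen of this kind can be sketched in a few lines of Python. This is a minimal sketch, not the QC that paprica or fastp actually performs: it assumes 4-line FASTQ records with Phred+33 quality encoding, and the thresholds MIN_MEAN_Q, MIN_LENGTH and MAX_N, like the function names, are hypothetical values chosen only for illustration. Passing reads are emitted as FASTA, on the assumption that FASTA is what you hand to paprica-run.sh.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical thresholds for illustration only; tune them for your own data.
MIN_MEAN_Q = 25   # minimum mean Phred score per read
MIN_LENGTH = 100  # minimum read length in bases
MAX_N = 0         # maximum number of ambiguous 'N' bases allowed


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    # Zipping the same iterator four times groups consecutive lines into records.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip()[1:], seq.rstrip(), qual.rstrip()


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 (Illumina 1.8+) encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_screen(seq: str, qual: str) -> bool:
    """True if the read meets the (hypothetical) length, N and quality cut-offs."""
    return (
        len(seq) >= MIN_LENGTH
        and seq.upper().count("N") <= MAX_N
        and mean_phred(qual) >= MIN_MEAN_Q
    )


def screen_reads(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA records for the reads that pass the screen."""
    for read_id, seq, qual in parse_fastq(lines):
        if passes_screen(seq, qual):
            yield f">{read_id}\n{seq}\n"


# Example usage (file names are placeholders):
# with open("S0029.fastq") as src, open("S0029.fasta", "w") as dst:
#     dst.writelines(screen_reads(src))
```

Averaging quality over a whole read is deliberately crude: one poor stretch inside an otherwise good read can slip through, so a per-read mean is best treated as a quick first pass rather than a replacement for proper trimming.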

Ql-cy commented 2 years ago


Hi Jeff, sorry to bother you again. I screened and checked the quality of the reads with fastp, and they all seem to pass, but running paprica-run.sh still produces the same error. Could there be another cause?

bowmanjeffs commented 2 years ago

What parameters did you use for quality checking? I recommend taking just a small subset of your reads (say, 10) and seeing if that works. That will eliminate any weird general problems with your data. Bad reads, or reads that don’t fit the model, are the only cause I’ve seen for this error. The latter explanation should be fixed by the paprica-pick_domain.py step (I assume you have the most recent version of paprica installed?).

Jeff

=======================

Jeff Bowman

Assistant Professor

Scripps Institution of Oceanography

www.polarmicrobes.org
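Pulling out a small test subset, as suggested above, takes only a few lines. This is a minimal sketch, assuming the input is a FASTA file whose sequences may be wrapped over several lines; parse_fasta, first_records and to_fasta are hypothetical helper names, and the default n=10 simply mirrors the suggestion of about 10 reads.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_records(lines: Iterable[str], n: int = 10) -> List[Tuple[str, str]]:
    """Return at most the first n (header, sequence) records."""
    return list(islice(parse_fasta(lines), n))


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format records as single-line FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Example usage (file names are placeholders):
# with open("S0029.fasta") as src, open("S0029_test.fasta", "w") as dst:
#     dst.write(to_fasta(first_records(src, 10)))
```

If the small subset runs cleanly, repeating the test on other slices of the file is one way to narrow down which reads trigger the error.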

From: Ql-cy Sent: Monday, April 18, 2022 6:55 PM To: bowmanjeffs/paprica Cc: Jeff Bowman; State change Subject: Re: [bowmanjeffs/paprica] Error: HMM banded truncated alignment mxes need 1392.25 Mb > 1028.00 Mb limit (Issue #92)


Ql-cy commented 2 years ago


Thanks for the prompt reply. I'm still confused by this error. The reads of all samples were checked the same way (fastp with default parameters, or trimming sequences that were too long or too short). The FastQC reports for all samples passed every check except Per tile sequence quality, yet some samples that failed Per tile sequence quality still ran paprica-run.sh successfully. I ran the program in Docker, so it is probably the latest version. Perhaps some of the samples contain a few bad reads. I would be grateful for any suggestions on how to identify them. One of the failing samples is attached: S0029.txt

bowmanjeffs commented 2 years ago

The default parameters in fastp may simply not be enough, and you may need to try different parameters to find a combination that successfully removes low-quality reads. However, given that your reads are metagenomic, the issue may be that some reads extend beyond the 16S rRNA gene. Try limiting to those reads that contain only 16S rRNA gene sequence. This will require a little scripting... I suggest blasting against a 16S rRNA gene database and using the output to trim the reads (or to select only those that fully map to the reference).
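The trimming or selection step described above could look roughly like the following Python sketch. It assumes BLAST+ tabular output written with a custom column list such as -outfmt "6 qseqid sseqid qstart qend qlen" (the column order in the code must match whatever you actually request), and that read IDs match BLAST's qseqid, i.e. the first word of each FASTA header. Taking the first line per query as the best hit relies on BLAST listing hits best-first, which is worth checking for your own settings. first_hits and restrict_to_16s are hypothetical names; this is one possible approach, not something paprica provides.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (qstart, qend, qlen) in BLAST's 1-based, inclusive query coordinates.
Hit = Tuple[int, int, int]


def first_hits(blast_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the first alignment listed for each query read.

    Assumes columns: qseqid, sseqid, qstart, qend, qlen (tab-separated).
    """
    hits: Dict[str, Hit] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, _sseqid, qstart, qend, qlen = line.split("\t")[:5]
        if qseqid not in hits:
            hits[qseqid] = (int(qstart), int(qend), int(qlen))
    return hits


def restrict_to_16s(
    reads: Iterable[Tuple[str, str]],
    hits: Dict[str, Hit],
    require_full: bool = False,
    slack: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) limited to the 16S-aligned region.

    With require_full=True, keep only reads whose alignment spans the whole
    read (allowing `slack` unaligned bases at each end); otherwise trim each
    read to its aligned region. Reads without a hit are dropped.
    """
    for read_id, seq in reads:
        hit = hits.get(read_id)
        if hit is None:
            continue
        qstart, qend, qlen = hit
        start, end = min(qstart, qend), max(qstart, qend)
        if require_full:
            if start - 1 <= slack and qlen - end <= slack:
                yield read_id, seq
        else:
            # Convert 1-based inclusive coordinates to a Python slice.
            yield read_id, seq[start - 1:end]


# Example usage (file name is a placeholder; `reads` is any iterable of
# (read_id, sequence) pairs from a FASTA parser):
# with open("S0029_vs_16S.tsv") as fh:
#     hits = first_hits(fh)
# kept = list(restrict_to_16s(reads, hits, require_full=True))
```

Trimming keeps more reads, while requiring a full-length match is stricter and more closely follows the "select only those that fully map" option.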

Ql-cy commented 2 years ago


Hey Jeff, thanks for the quick reply! I fixed the issue by removing some low-quality reads, and the samples now run successfully. However, I noticed something strange: one sample ran paprica-run.sh successfully before QC but produced this error after QC, even though most of my samples processed with the same QC parameters worked fine.

bowmanjeffs commented 2 years ago

Interesting, no immediate thoughts on why that might be, but glad the reads ran!