bowmanjeffs / paprica

paprica - PAthway PRediction by phylogenetIC plAcement

HMM banded truncated alignment #50

Closed lucaz88 closed 5 years ago

lucaz88 commented 7 years ago

Hi, running paprica I am getting the following error:

Error: HMM banded truncated alignment mxes need 5752.95 Mb > 1028.00 Mb limit. Use --mxsize, --maxtau or --tau.
Fatal exception (source file esl_buffer.c, line 1599): zero malloc disallowed

I have been using paprica for a while without much trouble, but I cannot figure out how to solve this problem with the set of sequences that I am currently analyzing. My input is a file of ~5000 unique representative bacterial 16S sequences (341F-785R). In paprica-place_it.py I saw that the error can be caused by reference sequences that don't fit the covariance model, but I am just using the provided reference db. Can you help me?

Thanks, Luca

bowmanjeffs commented 7 years ago

I've occasionally encountered this if there are query reads that don't fit the reference model (I need to change the comment you referenced above). This can be due to a handful of low quality reads, or reads that don't actually belong to the domain of the cm. I suggest 1) checking read quality 2) checking that each read can be classified by the cm using infernal's cmsearch and 3) using the --mxsize flag referenced above. For that simply modify the relevant line in paprica-place_it.py. Let me know if it isn't clear which line that is. Only do #3 as a last resort, as cmalign should work with the default memory setting.
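Suggestion 2 above can be sketched roughly as follows: run cmsearch with --tblout, then list the reads that never appear as a target in the table. This is only a minimal sketch, not part of paprica. The function names and any file paths passed to them are hypothetical, and it assumes the standard tblout layout in which the first whitespace-separated column is the target (read) name and comment lines start with "#".

```python
import subprocess
from typing import Iterable, List, Set


def run_cmsearch(cm_path: str, fasta_path: str, tblout_path: str) -> None:
    """Run Infernal's cmsearch and write a tabular hit summary.

    The paths are supplied by the caller (hypothetical here); cmsearch
    must be on PATH. The human-readable report on stdout is discarded.
    """
    subprocess.run(
        ["cmsearch", "--tblout", tblout_path, cm_path, fasta_path],
        check=True,
        stdout=subprocess.DEVNULL,
    )


def fasta_ids(fasta_lines: Iterable[str]) -> List[str]:
    """Return read ids (first token after '>') in file order."""
    return [
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    ]


def tblout_targets(tblout_lines: Iterable[str]) -> Set[str]:
    """Return the set of target names that received at least one hit."""
    return {
        line.split()[0]
        for line in tblout_lines
        if line.strip() and not line.startswith("#")
    }


def reads_without_hits(fasta_lines: Iterable[str],
                       tblout_lines: Iterable[str]) -> List[str]:
    """List reads the covariance model did not recognize at all."""
    hits = tblout_targets(tblout_lines)
    return [read_id for read_id in fasta_ids(fasta_lines) if read_id not in hits]
```

Reads returned by reads_without_hits are candidates for removal before cmalign. A read can still score poorly while having a hit, so it may also be worth inspecting the score and E-value columns of the table.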

rodosaurio commented 5 years ago

Hi, I actually have the same error with eggnog-mapper:

Sequence mapping starts now!
Fatal exception (source file esl_buffer.c, line 1599): zero malloc disallowed
Aborted (core dumped)
Processed queries:1 total_time:1.05260109901 rate:0.95 q/s
Skipping seed ortholog detection in "viruses" database
Functional annotation of hits starts now
Processed queries:0 total_time:4.60147857666e-05 rate:0.00 q/s

Were you able to solve the problem? Thanks for any information about this.

Regards

bowmanjeffs commented 5 years ago

Did you mean to post this here? Doesn't look like a paprica issue... (does, however, look like a memory problem so just make sure you have enough for what you're trying to do)

rodosaurio commented 5 years ago

I'm sorry, I posted in the wrong thread. (The cause of my error was that I was using a file of nucleotide sequences instead of proteins.) Regards

tomazr commented 5 years ago

Had this issue for the first time with one sample of 10000 reads, using paprica 0.4.1b. I changed the --mxsize flag in paprica-place_it.py from 4000 to 1028 and added -large to paprica-run.sh. The error is still there, even with a smaller subsample. It is hard to locate the exact sequence(s) causing the problem, so I would rather fix the --mxsize limit.
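One way to narrow down which reads trigger the failure is to bisect the input: align each half separately and recurse only into the halves that still fail. The following is a rough sketch under stated assumptions, not something paprica does itself. find_failing_reads and cmalign_fails are hypothetical names, the model path is supplied by the caller, and a non-zero cmalign exit status is assumed to mean the subset failed.

```python
import os
import subprocess
import tempfile
from typing import Callable, List, Sequence, Tuple

Record = Tuple[str, str]  # (read id, sequence)


def find_failing_reads(records: Sequence[Record],
                       fails: Callable[[List[Record]], bool]) -> List[Record]:
    """Recursively split the reads to isolate those that make `fails` true."""
    records = list(records)
    if not records or not fails(records):
        return []
    if len(records) == 1:
        return records
    mid = len(records) // 2
    culprits = (find_failing_reads(records[:mid], fails)
                + find_failing_reads(records[mid:], fails))
    # If neither half fails alone, the failure depends on the combination
    # (for example total memory use), so report the whole subset.
    return culprits or records


def cmalign_fails(records: List[Record], cm_path: str) -> bool:
    """Hypothetical predicate: True if cmalign exits non-zero on this subset."""
    with tempfile.TemporaryDirectory() as tmp:
        fasta_path = os.path.join(tmp, "subset.fasta")
        with open(fasta_path, "w") as handle:
            for read_id, seq in records:
                handle.write(f">{read_id}\n{seq}\n")
        result = subprocess.run(
            ["cmalign", "-o", os.path.join(tmp, "subset.sto"), cm_path, fasta_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode != 0
```

With real data the predicate would be something like lambda recs: cmalign_fails(recs, cm_path), with cm_path pointing at the reference covariance model. Each call runs cmalign on a subset, so the search costs several alignments, but it needs far fewer runs than testing reads one at a time when only a few are bad.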

Thanks, Tomaz

bowmanjeffs commented 5 years ago

We've made a number of improvements to paprica since 0.4.1b that should help eliminate this error. The real problem is still the same: insufficient QC, or some other issue with one or more reads that prevents them from fitting the alignment model. paprica now has a -large option to raise memory limits, and automatically makes the input fasta non-redundant, so there is less need for memory-inefficient parallelization. I suggest seeing if the problem persists with the most recent release. Let me know if there is still an issue (also check your QC and make sure that file looks good).