WGLab / RepeatHMM

a hidden Markov model to infer simple repeats from genome sequences

Error "gap opening penalty should be higher than gap extension penalty (or equal)" #1

Closed c-zhou closed 7 years ago

c-zhou commented 7 years ago

Hi there,

When we run RepeatHMM (master branch), we hit the error quoted in the title. We believe it is caused by line 108 of RepeatHMM/bin/scripts/myBAMhandler.py:

match, mismatch, gap, gap_extension = 2, -1, -1, -10 #from Bio import pairwise2;

The gap_extension penalty (-10) here is smaller than the gap opening penalty (-1).

We changed the line to:

match, mismatch, gap, gap_extension = 2, -1, -10, -1 #from Bio import pairwise2;
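For readers following along, here is a minimal sketch of the kind of check that seems to produce this error. It is inferred only from the error message and the values in this thread, not copied from Biopython; `check_gap_penalties` and `is_accepted` are hypothetical names used for illustration.

```python
def check_gap_penalties(gap_open, gap_extend):
    """Hypothetical stand-in for the library-side check.

    Assumption: with non-positive penalties, the library rejects an
    extension penalty that is numerically smaller (more negative)
    than the opening penalty.
    """
    if gap_extend < gap_open:
        raise ValueError(
            "gap opening penalty should be higher than "
            "gap extension penalty (or equal)"
        )
    return gap_open, gap_extend


def is_accepted(gap_open, gap_extend):
    """Return True if the pair passes the check, False otherwise."""
    try:
        check_gap_penalties(gap_open, gap_extend)
        return True
    except ValueError:
        return False


print(is_accepted(-1, -10))  # values from line 108: False
print(is_accepted(-10, -1))  # swapped values: True
print(is_accepted(-1, -1))   # equal values: True
```

Under that assumption, the original pair (-1, -10) is rejected, while both the swapped pair and the equal pair pass.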

Does this make sense, or do you have better suggestions for the penalties?

Thanks.

liuqianhn commented 7 years ago

Hi Zhou, the error is raised by pairwise2 during pairwise alignment. RepeatHMM itself places no constraint on the gap opening and gap extension penalties.

In RepeatHMM, I recommend that the gap_extension penalty have a smaller value than the gap opening penalty if possible. If another package requires that "gap opening penalty should be equal to or higher than gap extension penalty", I recommend -1 for both the gap opening penalty and the gap extension penalty. I do not recommend giving the gap opening penalty (-10) a much smaller value than the gap extension penalty (-1).
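To make the three settings concrete, the sketch below tabulates what a gap of length 1, 2, or 3 would cost under each. It assumes the common affine convention in which a gap of length k costs open + (k - 1) * extend. That convention is an assumption made here for illustration, and the thread does not state how RepeatHMM or pairwise2 actually scores gaps. `affine_gap_cost` is a hypothetical helper.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of one gap under an assumed affine model:
    the first gapped position costs gap_open, each further one gap_extend."""
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# (gap opening, gap extension) pairs discussed in this thread
settings = {
    "default (-1, -10)": (-1, -10),
    "equal (-1, -1)": (-1, -1),
    "swapped (-10, -1)": (-10, -1),
}

for name, (gap_open, gap_extend) in settings.items():
    costs = [affine_gap_cost(k, gap_open, gap_extend) for k in (1, 2, 3)]
    print(name, costs)
# default (-1, -10) [-1, -11, -21]
# equal (-1, -1) [-1, -2, -3]
# swapped (-10, -1) [-10, -11, -12]
```

Under this model, the default and the equal settings both keep a single-base gap cheap (-1), whereas the swapped setting charges -10 for it. The sketch only tabulates the numbers and does not claim to be the reasoning behind the recommendation.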

c-zhou commented 7 years ago

Hi liuqian, thanks very much for your prompt response.

Yes, you are right: it is caused by Biopython's pairwise2. What should I do if I want to keep RepeatHMM's default parameters, say gap=-1 and gap_extension=-10? Should I skip the pairwise2 step, and if so, how? Or should I use a different version of Biopython?

liuqianhn commented 7 years ago

Hi Zhou, glad to receive your reply. Rather than skipping this step, I suggest trying a different version of pairwise2 if that is easy to do, or simply using -1 for both the gap opening penalty and the gap extension penalty.

This step provides a better estimate of the flanking regions for the HMM. So when the default setting (-1 for the gap opening penalty and -10 for the gap extension penalty) does not work, it is better to keep the step with -1 for both gap penalties than to skip it.
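The fallback described above could be sketched as follows: try the default penalties first, and switch to -1 for both only if the alignment library rejects them. `strict_check` is a hypothetical stand-in for whatever validation the installed library performs (assumed here to raise `ValueError`), and `pick_gap_penalties` is an illustrative helper, not part of RepeatHMM.

```python
DEFAULT_PENALTIES = (-1, -10)   # gap opening, gap extension (RepeatHMM default)
FALLBACK_PENALTIES = (-1, -1)   # values suggested in this thread


def strict_check(gap_open, gap_extend):
    """Hypothetical stand-in for a library that rejects an extension
    penalty numerically smaller than the opening penalty."""
    if gap_extend < gap_open:
        raise ValueError(
            "gap opening penalty should be higher than "
            "gap extension penalty (or equal)"
        )


def pick_gap_penalties(preferred=DEFAULT_PENALTIES,
                       check=strict_check,
                       fallback=FALLBACK_PENALTIES):
    """Return `preferred` if `check` accepts it, otherwise `fallback`."""
    try:
        check(*preferred)
        return preferred
    except ValueError:
        check(*fallback)  # let the error surface if even the fallback fails
        return fallback


print(pick_gap_penalties())  # (-1, -1) with the strict stand-in check
```

With a library version that accepts the defaults, the same call would return (-1, -10) unchanged, so the flanking-region alignment step stays in place either way.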

c-zhou commented 7 years ago

Cool. Thanks liuqian.