bioperl / bioperl-live-redmine

Legacy tickets migrated from the OBF Redmine issue tracker: http://redmine.open-bio.org

codeml sometimes fails to parse if sequence name is too long #134

Open cjfields opened 8 years ago

cjfields commented 8 years ago

Author Name: Janet Young (Janet Young)
Original Redmine Issue: 3332, https://redmine.open-bio.org/issues/3332
Original Date: 2012-03-06
Original Assignee: Bioperl Guts


Hi there,

I’ve been running pairwise codeml on a whole bunch of alignments, and am occasionally running into examples that fail to parse. Here’s one where I’ve tracked down why - a sequence name that is too long can cause trouble. I can see why, I think - the portion of the mlc file that describes the Nei-Gojobori matrix loses an important space if the sequence name is too long.

A good N-G matrix looks like this:


-------------
Nei & Gojobori 1986. dN/dS (dN, dS)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

seq1a
aaaaaaaaaaaaaaaaaaa -1.0000 (0.0000 0.0000)
-------------

and one that doesn’t parse looks like this (no space after the second sequence name):

-------------
Nei & Gojobori 1986. dN/dS (dN, dS)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

seq1a
aaaaaaaaaaaaaaaaaaaa-1.0000 (0.0000 0.0000)
-------------

I think it might only be a problem for a minority of comparisons where there is no sequence divergence, but I’m not sure. I’ll attach a test script that should demonstrate the problem clearly, but please let me know if more explanation would be helpful.
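For illustration only, here is a minimal Python sketch of the failure mode described above. It is not the attached test script and not BioPerl's actual parser: the two regular expressions, the `parse_row` helper, and the name lengths are all hypothetical. It shows how a pattern that requires whitespace between the sequence name and the dN/dS value rejects the fused row, while a pattern that treats that whitespace as optional still recovers the fields.

```python
import re

# Hypothetical patterns, NOT taken from BioPerl. Each expects one row of the
# Nei-Gojobori matrix: name, dN/dS value, then "(dN dS)".
NUM = r"-?\d+\.\d+"

# Strict: at least one whitespace character must separate name and value.
STRICT = re.compile(rf"^(\S+)\s+({NUM})\s*\(({NUM})\s+({NUM})\)")

# Tolerant: the separating whitespace is optional and the name is matched
# lazily, so a value fused onto the end of a long name is still split off.
# Caveat: names that themselves end in digits remain ambiguous.
TOLERANT = re.compile(rf"^(\S+?)\s*({NUM})\s*\(({NUM})\s+({NUM})\)")


def parse_row(line, pattern):
    """Return (name, omega, dN, dS) for a matrix row, or None if it does not match."""
    m = pattern.match(line)
    if m is None:
        return None
    name, omega, dn, ds = m.groups()
    return name, float(omega), float(dn), float(ds)


# Illustrative rows modelled on the two examples above.
good_row = "a" * 19 + " -1.0000 (0.0000 0.0000)"
bad_row = "a" * 20 + "-1.0000 (0.0000 0.0000)"  # no space before -1.0000

strict_good = parse_row(good_row, STRICT)
strict_bad = parse_row(bad_row, STRICT)
tolerant_good = parse_row(good_row, TOLERANT)
tolerant_bad = parse_row(bad_row, TOLERANT)
```

With these assumed patterns, `strict_bad` is `None` while `tolerant_bad` still yields the 20-character name and a dN/dS of -1.0. Whether the real parser fails for this exact reason would need to be confirmed against the BioPerl source.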

thanks very much,

Janet Young


Dr. Janet Young

Tapscott and Malik labs

Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA.

email: jayoung …at… fhcrc.org


cjfields commented 8 years ago

Original Redmine Comment
Author Name: Daisie Huang
Original Date: 2012-06-20T22:14:49Z


I submitted a fix for this bug: https://github.com/daisieh/bioperl-live/zipball/bug3332