bioperl / bioperl-live-redmine

Legacy tickets migrated from the OBF Redmine issue tracker: http://redmine.open-bio.org

AlignIO MSF writing #124

Open cjfields opened 8 years ago

cjfields commented 8 years ago

Author Name: Bernd W (Bernd W)
Original Redmine Issue: 3299, https://redmine.open-bio.org/issues/3299
Original Date: 2011-10-04
Original Assignee: Bioperl Guts


Hi

Bioperl describes the MSF format: http://www.bioperl.org/wiki/MSF_multiple_alignment_format

I found that the MSF output written by BioPerl (versions 1.5.2 to 1.61 and bioperl-live) differs from this description. I am not sure what exactly is allowed in MSF, but the header is written like this:

    $self->_print (sprintf("\n%s MSF: %d Type: %s %s Check: 00 ..\n\n", $name, $aln->num_sequences, $type, $date));

This means that we get lines like this:

    NoName MSF: 3 Type: P Tue Oct 4 21:34:51 2011 Check: 00 ..

However, the value after MSF: should be the alignment length, as in the Len: field. In other descriptions I do find the alignment name preceding MSF: and sometimes also column numbering, as written by BioPerl:

                      1                                                   50
    CYS1_DICDI/1-343  -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ----
    ALEU_HORVU/1-362  MAHARVLLLA LAVLATAAVA VASSSSFADS NPIRPVTDRA ASTLESAVLG
    CATH_HUMAN/1-335  ------MWAT LPLLCAGAWL LGV------- -PVCGAAELS VNSLEK----

However, most format descriptions I have seen do not include column numbers or alignment names. See e.g. http://www.ebi.ac.uk/help/formats.html#MSF but also http://tcoffee.vital-it.ch/Doc/doc3.html and http://bmerc-www.bu.edu/examples/output/seqlist.msf.html

In any case, it seems that $aln->num_sequences is wrong here and should be $aln->length.
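To make the suggested change concrete, here is a minimal sketch in plain Perl that does not need BioPerl installed. The helper name `msf_header` and the sample values are hypothetical and only for illustration. The format string is the one quoted above, with the spacing as it appears in this report. The only difference is that the `%d` slot receives the alignment length (what `$aln->length` would return) instead of the number of sequences.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::AlignIO::msf: it builds the MSF
# header line from plain values so the effect of the change is visible.
sub msf_header {
    my ($name, $length, $type, $date) = @_;

    # Same template as the quoted _print call; the %d slot now takes the
    # alignment length rather than the sequence count.
    return sprintf("\n%s MSF: %d Type: %s %s Check: 00 ..\n\n",
                   $name, $length, $type, $date);
}

# Stand-in values: 50 is an illustrative column count; in the writer this
# argument would be $aln->length.
print msf_header('NoName', 50, 'P', 'Tue Oct 4 21:34:51 2011');
```

With these stand-in values the header reads `NoName MSF: 50 Type: P Tue Oct 4 21:34:51 2011 Check: 00 ..`, so the number after MSF: is the alignment length rather than the sequence count.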