McTavishLab / physcraper

Welcome to Physcraper’s repository! Automatic gene tree updating using the Open Tree of Life.
https://physcraper.readthedocs.io/en/main/
GNU General Public License v3.0

muscle error on profile alignment #98

Open LunaSare opened 4 years ago

LunaSare commented 4 years ago

Running physcraper_run.py -s pg_2827 -t tree6577 -a data-raw/alignments/T1281-M2478.nex -as nexus -o data/pg_2827_tree6577 fails during the MUSCLE profile alignment step.

MUSCLE has trouble reading in the original alignment:

MUSCLE v3.8.1551 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

00:00:00      1 MB(0%)  Reading new_seqs_aligned_2020-06-02_T1281-M2478.fas
00:00:00      1 MB(0%)  140 seqs 849 cols
00:00:00      1 MB(0%)  Reading original_T1281-M2478.fas

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** WARNING *** Invalid character '?' in FASTA sequence data, ignored

*** ERROR ***  Internal error MSA::ExpandCache, ColCount changed

LunaSare commented 4 years ago

The problem is that MUSCLE does not just ignore the ? characters, it removes them from the alignment, so sequences can end up with different numbers of columns. Running sed 's/\?/-/g' original_T1281-M2478.fas > modified_original.fas replaces every ? with -, which keeps the number of columns the same in every sequence.
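
For anyone who prefers to do the same substitution from Python, here is a minimal sketch of the idea. The function names mask_missing and row_lengths are made up for this example and are not part of physcraper. Unlike the sed one-liner, which rewrites every line of the file, this version leaves the > header lines alone, in case a sequence name contains a ?.

```python
def mask_missing(fasta_text, missing="?", gap="-"):
    """Replace `missing` with `gap` on sequence lines only.

    Header lines (starting with '>') are copied through unchanged.
    Hypothetical helper for illustration, not physcraper API.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(line.replace(missing, gap))
    return "\n".join(out) + "\n"


def row_lengths(fasta_text):
    """Return {sequence name: total sequence length} for a FASTA string."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            # Sequences may be wrapped over several lines, so accumulate.
            lengths[name] += len(line.strip())
    return lengths


if __name__ == "__main__":
    # Toy alignment: both rows are 6 columns wide, one has missing data.
    fasta = ">a\nAC??GT\n>b\nACGTGT\n"

    # Dropping '?' (what MUSCLE effectively does) makes the rows unequal.
    print(row_lengths(fasta.replace("?", "")))

    # Replacing '?' with '-' keeps every row the same width.
    print(row_lengths(mask_missing(fasta)))
```

Checking that row_lengths gives a single value for all sequences before handing the file to MUSCLE is a quick way to confirm the alignment is still rectangular.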