GSLBiotech / mafft

Align multiple amino acid or nucleotide sequences.

Mafft trimming species names #3

Open sevance opened 3 years ago

sevance commented 3 years ago

Hello!

MAFFT has been super helpful in my phylogenomics work - thanks for such a great tool!

I am having issues with MAFFT automatically trimming all my branch names. I am running an MSA on multiple reference genomes as well as my own MAGs. Each of these has a name longer than 10 characters, in the format "Genus species". In the MAFFT output the names are all trimmed to 10 characters, so when I build the tree I can't tell apart species in the same genus. The MAFFT output looks like this:

Cyanobacte NQVVYLGTGR RKASVARVR- ---------- ---LVP-GTG AVKVNNREGS
Cyanobacte DKVVYLGTGR RKASIARVR- ---------- ---LVP-GSG AVTVNGKDAV
Cyanobacte QRAVYWGTGR RKTAVARVR- ---------- ---LVP-GTG KLIINDRPGD
Cyanobacte QRAVYWGTGR RKTAVARVR- ---------- ---LVP-GTG KIIINDRPGD

Am I missing some argument to pass in to avoid this trimming? Thanks for your help!

kakatoh commented 2 years ago

To use longer names in the clustal format, try

mafft --clustalout --namelength 50 x > y

Also note that only the first word is used as the sequence name with the --clustalout option. To use two or more words, please replace each space ' ' with '_' or another character in the sequence names.
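
For anyone who wants to script the renaming step, here is a minimal sketch in Python, assuming the input is a plain FASTA file whose header lines start with '>'. The function name underscore_fasta_headers is made up for illustration and is not part of MAFFT; it only rewrites header lines and leaves sequence lines alone.

```python
def underscore_fasta_headers(lines):
    """Return FASTA lines with whitespace in header names replaced by '_'.

    Hypothetical helper, not part of MAFFT. Header lines are those that
    start with '>'; all other lines are passed through unchanged apart
    from removing the trailing newline.
    """
    renamed = []
    for line in lines:
        if line.startswith(">"):
            # Split on any run of whitespace and rejoin with underscores,
            # so "Genus species" becomes "Genus_species".
            name = line[1:].strip()
            renamed.append(">" + "_".join(name.split()))
        else:
            renamed.append(line.rstrip("\n"))
    return renamed


# Small in-memory demonstration with made-up records.
example = [
    ">Cyanobacteria bacterium A1\n",
    "NQVVYLGTGR\n",
    ">Cyanobacteria bacterium B2\n",
    "DKVVYLGTGR\n",
]
for out_line in underscore_fasta_headers(example):
    print(out_line)
```

The renamed lines can then be written to a new file and passed to mafft with --namelength set to at least the length of the longest name.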