LANL-Bioinformatics / PhaME

Given a reference, PhaME extracts SNPs from complete genomes, draft genomes and/or reads. Uses SNP multiple sequence alignment to construct a phylogenetic tree. Provides evolutionary analyses (genes under positive selection) using CDS SNPs.
GNU General Public License v3.0

Pal2nal translation of large multi-fasta files produces a codon-translated file where some of the sequences are half the average length #23

Closed VectorFrankenstein closed 9 months ago

VectorFrankenstein commented 1 year ago

Hello,

Not sure if you are maintaining pal2nal.pl. Apologies for bothering you; your repo is the first to show up on Google when searching for pal2nal.

I aligned a large peptide multi-fasta (n = 4991 sequences). All sequences in the peptide alignment have the same length, and pal2nal ran without complaint... except that some of the codon sequences come out at half length. If the average length is X, then some sequences are X/2. This is choking IQ-TREE.
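To pin down which records are affected, a small script along these lines could list every sequence in the codon alignment whose length differs from the most common length. This is only a sketch: the function names `read_fasta` and `length_outliers` are made up for illustration, and it assumes the pal2nal output is plain FASTA.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def length_outliers(records):
    """Return (modal_length, {name: length}) for records that deviate from the modal length."""
    lengths = {name: len(seq) for name, seq in records.items()}
    modal_length = Counter(lengths.values()).most_common(1)[0][0]
    outliers = {n: l for n, l in lengths.items() if l != modal_length}
    return modal_length, outliers


if __name__ == "__main__":
    # Toy alignment: two rows of 9 columns and one truncated row.
    demo = ">a\nATGGCC---\n>b\nATGGCCAAA\n>c\nATG-\n"
    mode, bad = length_outliers(read_fasta(demo))
    print("modal length:", mode)
    print("outliers:", bad)
```

In a correct codon alignment every row should have the same length, so any name reported here is a candidate for closer inspection. The function expects at least one record and will raise on an empty file.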

I have tried both MUSCLE super5 and MAFFT. The error remains the same (i.e., both MUSCLE and MAFFT lead to some sequences being half the average length), although the average length and the half length differ between the MUSCLE and MAFFT codon alignments. I have pulled out and played with the sequences causing the issue and they seem to be in frame.
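For reference, the "in frame" check can be made explicit with a sketch like the one below. It assumes the standard genetic code (NCBI translation table 1) and unaligned CDS input; `translate` and `check_pair` are illustrative names, not part of pal2nal.

```python
# Standard genetic code (NCBI translation table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate an ungapped CDS; codons not in the table (e.g. with N) become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def check_pair(peptide, cds):
    """Return a list of problems for one peptide/CDS pair (empty list if none found)."""
    peptide = peptide.replace("-", "").upper()  # drop alignment gaps
    cds = cds.replace("-", "")
    problems = []
    if len(cds) % 3 != 0:
        problems.append(f"CDS length {len(cds)} is not a multiple of 3")
    protein = translate(cds)
    if protein.endswith("*"):
        protein = protein[:-1]  # tolerate a terminal stop codon
    if "*" in protein:
        problems.append("internal stop codon")
    if protein != peptide:
        problems.append(
            f"translation ({len(protein)} aa) differs from peptide ({len(peptide)} aa)"
        )
    return problems


if __name__ == "__main__":
    print(check_pair("M-AK", "ATGGCCAAA"))  # consistent pair
    print(check_pair("MAK", "ATGGCCAA"))    # truncated CDS
```

Running `check_pair` on each peptide and its CDS before codon alignment would flag length mismatches, internal stops, or translations that disagree with the peptide, which are the usual reasons a back-translation goes wrong.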

Example of peptide sequence not causing an issue: RKVEAFLLFKEMGERGCQPNVHTYTVLIDSFCKERNLDDARKLFDDMFKKGLVPSVVTYNALIDGYCKEGMTEAALEILGMMESKKCNPNARTYNELICGFCKAK

corresponding cds AGGAAAGTGGAAGCTTTTCTACTTTTTAAAGAAATGGGTGAAAGAGGTTGTCAGCCTAATGTTCATACATACACTGTGCTTATTGATTCCTTCTGTAAGGAAAGGAATCTTGATGATGCCAGGAAATTGTTTGATGACATGTTTAAGAAAGGTTTGGTTCCCAGTGTGGTCACTTATAATGCTTTAATTGATGGGTATTGTAAAGAGGGAATGACTGAAGCTGCATTAGAAATTTTAGGTATGATGGAATCAAAGAAATGCAACCCTAATGCTCGGACCTACAATGAATTGATCTGTGGATTTTGTAAAGCTAAA

Example of peptide causing issue:

GLCKGGRLNDAWEIFQYLLAKGYQLNVHTYNAMVHGFCKEGLLDEAISLLYKMEENGCVPNSVTFNVVL

corresponding cds GGTTTGTGCAAAGGTGGTAGATTAAATGATGCGTGGGAGATTTTTCAGTATCTTTTAGCGAAAGGTTATCAACTAAATGTCCATACATATAATGCGATGGTTCATGGTTTTTGCAAAGAAGGTTTGCTTGATGAAGCAATCTCCCTGCTTTATAAAATGGAAGAGAATGGTTGTGTCCCTAATTCTGTAACTTTTAATGTAGTCCTT

Any idea what might be going on?

Happy to post the sequence alignment files and the CDS files if you would like to follow up on this; they are a bit large.

mshakya commented 9 months ago

We are not maintaining pal2nal, and I am not sure if anyone is. There are more up-to-date codon aligners; I would recommend you try those instead.
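Whichever codon aligner is used, its output can be cross-checked against a simple expectation: each row of the codon alignment should be exactly three times as long as the corresponding row of the peptide alignment. The sketch below illustrates that idea by threading codons onto a gapped peptide row. It is not pal2nal's implementation; `thread_codons` is a hypothetical helper, and it assumes the CDS is ungapped, in frame, and starts at the first aligned residue.

```python
def thread_codons(aligned_peptide, cds):
    """Map an ungapped, in-frame CDS onto one gapped row of a peptide alignment.

    Assumption: codon i of the CDS encodes residue i of the peptide; this is
    not verified here.
    """
    residues = len(aligned_peptide.replace("-", ""))
    if len(cds) < 3 * residues:
        raise ValueError(
            f"CDS has {len(cds)} nt but {residues} residues need {3 * residues} nt"
        )
    out = []
    pos = 0
    for aa in aligned_peptide:
        if aa == "-":
            out.append("---")  # a peptide gap becomes a three-column codon gap
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


if __name__ == "__main__":
    row = thread_codons("M-AK", "ATGGCCAAA")
    print(row, len(row))
```

A row that comes out shorter than three times the peptide alignment width, or a CDS that triggers the `ValueError`, points at the input pair rather than at the downstream tree builder.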