Given a reference, PhaME extracts SNPs from complete genomes, draft genomes and/or reads. Uses SNP multiple sequence alignment to construct a phylogenetic tree. Provides evolutionary analyses (genes under positive selection) using CDS SNPs.
GNU General Public License v3.0
Pal2nal translation of large multi-fasta files produces a codon translated file where some of the sequences are half length of the average #23
Hello,
Not sure if you are maintaining pal2nal.pl. Apologies for bothering you; your repo is the first to show up on Google when searching for pal2nal.
I aligned a large peptide multi-FASTA (n = 4991 sequences). All sequences in the peptide alignment have the same length, and pal2nal ran without complaint, except that some of the codon sequences come out at half length: if the average length is X, some sequences are X/2. This is choking IQ-TREE.
I have tried both MUSCLE super5 and MAFFT. The error remains the same: both aligners lead to some codon sequences having half the average length, although the average length and the half length differ between the MUSCLE and MAFFT codon outputs. I have pulled out and played with the sequences causing the issue, and they seem to be in frame.
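For anyone trying to reproduce this, the short sequences can be listed with a few lines of Python. This is only a rough sketch: it assumes the pal2nal output was written as FASTA, and the helper names `read_fasta` and `length_outliers` are made up for illustration, not part of pal2nal or PhaME.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence} (id = first word of header)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def length_outliers(records):
    """Return (most common length, {id: length} for records that differ from it)."""
    counts = Counter(len(seq) for seq in records.values())
    expected = counts.most_common(1)[0][0]
    odd = {name: len(seq) for name, seq in records.items() if len(seq) != expected}
    return expected, odd


if __name__ == "__main__":
    # Toy codon alignment: record "c" is shorter than the others.
    demo = ">a\nATGAAA---\n>b\nATGAAACCC\n>c\nATG-\n"
    expected, odd = length_outliers(read_fasta(demo))
    print("expected length:", expected)
    for name, length in odd.items():
        print(name, length)
```

To run it on a real file, replace the toy string with the contents of the codon alignment, for example `open(path).read()` where `path` is wherever the pal2nal output was saved.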
Example of peptide sequence not causing an issue:
RKVEAFLLFKEMGERGCQPNVHTYTVLIDSFCKERNLDDARKLFDDMFKKGLVPSVVTYNALIDGYCKEGMTEAALEILGMMESKKCNPNARTYNELICGFCKAK
Corresponding CDS:
AGGAAAGTGGAAGCTTTTCTACTTTTTAAAGAAATGGGTGAAAGAGGTTGTCAGCCTAATGTTCATACATACACTGTGCTTATTGATTCCTTCTGTAAGGAAAGGAATCTTGATGATGCCAGGAAATTGTTTGATGACATGTTTAAGAAAGGTTTGGTTCCCAGTGTGGTCACTTATAATGCTTTAATTGATGGGTATTGTAAAGAGGGAATGACTGAAGCTGCATTAGAAATTTTAGGTATGATGGAATCAAAGAAATGCAACCCTAATGCTCGGACCTACAATGAATTGATCTGTGGATTTTGTAAAGCTAAA
Example of peptide sequence causing the issue:
GLCKGGRLNDAWEIFQYLLAKGYQLNVHTYNAMVHGFCKEGLLDEAISLLYKMEENGCVPNSVTFNVVL
Corresponding CDS:
GGTTTGTGCAAAGGTGGTAGATTAAATGATGCGTGGGAGATTTTTCAGTATCTTTTAGCGAAAGGTTATCAACTAAATGTCCATACATATAATGCGATGGTTCATGGTTTTTGCAAAGAAGGTTTGCTTGATGAAGCAATCTCCCTGCTTTATAAAATGGAAGAGAATGGTTGTGTCCCTAATTCTGTAACTTTTAATGTAGTCCTT
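The "in frame" check can be made explicit: the CDS should be exactly three times the peptide length (optionally plus one trailing stop codon) and should translate back to the peptide. A minimal sketch, assuming the standard genetic code and ungapped or gap-stripped input; `translate` and `check_pair` are hypothetical helper names, not pal2nal functions.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINOS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def check_pair(peptide, cds):
    """Compare one peptide with its CDS after stripping alignment gaps."""
    pep = peptide.replace("-", "").upper()
    nuc = cds.replace("-", "").upper()
    expected = 3 * len(pep)
    return {
        "peptide_length": len(pep),
        "cds_length": len(nuc),
        # Accept an exact match or one extra trailing (stop) codon.
        "length_ok": len(nuc) in (expected, expected + 3),
        "translation_ok": translate(nuc[:expected]) == pep,
    }


if __name__ == "__main__":
    print(check_pair("MK", "ATGAAA"))
    print(check_pair("MK", "ATGAA"))
```

Running `check_pair` on each peptide/CDS pair from the input files, including the two pairs quoted above, would show whether the affected sequences differ from the unaffected ones before pal2nal ever sees them.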
Any idea what might be going on?
Happy to post the sequence alignment files and the CDS files if you would like to follow up on this; they are a bit large.