NBISweden / MrBayes

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. For documentation and downloading the program, please see the home page:
http://NBISweden.github.io/MrBayes/
GNU General Public License v3.0

Error while parsing a string. Token "tctttgatctacctggcaac...[followed by at least 99970 more charectors]" is too long. Maximum allowed length of a token is 99990 #292

Closed 51mystic closed 10 months ago

51mystic commented 1 year ago

Dear teacher, when I execute my code, the sequence is too long and the run is interrupted. Following the error guidance, I added the interleave option to the format line, but after saving the file it still does not run. How can I solve this problem? Here is what my code looks like:

begin data;
dimensions ntax=7 nchar=5823716;
format datatype=dna missing=? gap=- interleave;
matrix
LD10A catgatgaaggaaattttggatattaacggggatttttttgg......
...
LD7A cgtattgaatacaacttttt---ttgttaacggggatttttttgg......
;
end;
...
begin mrbayes;
set autoclose=yes nowarn=yes;
lset nst=6 rates=invgamma;
prset statefreqpr=fixed(equal);
outgroup LD10A
mcmc ngen=1000000 printfreq=1000 samplefreq=100 samplefile=/mnt/data/userdata/svip019/00----outcome/mrbayes-o/myout.nex;
sumt burnin=250;
end;

Looking forward to your answer.

51mystic commented 1 year ago

Added code execution error information:

Executing file "matrix-80-align-concatenate-nexus.nexus"
UNIX line termination
Longest line length = 5823723
Parsing file
Expecting NEXUS formatted file
Reading data block
Allocated taxon set
Allocated matrix
Defining new matrix with 7 taxa and 5823716 characters
Data is Dna
Missing data coded as ?
Gaps coded as -
Taxon 1 -> LD10A
Error while parsing a string. Token "tctttgatctacctggcaac...[followed by at least 99970 more charectors]" is too long. Maximum allowed length of a token is 99990
The error occurred when reading char. 662483-662502 on line 6 in the file 'matrix-80-align-concatenate-nexus.nexus'

Returning execution to command line ...

Error in command "Execute"
Will exit with signal 1 (error) because quitonerror is set to yes
If you want control to be returned to the command line on error, use 'mb -i ' (i is for interactive) or use 'set quitonerror=no'

51mystic commented 1 year ago

Excuse me, teacher, how can I get the interleaved sequence?

nylander commented 10 months ago

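Most probably an issue with the input format: each sequence is written on a single line of about 5.8 million characters, while MrBayes accepts tokens of at most 99990 characters. Adding interleave to the format line does not help unless the matrix itself is split into shorter blocks. There are many ways to convert between sequence formats, using web sites or programming languages. Below are two examples using the Python scripting language (tested using Python 3.10.12 with Biopython 1.79):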

Convert DNA alignment in fasta to interleaved nexus

from Bio import SeqIO

# "DNA" is passed as the molecule type of the sequences.
SeqIO.convert("infile.fas", "fasta", "outfile.nex", "nexus", "DNA")

Convert DNA alignment in non-interleaved nexus to interleaved nexus

from Bio import SeqIO

# Reads and rewrites NEXUS; "DNA" is passed as the molecule type.
SeqIO.convert("infile.nex", "nexus", "outfile2.nex", "nexus", "DNA")
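
To show what "interleaved" means for the matrix itself, here is a minimal sketch in plain Python, without Biopython. It is not how MrBayes or Biopython do it internally. The function names `interleave_matrix` and `to_interleaved_nexus`, the `block_size` parameter, and the toy sequences are all hypothetical, chosen for illustration. The idea is to cut every aligned sequence into short blocks and write the taxa block by block, so that no single token comes near the 99990-character limit.

```python
def interleave_matrix(sequences, block_size=80):
    """Split aligned sequences into interleaved blocks (illustrative helper).

    sequences: dict mapping taxon name -> aligned sequence string.
    Returns a list of matrix lines, with an empty string after each block.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    nchar = lengths.pop()
    # Pad names so the sequence columns line up in every block.
    width = max(len(name) for name in sequences) + 2
    lines = []
    for start in range(0, nchar, block_size):
        for name, seq in sequences.items():
            lines.append(f"{name:<{width}}{seq[start:start + block_size]}")
        lines.append("")  # blank line separates consecutive blocks
    return lines


def to_interleaved_nexus(sequences, block_size=80):
    """Build a NEXUS data block as text, with the matrix interleaved."""
    body = interleave_matrix(sequences, block_size)
    nchar = len(next(iter(sequences.values())))
    header = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(sequences)} nchar={nchar};",
        "  format datatype=dna missing=? gap=- interleave=yes;",
        "  matrix",
    ]
    return "\n".join(header + body + [";", "end;"]) + "\n"


if __name__ == "__main__":
    # Toy data only; real alignments would be read from a file.
    toy = {"LD10A": "acgt" * 5, "LD7A": "tg-a" * 5}
    print(to_interleaved_nexus(toy, block_size=8))
```

With a block size of 80, each line holds a taxon name plus 80 characters, so a 5.8 million character alignment is written as many short lines instead of one very long line per taxon. For real data, the Biopython calls above are probably the more robust choice, since they also handle parsing of the input file.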

nylander commented 10 months ago

Closing with comment above