Open ChristyPeterson opened 4 years ago
Hi Christy,
Is it possible one of the sequences isn't in valid fasta format? It looks similar to errors of that type. If not, could you send me the config file and link the public genomes that cause the error?
Thanks, Chad
The file listed as being problematic looks like a valid fasta to me. Also, this file went through the first run successfully.
>NZ_CP016054.1 Treponema pallidum subsp. pallidum strain PT_SIF1127 genome
TAGATGGACGCAGTAGGGTATGAAGTATTCTGGAACGAGACACTCAGCCAGATACGGAGTGAATCGACCGAAGCAGAATT
TAACATGTGGTTTGCTCATTTGTTCTTTATCGCATCTTTTGAAAACGCTATCGAAATAGCAGTACCTTCAGACTTTTTCC
GAATACAGTTTAGCCAAAAATATCAAGAAAAGCTTGAGCGCAAGTTCCTCGAACTTTCTGGACACCCCATTAAACTTTTG
TTTGCCGTTAAAAAAGGCACCCCTCATGGAAATACTGCTCCCCCCAAACACGTGCATACCTACCTGGAGAAAAACTCTCC
TGCAGAGGTTCCTTCCAAAAAGAGCTTTCACCCCGACCTGAACAGAGACTATACCTTCGAGAACTTTGTATCCGGAGAAG
AAACCAAATTCAGCCATAGCGCTGCTATCTCCGTATCAAAAAACCCAGGCACTTCCTACAATCCGTTACTTATCTACGGT
GGAGTGGGACTAGGAAAAACCCACCTTATGCAGGCTATTGGACACGAGATCTACAAGACAACAGACCTGAACGTCATATA
CGTCACTGCGGAGAATTTTGGAAATGAATTCATTTCCACATTACTCAATAAAAAGACCCAGGATTTTAAAAAAAAATACC
GCTACACCGCGGATGTACTTCTTATAGATGACATTCATTTTTTTGAAAACAAAGACGGATTACAAGAAGAGCTTTTCTAT
ACGTTCAACGAACTTTTCGAGAAAAAAAAACAAATTATCTTTACCTGCGACAGGCCTGTACAAGAATTGAAAAATCTCTC
TTCTCGCTTACGCTCGAGGTGCTCCCGAGGGCTTAGCACTGATCTGAATATGCCATGTTTTGAAACGCGCTGTGCTATCT
I did check using grep for any weird characters and nothing pops up outside of the header.
I've attached two lists:
acc-list-full.txt is the full list of accessions used for this run.
acc-list-add.txt has the accessions that were added to run1 (completed successfully) to make up this run (full list).
I looked through all the fasta in the 'add' txt file, and none of those have any weird characters in the sequence.
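For anyone who wants to repeat this kind of check without grep, here is a minimal Python sketch. It is not part of Panseq; the function names and the allowed alphabet (IUPAC nucleotide codes plus gap characters) are assumptions chosen for illustration. It reports every sequence line containing a character outside that alphabet, along with the most recent header.

```python
# Hypothetical helper, not part of Panseq: flag sequence lines that contain
# characters outside an assumed nucleotide alphabet.

# Assumption: IUPAC nucleotide codes plus '-' and '.' as gap characters.
ALLOWED = set("ACGTURYSWKMBDHVN-.")


def find_invalid_chars(lines, allowed=ALLOWED):
    """Yield (line_number, header, bad_chars) for each offending sequence line.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    header = None
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Header lines may contain any characters; remember the latest one.
            header = line[1:]
            continue
        bad = sorted(set(line.upper()) - allowed)
        if bad:
            yield line_number, header, bad


def check_files(paths):
    """Print one report line per offending sequence line in each file."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_number, header, bad in find_invalid_chars(handle):
                print(f"{path}:{line_number} ({header}): {bad!r}")
```

Calling `check_files` with the paths of the downloaded FASTA files would print nothing if every sequence line stays within the assumed alphabet. A clean result here only rules out bad characters in the input files; it says nothing about intermediate files the program writes.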
For the config file, do you mean the settings file?
In case you meant the settings file used to run panseq, I've attached it below, though I've altered the paths to where things are located.
queryDirectory PATH/ncbi_assemblies/ncbi-genomes-2019-12-06/
baseDirectory PATH/panseq/run2-all-strains
numberOfCores 20
mummerDirectory /PATH/bin/
blastDirectory /PATH/bin/
minimumNovelRegionSize 500
novelRegionFinderMode no_duplicates
muscleExecutable /PATH/bin/muscle
fragmentationSize 500
percentIdentityCutoff 85
coreGenomeThreshold 2
runMode pan
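Since the paths above were altered before posting, a quick sanity check of the real settings file can rule out a mistyped location. The following is a rough sketch only, not something Panseq provides: it assumes each line is a key followed by whitespace and a value (as shown above), and the helper names and the choice of which keys to check are hypothetical.

```python
import os

# Assumption: these keys from the settings shown above should point at
# existing locations. baseDirectory is left out because it is the output
# location for the run.
PATH_KEYS = ("queryDirectory", "mummerDirectory", "blastDirectory", "muscleExecutable")


def parse_settings(lines):
    """Parse 'key value' lines (split on the first run of whitespace) into a dict."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def missing_paths(settings, keys=PATH_KEYS):
    """Return the keys whose configured path does not exist on this machine."""
    return [key for key in keys if key in settings and not os.path.exists(settings[key])]
```

Passing an open settings file to `parse_settings` and the result to `missing_paths` returns a list of keys to double-check; an empty list means every checked path exists.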
Perfect, I will take a look.
Hi Chad,
I'm trying to run panseq on some publicly available genomes, and was successful when running the genomes from one subspecies. As soon as I included two other subspecies, I got an "unexpected char in string" error. Weirdly, this error is coming up in strains that were successful in the first run. Those characters do not exist in the input, so I'm assuming it's in a temp file the program is writing and then referring back to?
Below is an example from the Master log file (the top and bottom).
If I remove the isolate from the analysis I get even more of these errors, for several other isolates. Any insight would be awesome.
Thanks! -Christy