faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_probe_slice_sequence_from_genomes error #168

Closed karlyhiggins closed 3 years ago

karlyhiggins commented 5 years ago

I am getting an error when running:

phyluce_probe_slice_sequence_from_genomes \
    --conf scyphozoa-genome.conf \
    --lastz Aurelia-genome-lastz \
    --probes 180 \
    --name-pattern "Aurelia+2.temp-DUPE-SCREENED.probesv{}.lastz.clean" \
    --output Aurelia-genome-fasta

It reads the genomes successfully until it gets to the "Sand" genome, where it raises this error:

2019-09-17 11:28:54,098 - Phyluce - INFO - Reading Sand genome
Traceback (most recent call last):
  File "/Users/dlab/anaconda2/envs/phyluce/bin/phyluce_probe_slice_sequence_from_genomes", line 348, in <module>
    main()
  File "/Users/dlab/anaconda2/envs/phyluce/bin/phyluce_probe_slice_sequence_from_genomes", line 328, in main
    ss, se, sequence = slice_and_return_fasta(tb, contig_name, min, max, args.flank, args.probes)
  File "/Users/dlab/anaconda2/envs/phyluce/bin/phyluce_probe_slice_sequence_from_genomes", line 160, in slice_and_return_fasta
    return ss, se, tb[name][ss:se]
  File "/Users/dlab/anaconda2/envs/phyluce/lib/python2.7/site-packages/bx/seq/twobit.py", line 82, in __getitem__
    seq = self.index[name]
KeyError: 'scaffold394520'

I tried removing this genome to see what would happen, but the same error is raised on different scaffolds for 4 of the genomes in my analysis.

Here is a section of Sand.fasta that includes the scaffolds before and after 'scaffold394520':

scaffold394518|size103
TTATCAACAATGGGAAGTCATCAAAAGTGGGAAGTCATCAAAAGTGGGAATTTATCAAAA
GTGAGAAGTCAGCAATAGTGGAAATTCACCAAAAATGGGAGGA
scaffold394519|size103
CAAGTGATGGAGAGACAGTTTCCGTCGCTGGGATGAAATGGTTCCCGGAATCGGACCATA
TCATTATTGACAACCCGGAAATCAACTTCGCCGAGAAATACCG
scaffold394520|size103
ATCAAATAAGACAGAGATTGTGAAATTCCTTGTATCTGAATGGAAAAAGCCAGAGTTCAT
TGCCAAACTGGAAGGGAAGACAATGTATGTGACAGAAGGAAGC
scaffold394521|size103
CAAGCAAATGCATGTGTCTTTTACTTAGTTTCAGCTGTACTAATGTGAATTCATGTGCAA
TTATTGTGAATGAGTGTTAATATCACTGAAAACAAGCAAATGC
scaffold394522|size103
ACATTCAGAAAAATGAAATAAAATAAAAGATTATCATTTCTCAATAAGTTTTCTTTGAAA
TTACCGCTGCTTTTTGTTAGATTCTATGACATCTTTGAACAAT
scaffold394523|size103

brantfaircloth commented 5 years ago

It sounds as if the other fastas (formatted as twobit files?) you are using work, while 4 of them do not? That suggests something is wrong in either the formatting or the composition of the 4 fastas that have problems. I'm not sure what the exact error is, but you might look for things like duplicate contig names or formatting errors within the fasta before you convert it to twobit.
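
A minimal sketch of the kind of check suggested here, using only the Python standard library. It assumes the FASTA headers start with ">" in the file itself. The function names (`fasta_names`, `check_names`) are hypothetical and are not part of phyluce or bx-python. Besides duplicates, it flags names that contain "|" or whitespace and reports whether a given name is present exactly. That last check is motivated by the excerpt, where the header reads `scaffold394520|size103` while the KeyError is for `scaffold394520`. Whether that mismatch is the actual cause is an untested guess, not a confirmed diagnosis.

```python
import io
import re
from collections import Counter


def fasta_names(handle):
    """Yield the full header text (without the leading '>') of each FASTA record."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].rstrip("\r\n")


def check_names(names, wanted=None):
    """Summarize possible naming problems in a list of FASTA record names.

    `wanted` is an optional name to look up, e.g. the key from a KeyError.
    """
    names = list(names)
    counts = Counter(names)
    report = {
        # Names that occur more than once in the file.
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
        # Empty names, or names containing '|' or whitespace.
        "suspicious": sorted({n for n in names if n == "" or re.search(r"[|\s]", n)}),
    }
    if wanted is not None:
        # Is the name present exactly as written?
        report["exact_match"] = wanted in counts
        # Names whose first token (split on '|' or whitespace) equals `wanted`.
        report["near_matches"] = sorted(
            {n for n in names if n != wanted and re.split(r"[|\s]", n, maxsplit=1)[0] == wanted}
        )
    return report


# Small in-memory example; sequences are shortened for illustration.
sample = io.StringIO(
    ">scaffold394519|size103\nCAAGTGATGG\n"
    ">scaffold394520|size103\nATCAAATAAG\n"
)
print(check_names(fasta_names(sample), wanted="scaffold394520"))
```

To run it on a real file, open it and pass the handle in the same way, for example `check_names(fasta_names(open("Sand.fasta")), wanted="scaffold394520")`, and compare the reports for a genome that works against one that fails.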