bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Error in pair step #35

Closed. erinpnewcomer closed this issue 2 years ago

erinpnewcomer commented 2 years ago

Hi! I'm running into an error at the pair step while using MGEfinder v1.0.6. I'm using the workflow denovo command, and only the 01.mgefinder directory gets created in the working directory. In the ~/01.mgefinder///log/..pair.log.err file, I get this error:

Traceback (most recent call last):
  File "/home/erin.newcomer/.conda/envs/mgefinder/bin/mgefinder", line 8, in <module>
    sys.exit(cli())
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 194, in pair
    min_alignment_inner_length, max_junction_spanning_prop, large_insertion_cutoff, output_file)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 40, in _pair
    flank_pairs = flank_pairer.run_pair_flanks()
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 113, in run_pair_flanks
    final_pairs = self.get_direct_repeats(filtered_pairs)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 278, in get_direct_repeats
    positions = self.get_reference_direct_repeats(flank_pairs, genome_dict)
  File "/home/erin.newcomer/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 292, in get_reference_direct_repeats
    direct_repeat = genome_dict[contig][(start+1):end]
KeyError: '1'

Do you have any advice?

durrantmm commented 2 years ago

OK, this looks like a problem with the way the contigs are named in the reference genome. Sorry, it's an annoying error. You can either try renaming the contigs, or share the genome with me and I can try to fix the problem.

erinpnewcomer commented 2 years ago

No worries! I saw an earlier issue reporting a problem when the contigs were named with only an integer (mine were). I took the lazy route and appended the length and depth info to each name, so the names now look like '>18length=125depth=1.68x', but the error persists. Would naming them something simple like 'contig1' likely fix this?

durrantmm commented 2 years ago

Yes, it probably would. It's dumb, I know, sorry!
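
For anyone landing here with the same KeyError, a minimal sketch of the renaming suggested above might look like the following. It is not part of MGEfinder: the function name `rename_contigs` and the `contig` prefix are hypothetical choices for illustration, and the example uses in-memory text rather than real genome files.

```python
import io


def rename_contigs(fasta_in, fasta_out, prefix="contig"):
    """Copy a FASTA stream, replacing each header with prefix + running index.

    Hypothetical helper, not an MGEfinder function. Returns a dict mapping
    each new name to the original header text so the change can be traced.
    """
    mapping = {}
    count = 0
    for line in fasta_in:
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}{count}"
            # Remember the original header (without '>' and the newline).
            mapping[new_name] = line[1:].strip()
            fasta_out.write(f">{new_name}\n")
        else:
            # Sequence lines are copied through unchanged.
            fasta_out.write(line)
    return mapping


# Small in-memory demonstration; real use would pass open file handles.
example = io.StringIO(">1\nACGT\n>18length=125depth=1.68x\nGGCC\n")
renamed = io.StringIO()
name_map = rename_contigs(example, renamed)
```

Alignment files record the contig names that were in the reference at alignment time, so the reads would most likely need to be re-aligned against the renamed FASTA before rerunning the workflow.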

erinpnewcomer commented 2 years ago

I re-aligned everything and ran it again, and it worked this time! Thank you so much!
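
Since re-aligning resolved it, one plausible reading (an assumption, not something confirmed in this thread) is that the alignments still carried the old contig names, so the lookup in the reference failed. A rough sketch of a pre-flight check along those lines is below. The helper names are hypothetical, and the header lines are assumed to be SAM header text such as that printed by `samtools view -H`.

```python
def fasta_names(fasta_lines):
    """Collect contig names: the first whitespace-delimited token of each header."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def sam_header_names(header_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t"):
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def missing_from_reference(fasta_lines, header_lines):
    """Return alignment contig names that do not appear in the FASTA."""
    return sorted(sam_header_names(header_lines) - fasta_names(fasta_lines))


# Illustrative data only: the FASTA was renamed, but the alignment
# header still refers to a contig called '1'.
fasta = [">contig1\n", "ACGT\n", ">contig2\n", "GGCC\n"]
header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:1\tLN:4\n",
    "@SQ\tSN:contig2\tLN:4\n",
]
unmatched = missing_from_reference(fasta, header)
```

An empty result would suggest the reference and the alignments agree on contig names. Any names it returns are candidates for the kind of KeyError shown in the traceback.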