bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Question about "Error in job make_database" #9

Closed JinxinMonash closed 4 years ago

JinxinMonash commented 4 years ago

Hi all,

I would like to use MGEfinder to detect insertion sequences using the whole workflow. I prepared the directories (00.genome, 00.bam, 00.assembly) and files (sample.refer.bam, etc.) following the tutorial. However, when I run mgefinder workflow /workdir, I get the error below. I would highly appreciate any help. Thank you very much in advance.

Screen Shot 2020-06-23 at 11 46 39 am

Best, Jason

durrantmm commented 4 years ago

Thanks for opening this issue! A couple questions:

  1. Did things run successfully on the test dataset?
  2. Can you list the final directory structure of the working directory after the error?
  3. Can you count the number of lines in the output files and list them for me with the wc -l command?

Based on what I am seeing here, it looks like no termini were detected, which means MGEfinder probably can't help you. This may be because the samples are too similar to the reference.

Sorry about that, hopefully we'll figure out what's wrong, but it looks like there just may not be any insertions. I should account for this in the workflow.

JinxinMonash commented 4 years ago

Thank you very much for the quick response!

  1. Yes, everything goes well with the test dataset.
  2. Please find the final directory structure below:
     (mgefinder) faddi@faddi-Precision-7920-Tower:~/Jinxin/dataSet2/workdir$ ls
     00.assembly 00.bam 00.genome 01.mgefinder 02.database config.yml Snakefile
  3. I am not sure if I am correct, but there is no 03.results from the run. Does that mean there are no output files?

I did run mgefinder find xxxx.bam successfully for the same sample. I have also attached the output file.

Screen Shot 2020-06-24 at 1 22 29 pm

From this output file, there are several insertions. Please correct me if I am wrong.

Best, Jason

durrantmm commented 4 years ago

Great, thanks for doing that.

Looks like it found some potential insertion termini, but it wasn't able to recover the full sequences from the assembly files or the reference itself, meaning they may be false positives.

Could you run the following two commands and tell me the output?

ls ~/Jinxin/dataSet2/workdir/01.mgefinder/*/*

wc -l ~/Jinxin/dataSet2/workdir/01.mgefinder/*/*

Thanks.

JinxinMonash commented 4 years ago

Thank you very much. Sure, please find them attached.

Screen Shot 2020-06-24 at 1 43 47 pm
Screen Shot 2020-06-24 at 1 44 06 pm

durrantmm commented 4 years ago

Great, sorry one more thing, try running:

wc -l ~/Jinxin/dataSet2/workdir/01.mgefinder/*/*/*

Thanks.
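For anyone repeating this check, the wc -l requests above can be wrapped in a small helper that also flags files with no data rows. This is only a sketch: the function name report_line_counts is made up for illustration, and it assumes that a file with zero or one line (for example, a TSV containing only a header) carries no results, which may not hold for every file type MGEfinder writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MGEfinder): print the line count of every
# regular file under a directory and flag files that look empty.
# ASSUMPTION: a file with 0 or 1 lines (e.g. header-only TSV) has no data rows.
report_line_counts() {
    local dir="$1" f n
    find "$dir" -type f | sort | while IFS= read -r f; do
        # Read via stdin so wc prints only the number; strip padding spaces.
        n=$(wc -l < "$f" | tr -d ' ')
        if [ "$n" -le 1 ]; then
            printf '%s\t%s\tNO_DATA\n' "$n" "$f"
        else
            printf '%s\t%s\n' "$n" "$f"
        fi
    done
}

# Example call, using the path from this thread:
# report_line_counts ~/Jinxin/dataSet2/workdir/01.mgefinder
```

Files tagged NO_DATA are the ones worth inspecting first when a later workflow step such as make_database fails.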

JinxinMonash commented 4 years ago

No worries.

Screen Shot 2020-06-24 at 1 49 55 pm

Thank you very much.

durrantmm commented 4 years ago

Yeah, as I suspected, MGEfinder was not able to find any insertions using these two files. It identifies candidate insertions in the *find.*.AB5075UW.tsv files, but then it is not able to determine the full identity of the inserted element, if it exists at all. The insertion flanks identified in the find files could just as easily be inversions, for example, which MGEfinder is not designed to identify in the subsequent steps.

If you have more isolates, I would recommend including those in the working directory as well, because this should increase sensitivity. But from what I can see, it can't detect any large insertions with these isolates.

Sorry about that! Good luck.
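When adding more isolates to the working directory, it may help to confirm that every BAM has its matching inputs before rerunning the workflow. The sketch below is an illustration only: check_workdir is a hypothetical helper, and the naming pattern it checks (BAMs named <sample>.<genome>.bam, assemblies named <sample>.fna, references named <genome>.fna) is an assumption based on the sample.refer.bam naming mentioned earlier in this thread, so please verify it against the tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MGEfinder): verify that each BAM in 00.bam
# has a matching assembly and reference genome.
# ASSUMPTION: files follow <sample>.<genome>.bam, <sample>.fna, <genome>.fna.
check_workdir() {
    local wd="$1" status=0 bam base sample genome
    for bam in "$wd"/00.bam/*.bam; do
        # If the glob matched nothing, the literal pattern is returned.
        [ -e "$bam" ] || { echo "no BAM files in $wd/00.bam"; return 1; }
        base=$(basename "$bam" .bam)
        sample=${base%.*}    # text before the last dot
        genome=${base##*.}   # text after the last dot
        if [ ! -f "$wd/00.assembly/$sample.fna" ]; then
            echo "missing assembly for $sample"
            status=1
        fi
        if [ ! -f "$wd/00.genome/$genome.fna" ]; then
            echo "missing genome $genome"
            status=1
        fi
    done
    return "$status"
}

# Example call, using the path from this thread:
# check_workdir ~/Jinxin/dataSet2/workdir && mgefinder workflow ~/Jinxin/dataSet2/workdir
```

The function returns a non-zero status if anything is missing, so it can gate the workflow command as in the commented example.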

JinxinMonash commented 4 years ago

Thank you very much, that helps a lot. Yeah, it might be due to the low mutation frequency (~0.1-0.3). I will include more isolates and run another test.

Best, Jinxin