Bowmore12 closed this 4 years ago
Thanks for this, it looks like a very similar issue to #9. My guess is that MGEfinder couldn't identify any potential MGE insertions. To double check this, could you run the command:
wc -l yourworkdir/01.mgefinder/*inferseq*.tsv
Thanks!
This is entirely possible given that you only have 1 isolate. Sensitivity increases as more isolates are included.
As another check, I would like to know what you see when running:
wc -l yourworkdir/01.mgefinder/*
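If you want to see at a glance which inferseq files are effectively empty, the first check could be scripted roughly like this. It is only a sketch: `check_inferseq` is a hypothetical helper, not part of MGEfinder, and it assumes that a file with at most one line contains just a header and therefore no candidate insertions.

```shell
# Hypothetical helper (not part of MGEfinder): list line counts for the
# inferseq outputs and flag files that look empty.
# Assumption: a file with <= 1 line holds only a header row.
check_inferseq() {
    local workdir="$1"
    local f n
    for f in "$workdir"/01.mgefinder/*inferseq*.tsv; do
        # If the glob matched nothing, $f is the literal pattern.
        if [ ! -e "$f" ]; then
            echo "no inferseq files found in $workdir/01.mgefinder"
            return 1
        fi
        # Read from stdin so wc prints only the number.
        n=$(wc -l < "$f" | tr -d ' ')
        if [ "$n" -le 1 ]; then
            echo "EMPTY $n $f"
        else
            echo "OK $n $f"
        fi
    done
}
```

Calling `check_inferseq yourworkdir` prints one line per file. If every file is reported as EMPTY, that would be consistent with no candidate insertions having been found.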
Hi, durrantmm. Thank you for your prompt support!
Yes, your guess was correct. I successfully completed the workflow when I increased the number of isolates from 1 to 9.
I am very happy to use MGEfinder for my research, and I have some additional questions. (Please feel free to move these topics to new threads if needed.)
Regarding the sensitivity (and specificity) you mentioned, could you tell me how many isolates should be included in the workflow analysis?
How can we know whether the genotyping results are correct? For example, if I use the reference strain's fastq for the analysis, I can check whether MGEs are successfully detected by looking at the annotation. But I do not know whether a result obtained with the reference strain data carries over to clinical isolates, because the pipeline bases its analysis on the reference genome.
For the spades.py analysis indicated, do you recommend using the --isolate and/or --careful options? In my case, SPAdes always suggests using the --isolate option.
To speed up the workflow, I am thinking of using a job scheduler (UGE). We would be very happy if template commands could be supplied. I have 1000 clinical isolates. Batch mode may be another alternative.
Many thanks.
Glad to hear that you find it useful.
Hard to say exactly, and it depends heavily on your collection. In simulations, we found that having 10 isolates dramatically increased sensitivity, so I would say 9 isolates is a good amount.
Interesting question. You should think of MGEfinder like a SNP caller for large insertions. If you aligned a reference strain fastq to the reference genome, you would (hopefully) not detect any SNPs, because they are the same isolate. The same is true for MGEfinder: you wouldn't detect any MGE insertions. If you want to detect insertions in the reference strain, you will need to use the reference strain as a query, with an alignment to some other genome in the 00.bam directory.
There are several ways to increase confidence that the genotypes you are seeing are real. You can limit the analysis to only the MGEs that you trust to be real. For example, MGEs that contain a transposase, or MGEs that are annotated as phage. Just using those MGEs could help increase the quality of the genotyping.
Yes, using the --isolate flag in SPAdes would be appropriate here. Their documentation says: "This flag is highly recommended for high-coverage isolate and multi-cell data; improves the assembly quality and running time. Not compatible with --only-error-correction or --careful options." So if you have a well-covered isolate, that flag would be appropriate.
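As a rough illustration, a paired-end SPAdes call with that flag might look like the sketch below. The helper `spades_isolate_cmd` and the file names are hypothetical placeholders; it only prints the command, so you can inspect it before running anything.

```shell
# Hypothetical helper: print (do not run) a SPAdes command for one
# paired-end isolate. Read files and output directory are placeholders.
spades_isolate_cmd() {
    local r1="$1" r2="$2" outdir="$3"
    # --isolate is not compatible with --careful or --only-error-correction,
    # so neither of those options is added here.
    printf 'spades.py --isolate -1 %s -2 %s -o %s\n' "$r1" "$r2" "$outdir"
}

spades_isolate_cmd sample_R1.fastq.gz sample_R2.fastq.gz sample_spades
```

Printing the command first keeps the sketch safe to try. Once it looks right for your data, the printed line can be run directly.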
Using it with a job scheduler definitely makes sense; I have done this myself. It would be quite a bit of work for me to get that up and running for you, though. If you want my help doing that, maybe we could discuss a more formal collaboration.
I would recommend reading my paper if you haven't already, and/or watching my presentation. Then feel free to contact the corresponding author on the manuscript if you want to pursue a formal collaboration.
Hi!
I am getting the following errors when I run the mgefinder workflow on my data sets. How can I resolve this issue?
I confirmed that both generating the test dataset and analyzing it with the mgefinder workflow work fine.
Thank you in advance.
Yosm