bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Can MGEfinder detect MGEs in the presence of strain diversity? #10

Closed ramiroricardo closed 4 years ago

ramiroricardo commented 4 years ago

Hi all,

I was wondering whether MGEfinder can be used on samples that are not fully clonal. For example, have you tested whether MGEfinder can detect MGEs at different levels of sample "purity"?

Specifically, my use case would be data from experimental evolution carried out in a natural environment, where populations of a given strain are sequenced after a given period of evolution. To be more specific, the design is similar to this, where mice (that contain a microbiota) are colonized with a bacterial clone, which is allowed to evolve within different mice for a specific period. After this period, faecal samples are plated on selective media for the clone of interest, and all colonies growing on a plate are scraped and sequenced together, which is where it differs from the standard MGEfinder use case.

Multiple experiments have shown that bacteria readily adapt, and we can detect SNPs, ISs, etc. However, as this environment contains multiple species, HGT is also a possibility. So the data contain substantial microvariation, but the genomes constituting these populations are not as different as if we were to sample different clones from different people.

One thing I worry about is whether the SPAdes assembly step will basically produce a consensus sequence that ignores the microvariation present in the population. Therefore, I was wondering if it is possible to use metaSPAdes for the assembly step instead (assuming this would allow us to keep more of that microvariation)?

PS. sorry for the rambling post

durrantmm commented 4 years ago

Thanks for the question!

MGEfinder was designed to work with fully clonal samples, and its performance was evaluated under that assumption.

I would not use MGEfinder on a highly diverse metagenomic sample. But what you have is different, and I think that MGEfinder could be helpful. Your concern that the assembly will remove low-frequency variation is certainly valid.

I think you should give it a try and see what you find. I would carefully curate any putative MGEs that MGEfinder identifies. You may then want to try adjusting some of the parameters to make it more sensitive to low-frequency variants (although this could increase false positives).
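To make the sensitivity trade-off concrete, here is a rough back-of-the-envelope sketch. It is not MGEfinder's code and does not use any MGEfinder parameter: `prob_detect` and `min_support` are hypothetical names, and the model simply assumes that each read covering an insertion site independently carries the insertion with probability equal to the variant's frequency in the pooled population. It ignores mapping, clipping, and assembly effects, and it does not model false positives.

```python
from math import comb


def prob_detect(coverage: int, freq: float, min_support: int) -> float:
    """Probability that at least `min_support` of `coverage` reads carry the insertion.

    Assumes a simple binomial model: each read covering the site supports
    the insertion independently with probability `freq`.
    `min_support` is a hypothetical threshold, not an MGEfinder option.
    """
    # Sum the binomial probabilities of seeing fewer than min_support reads.
    p_below = sum(
        comb(coverage, k) * freq**k * (1 - freq) ** (coverage - k)
        for k in range(min_support)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Compare a clonal sample (freq = 1.0) with progressively rarer variants.
    for freq in (1.0, 0.5, 0.1, 0.02):
        p = prob_detect(coverage=100, freq=freq, min_support=10)
        print(f"freq={freq:.2f}  P(detect)={p:.3f}")
```

Under these assumptions, a variant present in every cell is always seen, while a variant carried by only a few percent of the pooled colonies will rarely reach a fixed read-support threshold at the same coverage. Lowering the threshold recovers some of those variants, at the cost of admitting more spurious calls that then need manual curation.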

I think this is an interesting question and experimental design, and I would be happy to help you further if you run into other issues along the way.