bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Sensitivity to detect IS elements #14

Closed Bowmore12 closed 4 years ago

Bowmore12 commented 4 years ago

Dear durrantmm

I am using MGEfinder to detect IS6110 in Mycobacterium tuberculosis, an element that has long been used for DNA fingerprinting of M.tb isolates. I have finished running MGEfinder on hundreds of M.tb isolates, and I am now having some difficulty interpreting the results.

  1. Although nearly all M.tb strains should harbor IS6110, MGEfinder detects no IS6110 insertions in 5% of the strains I tested. All of these strains belong to lineage 1 or 4.

  2. I noticed that MGEfinder often detects fewer IS6110 insertions than another reliable IS-finding tool does, which suggests that MGEfinder lacks sensitivity under my conditions.

  3. Related to 1 and 2, let me confirm one point. M.tb has hot-spot regions where IS6110 is frequently inserted at identical positions. If an IS6110 insertion site is shared between the reference genome and a query strain, can MGEfinder detect that shared insertion?

I know you analyzed M.tb data in your published paper, and I would be grateful for any suggestions on overcoming these difficulties. For example, are there recommended parameters for improving sensitivity? IS6110 is about 1300 bp long, and I want to focus on IS6110 insertions for now.

I always appreciate your kind support. Many thanks.

durrantmm commented 4 years ago

Hi,

  1. MGEfinder won't detect IS insertions that already exist in the reference, and it won't detect insertions in repetitive regions that can't be reliably mapped. It's also possible that these 5% of isolates don't have any additional insertions that don't exist in the reference.

  2. What is the other tool you used? How much lower is the sensitivity? MGEfinder is a de novo insertion detector by default, whereas other tools require you to provide the sequence of interest beforehand. If you already know what you are looking for, we don't even need assemblies, for example. I would be happy to work with you on this, but I'd have to see more of your data.

  3. Detecting insertions that are shared between the reference and the query is not something that MGEfinder does by default.
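
To make points 1 and 3 concrete, here is a minimal Python sketch of how one might separate the insertion sites reported by another tool into those already present in the reference (which MGEfinder would not report) and those that are new relative to the reference. This is not MGEfinder code; the function name `split_by_reference`, the `tolerance` parameter, and all coordinates are hypothetical and made up for illustration.

```python
def split_by_reference(called_sites, reference_sites, tolerance=10):
    """Partition insertion sites into reference-shared and novel ones.

    called_sites    -- positions reported by some IS-finding tool (hypothetical)
    reference_sites -- positions where the element already sits in the reference
    tolerance       -- assumed slack (bp) when matching two positions
    """
    shared, novel = [], []
    for site in sorted(called_sites):
        # A site counts as shared if it lies within `tolerance` bp
        # of any copy already present in the reference.
        if any(abs(site - ref) <= tolerance for ref in reference_sites):
            shared.append(site)
        else:
            novel.append(site)
    return shared, novel


# Made-up coordinates, not real IS6110 positions.
reference_sites = [1000, 5000, 9000]
other_tool_sites = [1003, 5000, 12000]

shared, novel = split_by_reference(other_tool_sites, reference_sites)
print("shared with reference:", shared)
print("novel relative to reference:", novel)
```

If most of the difference between the two tools falls in the shared group, the gap is likely explained by reference-shared insertions rather than by a true loss of sensitivity.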

We can handle this via email; please send your response directly to mdurrant@stanford.edu.