bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Can MGEfinder be used for metagenomics data of infant gut? #32

Closed xuanji2017 closed 2 years ago

xuanji2017 commented 2 years ago

Hi, as the title says, can MGEfinder be used for metagenomic data from the infant gut microbiome? What problems would you expect if I use it? Thanks a lot!

durrantmm commented 2 years ago

It was designed for isolate genomes, but I believe it could certainly be adapted for metagenomes! I have tried doing this a bit myself. You'd want to create a good set of reference genomes that represent abundant members of your metagenomic communities, and then it should work pretty well. You may have to alter some settings that assume haploid genomes. I can help you if you get stuck.

xuanji2017 commented 2 years ago

Thank you so much for the quick reply! I have chosen some reference bacterial genomes (Faecalibacterium) to test whether it works. Around 5000 insertion sequences were detected across the whole cohort. You mentioned "You may have to alter some settings that assume haploid genomes." Could you elaborate on that? Thanks a lot.

durrantmm commented 2 years ago

That's great to hear! You should definitely try annotating those sequences further before drawing your final conclusions. MGEfinder should work great if a given insertion occurs at high frequency, but if it occurs at lower frequency you may want to change some settings to account for that. You might try lowering --min_softclip_ratio from its default of 0.15 to something like 0 or 0.01, lowering --min_count_consensus to 1, and increasing --max_junction_spanning_prop to something like 0.95.

I am not sure how these changes would affect your results, and they may increase false positives. But a higher false positive rate isn't an issue if you do a good job of annotating the elements themselves, and only keeping the ones that look real to you.
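To keep track of the relaxed settings mentioned above, here is a minimal Python sketch that stores them in one place and renders them as command-line arguments. The flag names and values come from this thread; the helper `to_cli_args` and the dictionary name are hypothetical, and the thread does not say which MGEfinder subcommand accepts each flag, so check the tool's `--help` output before using them.

```python
# Hypothetical helper, not part of MGEfinder. It only formats the thresholds
# suggested in this thread; which subcommand takes each flag is an open
# question here and should be confirmed with the tool's --help output.

DEFAULT_MIN_SOFTCLIP_RATIO = 0.15  # default value quoted in the thread

RELAXED_SETTINGS = {
    "--min_softclip_ratio": 0.01,          # thread suggests 0 or 0.01
    "--min_count_consensus": 1,            # lowered to 1
    "--max_junction_spanning_prop": 0.95,  # raised to about 0.95
}


def to_cli_args(settings):
    """Flatten {flag: value} into ['--flag', 'value', ...]."""
    args = []
    for flag, value in settings.items():
        args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted after the relevant command.
    print(" ".join(to_cli_args(RELAXED_SETTINGS)))
```

Keeping the overrides in a single dictionary makes it easier to record exactly which thresholds were relaxed when comparing false positive rates between runs.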

Best, Matt Durrant



xuanji2017 commented 2 years ago

Hi Matt, thank you for the reply! Lowering the detection thresholds and manually checking the annotated elements sounds like a good idea. I will try it and see what the results look like. I am annotating the elements now and have detected some intact prophages, which suggests that the results are reliable to some extent. For metagenomic detection, I am worried that some insertion elements do not belong to the reference genome. For example, we detect an insertion terminus using Faecalibacterium as the reference genome and then match the consensus flanks against the assembly of the whole community (which includes, for example, E. coli). Some of the recovered insertion sequences could then come from E. coli or other bacteria rather than Faecalibacterium. In principle, I think this case should occur. Do you have any idea how to avoid it?

durrantmm commented 2 years ago

Hi Xuanji, I don't think you'll have a big problem with insertion sequences being incorrectly assigned to a host genome. That would probably be a very rare technical error, since it would require lots of chimeric reads. If you're worried, you can annotate them using ISfinder to check whether insertion sequences are being assigned to incorrect reference genomes.



xuanji2017 commented 2 years ago

Hi Matt

I lowered the default parameters as you suggested, and it worked very well. Thanks very much. Right now I am thinking about how to set a threshold to decide whether bacteria occur at high or low frequency. Do you have any suggestions? In addition, do you have any ideas on how to choose the most suitable reference genome?

durrantmm commented 2 years ago

Hi Xuanji,

I'm not sure about a threshold, but you should be able to get an estimate of the frequency by looking at the number of "readthrough" reads vs. the number of softclipped reads at a given position.
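As a rough illustration of that ratio, the sketch below estimates the fraction of reads supporting an insertion at one position. It assumes you have already counted the softclipped reads (supporting the insertion junction) and the "readthrough" reads (spanning the site without a clip) yourself; the function name is hypothetical and this is not how MGEfinder computes anything internally.

```python
def insertion_frequency(softclipped_reads, readthrough_reads):
    """Estimate the fraction of reads at a site that support an insertion.

    softclipped_reads: reads clipped at the candidate junction.
    readthrough_reads: reads that span the site without being clipped.
    Returns None when no reads cover the site.
    """
    total = softclipped_reads + readthrough_reads
    if total == 0:
        return None  # no coverage, so no estimate is possible
    return softclipped_reads / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(insertion_frequency(5, 15))
```

The estimate is only approximate, since mapping differences between clipped and unclipped reads can bias the two counts, and it becomes noisy at low coverage.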

There are many ways you could choose a good reference genome. If you have a time series of data, you should try assembling the initial time points and using those as a reference. Then if new integration events show up at later time points, you should be able to detect them sensitively.

Or you could use a reference genome from a public database. For example, you could try https://gtdb.ecogenomic.org/. You can align reads to these genomes to see which ones are well represented in your metagenomic dataset. Or you could assemble your metagenomes, bin them into MAGs, and then find the closest matches for those MAGs in a database like GTDB.
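One simple way to judge which candidate genomes are "well represented" is to compare approximate mean read depth per genome after alignment. The sketch below assumes you already have, for each candidate, the number of mapped reads and the genome length; the genome names, the `min_depth` cutoff, and both function names are hypothetical, and mean depth is only one possible proxy (breadth of coverage is another).

```python
def mean_depth(mapped_reads, read_length, genome_length):
    """Approximate mean depth: total mapped bases divided by genome size."""
    return mapped_reads * read_length / genome_length


def rank_references(stats, read_length, min_depth):
    """Rank candidate reference genomes by approximate mean depth.

    stats: {genome_name: (mapped_reads, genome_length)}
    Returns (genome_name, depth) pairs with depth >= min_depth,
    sorted from highest to lowest depth.
    """
    depths = {
        name: mean_depth(reads, read_length, length)
        for name, (reads, length) in stats.items()
    }
    kept = [(name, depth) for name, depth in depths.items() if depth >= min_depth]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = {
        "genome_A": (200_000, 3_000_000),
        "genome_B": (1_000, 5_000_000),
    }
    print(rank_references(example, read_length=150, min_depth=1.0))
```

The cutoff is arbitrary here; a sensible value depends on sequencing depth and on how rare the insertions you hope to detect are.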

I hope that helps!

Matt

