justinmaire closed 4 years ago
Hello, Those do look like very large genomes indeed. As a first sanity check, I would recommend running CheckM (https://ecogenomics.github.io/CheckM/) on your dataset to estimate the completeness and contamination of your assemblies. If those genomes are highly contaminated, it will affect the GTDB-Tk classification.
Hi Justin, If it's possible, are you able to provide the genomes? I'm keen to explore how GTDB-Tk behaves with them. Thanks!
Thanks for your answers! I did run CheckM and, as expected, they're contaminated. Just a quick question on that: which threshold do you use for the contamination score to confidently say 'this genome is contaminated'? My contaminated ones are all above 100 and the detailed scores clearly show contamination. All the other ones are between 0 and 2, but I have one genome that sits at 18 and has a few markers showing up twice, so I was just wondering what your thoughts are on those intermediate scores?
Aaron, I'd be more than happy to provide those genomes, yes. I've got 8, but I'm not sure what the easiest way to do this is. (technologically-challenged person here!)
Hello Justin, Currently, we recommend running GTDB-Tk on genomes estimated to be ≥50% complete with ≤10% contamination, consistent with community standards (https://www.nature.com/articles/nbt.3893).
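For anyone wanting to apply that cutoff before running GTDB-Tk, here is a minimal sketch of the filtering step. It assumes a tab-separated CheckM summary with `Bin Id`, `Completeness` and `Contamination` columns (check the header of your own table, the names may differ). The function name `split_by_quality` and the three example rows are made up for illustration and are not from this thread's data.

```python
import csv
import io

# Thresholds quoted in the reply above: >=50% complete, <=10% contamination.
MIN_COMPLETENESS = 50.0
MAX_CONTAMINATION = 10.0


def split_by_quality(tsv_text, min_comp=MIN_COMPLETENESS, max_cont=MAX_CONTAMINATION):
    """Split genomes into (passed, failed) lists of IDs.

    Assumes a tab-separated table whose header contains the columns
    'Bin Id', 'Completeness' and 'Contamination' (an assumption about
    the CheckM summary layout; adjust to match your file).
    """
    passed, failed = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        ok = completeness >= min_comp and contamination <= max_cont
        (passed if ok else failed).append(row["Bin Id"])
    return passed, failed


# Hypothetical values, only to show the three situations discussed here:
# a clean genome, an intermediate one, and an obviously mixed one.
example = (
    "Bin Id\tCompleteness\tContamination\n"
    "genome_A\t99.1\t1.2\n"
    "genome_B\t98.5\t18.0\n"
    "genome_C\t100.0\t150.3\n"
)

passed, failed = split_by_quality(example)
print("run GTDB-Tk on:", passed)
print("inspect first:", failed)
```

With this cutoff, a genome with a contamination score of 18 lands in the "inspect first" list along with the clearly mixed ones.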
Ticket closed due to inactivity.
Hi!
Not really a problem here, just a curiosity question: I ran GTDB-Tk on 50 bacterial genomes, and a few of them were returned as archaeal genomes, which was weird because those bacteria had previously been characterized with bacterial 16S primers, so I was quite sure they were bacteria and not archaea. After looking more closely, it turned out those genomes were abnormally large (8-12 Mb), so I put them in a metagenome analysis tool (MG-RAST), which revealed, as expected, that those specific genomes were mixed colonies (generally two or three different species), but bacterial species nonetheless. Any idea why they were classified as archaea? Did GTDB-Tk go all crazy because it found every marker in double/triple?
Thanks! Justin