KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Use of metrics like gene density and kmer content for taxon assignments #115

Open · tderond opened this issue 3 years ago

tderond commented 3 years ago

These are just some thoughts; feel free to ignore them if you've already considered them or if I'm misunderstanding how Autometa does taxon assignment.

If I understand correctly, Autometa currently assigns taxonomy to contigs using some form of consensus over the source organisms of DIAMOND search hits against the nr database.
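To make the consensus idea concrete, here is a minimal sketch of one possible scheme, assuming each ORF on a contig has already been given a lineage (broadest rank first) from its DIAMOND hits. The function `consensus_lineage` and its `min_fraction` threshold are hypothetical illustrations, not Autometa's actual implementation: at each rank, the most common label is kept only while more than half of all ORFs support it.

```python
from collections import Counter
from typing import Sequence

# Hypothetical input: one lineage per ORF, ordered from the broadest rank
# down (e.g. superkingdom, phylum, class, ...).
Lineage = Sequence[str]


def consensus_lineage(orf_lineages: Sequence[Lineage], min_fraction: float = 0.5) -> list[str]:
    """Return the deepest lineage supported by more than `min_fraction` of ORFs."""
    total = len(orf_lineages)
    if total == 0:
        return []
    consensus: list[str] = []
    candidates = list(orf_lineages)  # ORFs that agree with the consensus so far
    depth = 0
    while True:
        labels = [lin[depth] for lin in candidates if len(lin) > depth]
        if not labels:
            break  # no ORF has a call at this rank
        label, count = Counter(labels).most_common(1)[0]
        if count / total <= min_fraction:
            break  # support is measured against all ORFs on the contig
        consensus.append(label)
        candidates = [lin for lin in candidates if len(lin) > depth and lin[depth] == label]
        depth += 1
    return consensus


if __name__ == "__main__":
    orfs = [
        ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
        ("Bacteria", "Proteobacteria", "Alphaproteobacteria"),
        ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
        ("Eukaryota", "Chordata"),
    ]
    print(consensus_lineage(orfs))  # ['Bacteria', 'Proteobacteria']
```

A contig with a single ORF gets a unanimous vote by construction, which is the low-information case where extra contig-level signals might help.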

Could it help to consider other contig-level metrics in this process? That might be useful when a contig carries few genes to search with. I know k-mer content is used in the binning process, but I don't think it is used for taxon assignment. I also know the Kwan lab has thought a lot about how quickly k-mer content adapts to a new host after horizontal gene transfer. What about gene density? For example, in my (flawed) experience, gene density runs roughly: viruses > most bacteria > cyanobacteria > protists > animals.
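For reference, k-mer content is usually summarized as a normalized frequency vector per contig. The sketch below shows how two such profiles could be computed and compared, assuming canonical 4-mers (a k-mer and its reverse complement counted together) and Euclidean distance; `kmer_profile` and `profile_distance` are hypothetical helpers, and how a distance like this would feed into taxon assignment is left open.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequencies of canonical k-mers; windows with non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = Counter(
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID
    )
    # Enumerate every canonical k-mer so all profiles share the same keys.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {key: counts[key] / total for key in keys}


def profile_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two profiles built with the same k."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))
```

Canonical counting makes the profile independent of which strand a contig was assembled on, which is why it is a common choice for this kind of comparison.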

jason-c-kwan commented 3 years ago

Sorry I took such a long time to reply to this! I've certainly thought about this, because sometimes you get a random bacterial gene on an obviously eukaryotic contig, and in that case low gene density could be a useful signal. However, I think we would have to figure out a nuanced way to use it, because plenty of bacterial symbionts have low gene density. Having said that, we could probably add coding density to the output tables easily enough, since we run Prodigal on the contigs anyway.
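Since Prodigal is already run on the contigs, a per-contig coding density could be derived from its gene coordinates. This is a minimal sketch, assuming Prodigal's GFF output (`prodigal -f gff`) and a separate mapping of contig names to lengths (e.g. read from the assembly FASTA); `coding_density` is a hypothetical helper, not part of Autometa. Overlapping genes are merged so shared bases are counted only once.

```python
from collections import defaultdict
from typing import Iterable


def coding_density(gff_lines: Iterable[str], contig_lengths: dict[str, int]) -> dict[str, float]:
    """Fraction of each contig's bases covered by at least one predicted CDS."""
    intervals: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        start, end = int(fields[3]), int(fields[4])  # GFF is 1-based, inclusive
        intervals[fields[0]].append((start, end))

    densities: dict[str, float] = {}
    for contig, length in contig_lengths.items():
        covered = 0
        cur_start = cur_end = None
        # Merge overlapping intervals so overlapping genes are not double counted.
        for start, end in sorted(intervals.get(contig, [])):
            if cur_end is None or start > cur_end + 1:
                if cur_end is not None:
                    covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start + 1
        densities[contig] = covered / length if length else 0.0
    return densities
```

The resulting fraction could be written as an extra column in a contig table; as the reply notes, low values on their own would not separate eukaryotic contigs from low-gene-density bacterial symbionts.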