Use of metrics like gene density and kmer content for taxon assignments

KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes



These are just some thoughts. Feel free to ignore them if you've already considered them or if I'm misunderstanding how Autometa does taxon assignments.

If I understand correctly, Autometa currently assigns taxa to contigs using some form of consensus over the source organisms of the Diamond hits against nr.
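Here is a minimal sketch of a majority-vote consensus that makes the concern concrete. It assumes each gene on a contig has already been given one taxon at a fixed rank (e.g. genus) from its Diamond hits. The function name `consensus_taxon` and the `min_fraction` threshold are hypothetical illustrations, not Autometa's actual code or parameters.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_taxon(gene_taxa: Sequence[str], min_fraction: float = 0.5) -> Optional[str]:
    """Hypothetical majority vote: return the taxon held by more than
    `min_fraction` of the contig's genes, or None if there is no majority."""
    if not gene_taxa:
        # No searchable genes means no hit-based evidence at all.
        return None
    taxon, count = Counter(gene_taxa).most_common(1)[0]
    if count / len(gene_taxa) > min_fraction:
        return taxon
    return None


print(consensus_taxon(["Streptomyces", "Streptomyces", "Nocardia"]))  # Streptomyces
print(consensus_taxon(["Streptomyces"]))  # Streptomyces, decided by a single gene
```

With only one gene, the vote is decided by a single hit, and a contig with no predicted genes gets no assignment at all.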

Could it help to consider other properties of the contig in this process? That might be useful when a contig carries few genes that can be searched. I know k-mer content is used in the binning process, but I don't think it is used for taxon assignment. I also know the Kwan lab has thought a lot about how quickly k-mer content adapts to a new host after horizontal gene transfer.

What about gene density? For example, in my (flawed) experience, gene density goes viruses > most bacteria > cyanobacteria > protists > animals.
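As a rough illustration, both suggested metrics can be computed from a contig's sequence and its predicted gene coordinates. This is a minimal sketch under stated assumptions: the function names are hypothetical, gene coordinates are assumed to be 1-based and inclusive, and `k=4` is an arbitrary default (reverse complements are not merged). It is not how Autometa computes k-mer profiles.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def gene_density(n_genes: int, contig_length_bp: int) -> float:
    """Hypothetical metric: predicted genes per kilobase of contig."""
    return n_genes / (contig_length_bp / 1000)


def coding_fraction(gene_coords: List[Tuple[int, int]], contig_length_bp: int) -> float:
    """Hypothetical metric: fraction of bases inside predicted genes.
    Coordinates are assumed 1-based and inclusive; overlaps are counted once."""
    covered = set()
    for start, end in gene_coords:
        covered.update(range(start, end + 1))
    return len(covered) / contig_length_bp


def kmer_frequencies(seq: str, k: int = 4) -> Dict[str, float]:
    """Relative frequency of every ACGT k-mer; windows containing other
    characters (e.g. N) are ignored."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


print(gene_density(3, 6000))                         # 0.5 genes per kb
print(coding_fraction([(1, 500), (401, 900)], 1000))  # 0.9
print(kmer_frequencies("ACGTACGT")["ACGT"])           # 0.4
```

Values like these could sit alongside the hit-based consensus as extra contig-level features. Whether they actually separate taxa well enough to help is the open question raised here.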


Use of metrics like gene density and kmer content for taxon assignments #115