dib-lab / charcoal

Remove contaminated contigs from genomes using k-mers and taxonomies.

explore protein-based decontamination #119

Open ctb opened 4 years ago

ctb commented 4 years ago

this is not a "soon" issue, but there appears to be substantial opportunity for using amino acid k-mers to find contamination...

e.g. https://github.com/bluegenes/2020-gtdb-smash/issues/1

ctb commented 4 years ago

trying this out now at @bluegenes' request, over in #120

ctb commented 4 years ago

if we're serious about this, we should probably plan on running prokka to extract proteins. or maybe six-frame translation of the DNA is better, b/c it could catch fragmented genes w/o reducing specificity?
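
for reference, a minimal sketch of what the six-frame option could look like. this is not charcoal code: the function names, the `ksize=7` default, and the choice to skip k-mers that span a stop codon are all assumptions made for illustration. it uses the standard genetic code and turns each contig into a set of amino acid k-mers without needing gene calls first.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three forward frames, then three reverse-complement frames."""
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def protein_kmers(dna, ksize=7):
    """Collect amino acid k-mers from all six frames.

    Assumption: k-mers that would span a stop codon ('*') are skipped.
    """
    kmers = set()
    for frame in six_frame_translations(dna):
        for piece in frame.split("*"):
            for i in range(len(piece) - ksize + 1):
                kmers.add(piece[i:i + ksize])
    return kmers


if __name__ == "__main__":
    contig = "ATGGCCAAATAG"  # toy sequence, not real data
    print(six_frame_translations(contig))
    print(sorted(protein_kmers(contig, ksize=3)))
```

the tradeoff vs. prokka: only one of the six frames is the real one at any given locus, so most of these k-mers are noise, but a gene broken across a contig edge still contributes k-mers from the part that is present.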