dib-lab / charcoal

Remove contaminated contigs from genomes using k-mers and taxonomies.

notes for documentation - bigger databases => better?, impact of lateral gene transfer/phage #100

Open · opened by ctb 4 years ago

ctb commented 4 years ago

In theory, as we sequence more and more microbial genomes, charcoal should get better and better. This is balanced somewhat by the growing database size and the potential need to dereplicate through species clusters.

It's not clear to me that Reason 2 is a great idea, given the challenges posed by lateral gene transfer and phage. I guess at the least it will highlight places where people should check their genomes?

ctb commented 4 years ago

Although, note that Reasons 2 and 3 look at the majority lineage, so the entire contig has to be questionable. Hmm.
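A minimal sketch of why a majority-lineage check tolerates small foreign regions within a contig. It assumes each k-mer in a contig has already been assigned a lineage string (or `None` if unassigned). The function names and lineage labels are hypothetical and are not charcoal's code.

```python
from collections import Counter
from typing import List, Optional


def majority_lineage(kmer_lineages: List[Optional[str]]) -> Optional[str]:
    """Return the most common lineage among k-mers that have an assignment."""
    # Hypothetical input: one lineage label per k-mer; None means unassigned.
    counts = Counter(lin for lin in kmer_lineages if lin is not None)
    if not counts:
        return None  # nothing assigned, so no basis for flagging
    lineage, _ = counts.most_common(1)[0]
    return lineage


def is_contig_flagged(kmer_lineages: List[Optional[str]], genome_lineage: str) -> bool:
    """Flag a contig only if its *majority* lineage disagrees with the genome's."""
    maj = majority_lineage(kmer_lineages)
    return maj is not None and maj != genome_lineage


# A long contig with a short phage insertion: the majority still matches, so it is kept.
print(is_contig_flagged(["g__Escherichia"] * 8 + ["phage"] * 2, "g__Escherichia"))  # False
# A small contig dominated by phage k-mers: the majority disagrees, so it is flagged.
print(is_contig_flagged(["phage"] * 3 + ["g__Escherichia"], "g__Escherichia"))  # True
```

Under this rule, a foreign region only triggers a flag when it dominates the contig. That is why short contigs, or contigs that are mostly phage, plasmid, or HGT sequence, are the cases most likely to be flagged.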

taylorreiter commented 4 years ago

Ah, interesting note about the majority lineage. This would (or should) still cause problems with plasmids and with small contigs that are dominated by phage or HGT.

I like the idea of saying "check your genomes." I sort of view the *dirty.fa.gz file as containing either 1) clear contaminants, or 2) contigs that should only be re-added to the genome after the user curates them and finds clear evidence that they belong.