Open ctb opened 4 years ago
the face of contamination...
OK, I can now easily produce these diagrams for any two genomes using mashmap.
I'm thinking about -
Thought about this some more. The dotplot and other plots will work for really egregious types of contamination, but they're not super general: they handle certain cases well, like large identical contigs, but they don't handle more subtle cases well, like contamination spread throughout the genome.
I suspect that dotplots and/or slope graphs will be part of a good summary, along with the response curve plot (top of this issue) and probably just a straight-up alignment, i.e. copy-pasteable output of the likely contaminated sequence(s). I'm looking at mummer for that.
Oh, and some estimates of ANI would be good, too. For each contaminant, "the query genome shares X bp at Y ANI with subject genome".
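A minimal sketch of that per-contaminant summary. It assumes the alignments have already been parsed into `(subject_name, aligned_bp, identity_percent)` tuples (a hypothetical input shape, not any particular tool's output format), and it uses the length-weighted mean identity of the aligned blocks as a rough stand-in for ANI. Overlapping alignments would be double-counted here, so treat X as an upper bound.

```python
from collections import defaultdict


def summarize_by_subject(alignments):
    """Return {subject: (total_aligned_bp, length_weighted_identity)}.

    `alignments` is an iterable of (subject_name, aligned_bp, identity_percent).
    This tuple layout is an assumption made for illustration.
    """
    total_bp = defaultdict(int)
    weighted_identity = defaultdict(float)
    for subject, aligned_bp, identity in alignments:
        total_bp[subject] += aligned_bp
        weighted_identity[subject] += aligned_bp * identity
    return {
        subject: (bp, weighted_identity[subject] / bp)
        for subject, bp in total_bp.items()
        if bp > 0
    }


def format_summary(summary):
    """Render one sentence per subject genome, sorted by subject name."""
    return [
        f"the query genome shares {bp} bp at {ani:.1f}% ANI with {subject}"
        for subject, (bp, ani) in sorted(summary.items())
    ]
```

Usage would look something like `format_summary(summarize_by_subject(parsed_alignments))`, with `parsed_alignments` coming from whatever aligner output is on hand.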
and/or highlight contigs where there is high aligned containment in the other genome, i.e. not just aligned sequences but fully aligned contigs.
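One way to sketch the contig-level check: merge the aligned intervals on each contig, compute the fraction of the contig they cover, and flag contigs above a cutoff. The input shapes (`contig_lengths`, `hits`) and the 0.95 threshold are assumptions for illustration, not something already decided in this issue.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def aligned_fraction(contig_length, intervals):
    """Fraction of the contig covered by at least one alignment."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / contig_length


def flag_contigs(contig_lengths, hits, threshold=0.95):
    """Return {contig: fraction} for contigs at or above `threshold`.

    `contig_lengths` maps contig name -> length in bp.
    `hits` maps contig name -> list of (start, end) aligned intervals.
    Both layouts and the default threshold are hypothetical.
    """
    flagged = {}
    for name, length in contig_lengths.items():
        fraction = aligned_fraction(length, hits.get(name, []))
        if fraction >= threshold:
            flagged[name] = fraction
    return flagged
```

Merging first matters because overlapping alignments would otherwise push the covered fraction above 1.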
put stacked dot plots code here -- https://github.com/ctb/2020-stacked-dot-plots
ref #132, #125
some ideas for display!
@bluegenes: