Arcadia-Science / sourmashconsumr

Working with the outputs of sourmash in R
https://arcadia-science.github.io/sourmashconsumr/

add functions to visualize and interrogate overall taxonomy results #13

Closed taylorreiter closed 2 years ago

taylorreiter commented 2 years ago

Visualizations like those used in this notebook: https://github.com/Arcadia-Science/2022-prjna853785-sourmash/blob/main/notebooks/20220815-visualize-sourmash-taxonomy-results.ipynb

Visualizations that I think are worth including:

  1. fraction of sample matched/unclassified, colored by database
     a. maybe add a low-confidence portion -- taxonomic matches that had less than 50kb in the entire sample. I could use the paired palette for this -- high-confidence bacteria, low-confidence bacteria, etc.
  2. upset plot of shared lineages
     a. would be nice to choose which level of taxonomy this plot is made at
  3. ability to dig into intersections from the upset plots
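
For items 2 and 3, here is a minimal sketch of the data an upset plot needs. It truncates each lineage to a chosen rank, then groups lineages by the exact set of samples they occur in. It is written in Python for illustration only and is not the package's implementation. The rank list, the semicolon-separated lineage format, and the names `truncate_lineage` and `upset_intersections` are all assumptions.

```python
from collections import defaultdict

# Assumed rank order for semicolon-separated lineage strings (hypothetical).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def truncate_lineage(lineage, rank):
    """Keep only the lineage fields up to and including `rank`."""
    depth = RANKS.index(rank) + 1
    return ";".join(lineage.split(";")[:depth])


def upset_intersections(sample_lineages, rank):
    """Group truncated lineages by the exact set of samples containing them.

    `sample_lineages` maps a sample name to an iterable of lineage strings.
    Returns a dict mapping frozenset(sample names) -> set of lineages, i.e.
    the exclusive intersections that an upset plot draws as bars.
    """
    samples_by_lineage = defaultdict(set)
    for sample, lineages in sample_lineages.items():
        for lineage in lineages:
            samples_by_lineage[truncate_lineage(lineage, rank)].add(sample)

    intersections = defaultdict(set)
    for lineage, samples in samples_by_lineage.items():
        intersections[frozenset(samples)].add(lineage)
    return dict(intersections)
```

Looking up one key of the returned dict, for example `frozenset({"s1", "s2"})`, lists the lineages behind that bar, which is one way to "dig into" an intersection.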
taylorreiter commented 2 years ago
  1. upset plot of shared lineages
     a. would be nice to choose which level of taxonomy this plot is made at

addressed in #23

taylorreiter commented 2 years ago
  1. ability to dig into intersections from the upset plots

addressed in #23

taylorreiter commented 2 years ago
  1. fraction of sample matched/unclassified, colored by database
     a. maybe add a low-confidence portion -- taxonomic matches that had less than 50kb in the entire sample. I could use the paired palette for this -- high-confidence bacteria, low-confidence bacteria, etc.

The first part is addressed in #30. I tried the second part while implementing it, but coloring by confidence was underwhelming, so I left it out. I think this is because the low-confidence matches account for only a very small fraction of the sample. Also, the default threshold bp for sourmash gather is 50kb, so with default settings few matches would fall below that cutoff and the feature wouldn't be used by many people.
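
For reference, the confidence split described above could be computed roughly as follows. This is a minimal Python sketch, not the code from #30. The record fields `database`, `bp`, and `fraction`, and the function name `summarize_fractions`, are hypothetical. The 50kb cutoff comes from the discussion above.

```python
from collections import defaultdict


def summarize_fractions(matches, min_bp=50_000):
    """Sum the fraction of a sample per (database, confidence) pair.

    Each match is a dict with hypothetical keys:
      database : name of the database the match came from
      bp       : base pairs matched across the entire sample
      fraction : fraction of the sample assigned to this match
    Matches with fewer than `min_bp` base pairs are labeled low confidence.
    Whatever is left over is reported as unclassified.
    """
    totals = defaultdict(float)
    for match in matches:
        confidence = "high" if match["bp"] >= min_bp else "low"
        totals[(match["database"], confidence)] += match["fraction"]

    # The classified fractions are summed before the new key is added.
    totals[("unclassified", "none")] = max(0.0, 1.0 - sum(totals.values()))
    return dict(totals)
```

Each key of the result would map to one color in a stacked bar, with the high/low pair for a database taking the dark/light shades of a paired palette.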