Arcadia-Science / sourmashconsumr

Working with the outputs of sourmash in R
https://arcadia-science.github.io/sourmashconsumr/

Plot the outputs of `sourmash gather` #30

Closed taylorreiter closed 1 year ago

taylorreiter commented 1 year ago

Inspired by plots in this notebook: https://github.com/Arcadia-Science/2022-prjna853785-sourmash/blob/main/notebooks/20220815-visualize-sourmash-taxonomy-results.ipynb

These plots don't use taxonomy information at all. If color is added, it's for the database the genome accessions came from.

`plot_gather_classified()` looks like this: [image]

And the upset plot looks either like this: [image]

or like this: [image]

codecov-commenter commented 1 year ago

Codecov Report

Base: 87.09% // Head: 78.00% // Decreases project coverage by -9.09% :warning:

Coverage data is based on head (818d050) compared to base (6ebc1db). Patch coverage: 0.00% of modified lines in pull request are covered.

:exclamation: Current head 818d050 differs from pull request most recent head 09e578e. Consider uploading reports for the commit 09e578e to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #30      +/-   ##
==========================================
- Coverage   87.09%   78.00%   -9.10%
==========================================
  Files           7        8       +1
  Lines         403      450      +47
==========================================
  Hits          351      351
- Misses         52       99      +47
```

| [Impacted Files](https://codecov.io/gh/Arcadia-Science/sourmashconsumr/pull/30?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arcadia-Science) | Coverage Δ | |
|---|---|---|
| [R/plot\_gather.R](https://codecov.io/gh/Arcadia-Science/sourmashconsumr/pull/30/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arcadia-Science#diff-Ui9wbG90X2dhdGhlci5S) | `0.00% <0.00%> (ø)` | |
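The headline percentages can be reproduced from the hit and line totals in the coverage diff, since coverage is simply hits divided by lines. The following is a minimal Python sketch of that arithmetic; the function name `coverage_percent` is illustrative and not part of Codecov. The last digit may differ slightly from the report depending on how Codecov rounds or truncates, which is an assumption here rather than something the report states.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage (hits / lines * 100)."""
    return 100.0 * hits / lines


# Totals copied from the coverage diff in the report.
base = coverage_percent(hits=351, lines=403)  # main
head = coverage_percent(hits=351, lines=450)  # this pull request

# The pull request adds 47 lines and no new hits, so every new line is a miss.
new_lines = 450 - 403
new_misses = 99 - 52

print(f"base {base:.2f}%  head {head:.2f}%  change {head - base:+.2f}%")
print(f"new lines: {new_lines}, new misses: {new_misses}")
```

Because the hit count is unchanged while the line count grows, the drop in project coverage comes entirely from the untested new file, which matches the 0.00% patch coverage reported.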


taylorreiter commented 1 year ago

The new Codecov results didn't make their way into the comments from CI, but test coverage is higher after my last additions, where I added tests for the functions in this PR!

taylorreiter commented 1 year ago

> LGTM. More for my clarification:
>
> 1. Do the bar plot colors show, for each sample, the weighted abundance represented by each database?

Yes, exactly!

> 2. Does the upset plot show the similarity or overlap of different samples? This was a little confusing with the color overlaid, but I should dig into the notebooks that inspired these to get more context for myself.

Yeah, this one is the overlap in the genomes identified in the different samples. The colors are optional. They would usually make more sense if the databases were, say, all 5 of the GenBank databases (fungi, bacteria, protozoa, viruses, and archaea). But my test data weren't labelled like that, so it was a little random 😂
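As a rough illustration of the two answers above, here is a minimal Python sketch of the quantities the two plots summarize. It is not the package's R code: the rows are made up, the function names are hypothetical, and the field names (`query_name`, `name`, `filename`, `f_unique_weighted`) are assumed to mirror columns of a `sourmash gather` CSV.

```python
from collections import defaultdict

# Made-up rows for illustration only; field names are assumptions.
rows = [
    {"query_name": "sample_A", "name": "genome_1", "filename": "bacteria.zip", "f_unique_weighted": 0.40},
    {"query_name": "sample_A", "name": "genome_2", "filename": "bacteria.zip", "f_unique_weighted": 0.25},
    {"query_name": "sample_A", "name": "genome_3", "filename": "fungi.zip", "f_unique_weighted": 0.10},
    {"query_name": "sample_B", "name": "genome_1", "filename": "bacteria.zip", "f_unique_weighted": 0.50},
    {"query_name": "sample_B", "name": "genome_3", "filename": "fungi.zip", "f_unique_weighted": 0.20},
    {"query_name": "sample_B", "name": "genome_4", "filename": "viruses.zip", "f_unique_weighted": 0.05},
]


def fraction_by_database(rows):
    """Bar plot data: sum the weighted fraction per (sample, database)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["query_name"], row["filename"])] += row["f_unique_weighted"]
    return dict(totals)


def genomes_by_sample_set(rows):
    """Upset plot data: group genomes by the exact set of samples they occur in."""
    samples_per_genome = defaultdict(set)
    for row in rows:
        samples_per_genome[row["name"]].add(row["query_name"])
    groups = defaultdict(set)
    for genome, samples in samples_per_genome.items():
        groups[frozenset(samples)].add(genome)
    return dict(groups)


fractions = fraction_by_database(rows)
overlaps = genomes_by_sample_set(rows)

for (sample, database), fraction in sorted(fractions.items()):
    print(f"{sample}  {database}: {fraction:.2f}")
for samples, genomes in overlaps.items():
    print(f"{sorted(samples)}: {sorted(genomes)}")
```

In this toy data, each bar segment is one (sample, database) total, and each upset column is one group of genomes shared by exactly the same samples. Coloring the upset bars by database would only be informative when each database is a meaningful category.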

I'm planning to write a vignette for all of these functions/use cases to summarize them in one place, but I've been prioritizing writing the functionality so we can use it in e.g. workflows if we want to.