Adds example data that will be loaded with the package, along with documentation of the data's provenance and contents. I did my best to follow all of the conventions laid out here: https://r-pkgs.org/data.html

Unfortunately, example data can't be larger than 1 MB if we want to submit the package to CRAN one day (which I think is a good goal to have). I also wanted to choose a data set whose generation I could document completely. For the Arcadia-specific samples I'm working with right now, I ran gather on assemblies rather than on raw reads, and I don't have the exact code and software versions used to get from the raw reads to those assemblies. For that reason, I chose the gut data set that I used here: https://taylorreiter.github.io/2022-07-28-From-raw-metagenome-reads-to-phyloseq-taxonomy-table-using-sourmash-gather-and-sourmash-taxonomy/. The one thing this data set won't show well is the time series alluvial plot. I'll add subsampled data from a time series for that later (I have about 100 KB left before I reach the 1 MB limit for data).

I added the data both as `extdata`, so that I can demonstrate the `read*` functions, and as data objects, so that people can work with the data directly and see what the formats look like without having to call a `read*` function.

In subsequent PRs, I'll use these data to write examples for each function and to write a vignette.

Codecov Report

Base: 87.09% // Head: 91.41% // Increases project coverage by +4.31% :tada:

Coverage data is based on head (039b8f3) compared to base (c330974). Patch coverage: 93.67% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #42      +/-   ##
==========================================
+ Coverage   87.09%   91.41%   +4.31%
==========================================
  Files           7        8       +1
  Lines         403      524     +121
==========================================
+ Hits          351      479     +128
+ Misses         52       45       -7
```

| [Impacted Files](https://codecov.io/gh/Arcadia-Science/sourmashconsumr/pull/42?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arcadia-Science) | Coverage Δ | |
|---|---|---|
| [R/plot\_taxonomy\_annotate.R](https://codecov.io/gh/Arcadia-Science/sourmashconsumr/pull/42/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arcadia-Science#diff-Ui9wbG90X3RheG9ub215X2Fubm90YXRlLlI=) | `93.00% <91.26%> (+16.38%)` | :arrow_up: |
| [R/plot\_gather.R](https://codecov.io/gh/Arcadia-Science/sourmashconsumr/pull/42/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arcadia-Science#diff-Ui9wbG90X2dhdGhlci5S) | `98.18% <98.18%> (ø)` | |


Add example data to package and document the data generation and contents #42