Arcadia-Science / sourmashconsumr

Working with the outputs of sourmash in R
https://arcadia-science.github.io/sourmashconsumr/

experimental LIN taxonomy integration #73

Open bluegenes opened 1 year ago

bluegenes commented 1 year ago

[note: this is an experimental/draft PR and should not be merged as-is]

ref #72

In sourmash taxonomy, we're adding utilities for the LIN taxonomic framework, which allows greater flexibility and specificity than standard taxonomic ranks. For example, if only certain strains of a microbe are pathogenic, LINs may be useful for identifying and grouping pathogenic vs. non-pathogenic strains. Is this something you'd be interested in supporting for visualization?
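As a rough illustration of the grouping idea (not code from this PR), here is a minimal Python sketch. It assumes a LIN is written as semicolon-separated integer positions and that genomes sharing a longer LIN prefix fall into a more specific group. The strain names, the shortened LIN values, and the `group_by_lin_prefix` helper are all hypothetical.

```python
from collections import defaultdict


def group_by_lin_prefix(lins, depth):
    """Group genome names by the first `depth` positions of their LIN.

    `lins` maps a genome name to a LIN string such as "14;1;0;0;3".
    This helper is hypothetical and only illustrates prefix grouping.
    """
    groups = defaultdict(list)
    for name, lin in lins.items():
        # Keep only the first `depth` positions as the group key.
        prefix = ";".join(lin.split(";")[:depth])
        groups[prefix].append(name)
    return dict(groups)


# Made-up genomes and shortened LINs, for illustration only.
example_lins = {
    "strainA": "14;1;0;0;3",
    "strainB": "14;1;0;0;7",
    "strainC": "14;1;2;0;0",
}

coarse = group_by_lin_prefix(example_lins, 2)  # all three strains share "14;1"
fine = group_by_lin_prefix(example_lins, 3)    # strainC separates from A and B
```

Choosing a deeper prefix gives finer groups, which is the kind of flexibility that fixed taxonomic ranks do not offer.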

We had a question about whether sourmashconsumr would work with LIN lineages (e.g. for Sankey plots), so I experimented a little to see how easy or hard it would be to support LINs.

This PR has LINs partially working for:

Plot using sourmash test data from the LINs PR (tests/test-data/tax/test1.gather.csv annotated with tests/test-data/tax/test.LIN-taxonomy.csv):

[image: plot generated from the test data above]

Challenges and thoughts

I'm happy to work on this further, or to drop it if this isn't something you want to support!