biocore / empress

A fast and scalable phylogenetic tree viewer for microbiome data analysis
BSD 3-Clause "New" or "Revised" License
48 stars 31 forks source link

Use ancestor information in taxonomy feature metadata (only computing ancestor info in the JS interface) #487

Closed fedarko closed 3 years ago

fedarko commented 3 years ago

This is an alternate version of #482 that avoids dramatically increasing the size of the taxonomy information stored in the QZV -- instead, each entry in the feature metadata is "expanded" as needed in the JS. This means that in the worst case (visualizing the lowest level of a taxonomy, e.g. "Level 7" / species) we just need to stitch back together the original taxonomy for every tip described in the feature metadata (this process could be made more efficient -- see comment added in a95b652).

This works by passing a list of the "split taxonomy columns", in descending rank order (i.e. Level 1, Level 2, ...) from the Python to the JS code, which (in certain contexts) uses this list to determine how much if any ancestor data to get for a given feature metadata column. Doing things in this way may make life easier in the future, since it'll let us name the taxonomy columns (after creating them) whatever we want, so we can do things like use 0-indexing / use a different term than Level / etc. if desired without breaking this functionality.

As with #482, this closes #473 and closes #422.

fancy

fedarko commented 3 years ago

One thing worth noting: the taxonomy information shown for biplot arrows is still stored as before, so now the two representations are different:

cornercase

I don't think this is a big problem, esp. since the colors were already different between biplot arrows and EMPress' assigned colors (since biplot arrows are [usually] a subset of the tips in the tree), but it may be worth trying to address this in a future issue or something (maybe doing the unwieldy Python metadata operation in #482, but just for the biplot arrow taxonomies...?)

ElDeveloper commented 3 years ago

@fedarko, could you resolve conflicts? @kwcantrell can you take another look at this PR and merge if it looks good to you?

fedarko commented 3 years ago

Thanks, just resolved the conflict; PR should be ok to merge pending approval from @kwcantrell.

After barplot clicking, the next thing I have on my plate for EMPress is #371 (drawing only the top K most frequent values in a field) -- as Yoshiki suggested, I think that will work really well with this PR's changes.

emperor-helper commented 3 years ago

The following artifacts were built for this PR: empire-biplot.qzv, empire.qzv, empress-tree.qzv, just-fm.qzv, plain.qzv

ElDeveloper commented 3 years ago

Thanks so much @fedarko!

On (Mar-30-21|14:30), Marcus Fedarko wrote:

Thanks, just resolved the conflict; PR should be ok to merge pending approval from @kwcantrell.

After barplot clicking, the next thing I have on my plate for EMPress is #371 (drawing only the top K most frequent values in a field) -- as Yoshiki suggested, I think that will work really well with this PR's changes.

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://urldefense.com/v3/__https://github.com/biocore/empress/pull/487*issuecomment-810590583__;Iw!!Mih3wA!QbAwxX-sKZ5BEx4jz39Guti_-OzcXaDrco44iIpdXgygHpK6VEcEgFTOnABfMAM$

ElDeveloper commented 3 years ago

@kwcantrell any chance you can look at this PR, it's pending your approval so feel free to merge after you approve. 👍

kwcantrell commented 3 years ago

Thanks @fedarko! The extra run time is not ideal but this method represents the correct way to display taxonomy and once #371 is merged in, the run time wont be that big of an issue.

ElDeveloper commented 3 years ago

Thanks so much @fedarko and @kwcantrell !