Closed fedarko closed 4 years ago
Also, it'd be cool to optionally support visualizing information passed over from Emperor as barplots -- it could be really useful to see e.g. presence information as tip-level information, while maintaining previous coloring of the tree (e.g. by feature metadata). Biologically, this would be a way of showing what particular taxa are unique to which groups of selected samples, or something along those lines.
From doing some planning, I think there are three types of barplots that would be good to work on supporting (and potentially more if requested):
1. Assign each tip a bar of fixed length, and vary the colors of the bars based on a feature metadata field. These could be either categorical colors (e.g. taxonomy annotations) or quantitative colors (e.g. Songbird/ALDEx2/etc. differential values, other types of feature importance scores as suggested by @shihuang047, etc.).
   Example: The "Host Class" ring in Fig. 1 of Song/Sanders et al.
2. Assign each tip a bar of fixed color, and vary the lengths of the bars based on a (quantitative) feature metadata field.
   Example: The relative abundance barplots in Fig. 2A of Baker et al. (not exactly comparable b/c this barplot has more than one category, but the same general idea)
3. Assign each tip a bar of fixed length, and draw a stacked barplot based on this tip's sample presence information for a selected sample metadata field. (To give an idea of what this would look like, for "body site" in the moving pictures dataset, tips unique to gut samples would have a completely red bar; tips split 50/50 between left and right palm samples would have a half blue / half orange bar; and so on.)
   Example: The "Diet" ring in Fig. 1 of Song/Sanders et al., see above
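To make the stacked (sample presence) barplot a bit more concrete, here is a rough sketch of how the per-tip proportions could be computed. Everything in it is an assumption for illustration: the function name `stacked_bar_fractions`, the dict-based inputs, and the toy sample IDs are hypothetical and are not taken from the actual codebase.

```python
from collections import Counter


def stacked_bar_fractions(tip_samples, sample_metadata, field):
    """Return {category: fraction of the bar} for a single tip.

    tip_samples: iterable of sample IDs in which this tip is present
        (hypothetical input format).
    sample_metadata: dict mapping sample ID -> dict of metadata fields.
    field: the sample metadata field to summarize (e.g. "body-site").
    """
    # Count how many of the tip's samples fall in each category.
    counts = Counter(sample_metadata[s][field] for s in tip_samples)
    total = sum(counts.values())
    if total == 0:
        # Tip is not present in any sample: nothing to draw.
        return {}
    # Sort categories so the segment order is the same for every tip.
    return {cat: n / total for cat, n in sorted(counts.items())}


# Toy metadata loosely modeled on the "body site" example above.
sample_metadata = {
    "s1": {"body-site": "gut"},
    "s2": {"body-site": "gut"},
    "s3": {"body-site": "left palm"},
    "s4": {"body-site": "right palm"},
}
gut_only = stacked_bar_fractions(["s1", "s2"], sample_metadata, "body-site")
palm_split = stacked_bar_fractions(["s3", "s4"], sample_metadata, "body-site")
```

Here `gut_only` would be a single full-length segment, and `palm_split` would be two half-length segments. Each fraction would then be multiplied by the fixed bar length to get the segment lengths.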
I imagine these are ranked roughly in order of how useful they'll be (maybe 3 and 2 could be switched around, though). So IMO it makes sense to start with the first type of barplot. (Happily, I think this will also be the easiest of the three to implement :)
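As a starting point for the first type, the color assignment could look something like the sketch below. This is only an illustration under assumptions: the function names, the hex palette, and the blue-to-red gradient are all hypothetical choices, and a real implementation would likely reuse whatever color-handling code the tree coloring already has.

```python
def categorical_colors(values, palette):
    """Give each distinct value a palette color, cycling if the palette runs out."""
    unique = sorted(set(values))
    lookup = {v: palette[i % len(palette)] for i, v in enumerate(unique)}
    return [lookup[v] for v in values]


def quantitative_colors(values, low=(0, 0, 255), high=(255, 0, 0)):
    """Linearly interpolate an RGB color per value between `low` and `high`."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    colors = []
    for v in values:
        # If every value is identical, fall back to the gradient midpoint.
        t = 0.5 if span == 0 else (v - vmin) / span
        colors.append(
            tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
        )
    return colors


# Categorical field, e.g. a taxonomy annotation per tip.
phyla = ["Firmicutes", "Bacteroidetes", "Firmicutes"]
phylum_colors = categorical_colors(phyla, ["#e41a1c", "#377eb8"])

# Quantitative field, e.g. a differential value per tip.
differentials = [-1.0, 0.0, 1.0]
differential_colors = quantitative_colors(differentials)
```

Tips sharing a category get the same color, and the lowest and highest quantitative values map to the two ends of the gradient.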
We would ideally allow for users to select multiple "layers" of barplots, which would allow for intricate displays as shown in the Song/Sanders et al. tree above.
Barplots should work with either circular or rectangular layouts, since both of these guarantee that tips will be allocated some space to themselves in a consistent way (... if that makes sense, there's probably a more elegant way to phrase that).
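One way to picture "space to themselves": in the rectangular layout each tip owns a horizontal slot, and in the circular layout each tip owns an angular slot, so a bar is just a shape filling that slot outside the tree. The sketch below assumes evenly spaced tips and approximates the circular wedge with a four-cornered polygon. All names (`rect_bar`, `circ_bar`, `layer_start`) are hypothetical, and this is not a description of how the layouts are actually implemented.

```python
import math


def rect_bar(tip_y, x_start, length, tip_spacing):
    """Corners of a bar drawn to the right of a tip in a rectangular layout."""
    half = tip_spacing / 2
    return [
        (x_start, tip_y - half),
        (x_start + length, tip_y - half),
        (x_start + length, tip_y + half),
        (x_start, tip_y + half),
    ]


def circ_bar(tip_angle, r_start, length, n_tips):
    """Corners of a bar drawn outside a tip in a circular layout.

    Assumes the n_tips tips are evenly spaced around the circle, so each
    one owns an angular slot of 2 * pi / n_tips radians.
    """
    half = math.pi / n_tips  # half of this tip's angular slot
    r_end = r_start + length
    lo, hi = tip_angle - half, tip_angle + half
    return [
        (r_start * math.cos(lo), r_start * math.sin(lo)),
        (r_end * math.cos(lo), r_end * math.sin(lo)),
        (r_end * math.cos(hi), r_end * math.sin(hi)),
        (r_start * math.cos(hi), r_start * math.sin(hi)),
    ]


def layer_start(tree_extent, layer_lengths, layer_index, gap=0.0):
    """Distance from the root side at which a given barplot layer begins.

    Layers are stacked outward from the edge of the tree, one after another.
    """
    return tree_extent + sum(layer_lengths[:layer_index]) + gap * layer_index


# Second layer (index 1) starts after the first layer's 3 units of length.
second_layer_x = layer_start(10.0, [3.0, 2.0], 1)
bar = rect_bar(tip_y=2.0, x_start=second_layer_x, length=2.0, tip_spacing=1.0)
wedge = circ_bar(tip_angle=0.0, r_start=1.0, length=1.0, n_tips=4)
```

The same `layer_start` offset works for both layouts: it is an x coordinate in the rectangular case and a radius in the circular case, which is what would make multiple "layers" straightforward to stack.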
All of the figures above (and probably like 95% of the tree figures I've seen while working in bioinformatics, let's be real) use iTOL, so we should of course cite iTOL in the code, paper, etc. as the inspiration for this functionality.
Thanks for breaking this down @fedarko, very helpful. After thinking about this for a little bit, here are some thoughts. I had to think of it in terms of features and samples:
Feature metadata bars:
Sample metadata bars:
For drawing the bars, I think using shaders will be the most performant solution. I think addressing #214 should help us get started.
In both cases it sounds like we should allow multiple rings of information. In any case, I agree that we should start with the case that's easiest to implement and move on from there.
I agree 🎩-tip to iTOL and other tools like Anvio, ggtree, FigTree, Topiary Explorer, and so many more ✨
This was mentioned in #97 (which has since been closed, since the focus of that was on the circular layout).
Now that the circular layout is implemented and tested, supporting visualizing tip-level feature metadata as barplots would be a really cool feature to add. This could be useful for a few different types of feature metadata, ranging from Songbird/ALDEx2/... differentials (or other "importance scores") to taxonomy annotation confidence values, etc.