biocore / empress

A fast and scalable phylogenetic tree viewer for microbiome data analysis
BSD 3-Clause "New" or "Revised" License

Barplots in the circular / rectangular layout #201

Closed fedarko closed 4 years ago

fedarko commented 4 years ago

This was mentioned in #97 (which has since been closed, since the focus of that was on the circular layout).

Now that the circular layout is implemented and tested, supporting visualizing tip-level feature metadata as barplots would be a really cool feature to add. This could be useful for a few different types of feature metadata, ranging from Songbird/ALDEx2/... differentials (or other "importance scores") to taxonomy annotation confidence values, etc.

fedarko commented 4 years ago

Also, it'd be cool to optionally support visualizing information passed over from Emperor as barplots -- it could be really useful to see e.g. presence information as tip-level information, while maintaining previous coloring of the tree (e.g. by feature metadata). Biologically, this would be a way of showing what particular taxa are unique to which groups of selected samples, or something along those lines.

fedarko commented 4 years ago

From doing some planning, I think there are three types of barplots that would be good to work on supporting (and potentially more if requested):

  1. Assign each tip a bar of fixed length, and vary the colors of the bars based on a feature metadata field. These could be either categorical colors (e.g. taxonomy annotations) or quantitative colors (e.g. Songbird/ALDEx2/etc. differential values, other types of feature importance scores as suggested by @shihuang047, etc.).

    Example: The "Host Class" ring in Fig. 1 of Song/Sanders et al. --

    [image: prettytree]

  2. Assign each tip a bar of fixed color, and vary the lengths of the bars based on a (quantitative) feature metadata field.

    Example: The relative abundance barplots in Fig. 2A of Baker et al. (not exactly comparable b/c this barplot has more than one category, but the same general idea) --

    [image: btree]

  3. Assign each tip a bar of fixed length, and draw a stacked barplot based on this tip's sample presence information for a selected sample metadata field. (To give an idea of what this would look like, for "body site" in the moving pictures dataset, tips unique to gut samples would have a completely red bar; tips split 50/50 between left and right palm samples would have a half blue / half orange bar; and so on.)

    Example: The "Diet" ring in Fig. 1 of Song/Sanders et al., see above
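To make the three barplot types above a bit more concrete, here is a rough Python sketch of the per-tip values each one would need. This is only an illustration under stated assumptions, not Empress code: the function names, the dictionary-based inputs, and the min-max length scaling are all hypothetical choices made for the example.

```python
from collections import Counter


def categorical_colors(values, palette):
    """Type 1: fixed-length bars, colored by a categorical field.

    `values` maps tip name -> category; `palette` is a list of colors
    that is cycled through if there are more categories than colors.
    """
    categories = sorted(set(values.values()))
    color_of = {c: palette[i % len(palette)] for i, c in enumerate(categories)}
    return {tip: color_of[v] for tip, v in values.items()}


def scaled_lengths(values, max_length=100.0):
    """Type 2: fixed-color bars, with length scaled by a numeric field.

    Uses simple min-max scaling (an assumption); negative values such as
    differentials end up with the shortest bars.
    """
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        # All tips share one value, so give them all the same length.
        return {tip: max_length for tip in values}
    return {tip: max_length * (v - lo) / span for tip, v in values.items()}


def stacked_fractions(tip_samples, sample_category):
    """Type 3: per-tip fraction of samples in each sample metadata category.

    `tip_samples` maps tip name -> list of samples containing that tip;
    `sample_category` maps sample -> category (e.g. body site).
    """
    fractions = {}
    for tip, samples in tip_samples.items():
        counts = Counter(sample_category[s] for s in samples)
        total = sum(counts.values())
        # A tip present in no samples gets an empty (unfilled) bar.
        fractions[tip] = {c: n / total for c, n in counts.items()} if total else {}
    return fractions
```

For the "body site" example, a tip found only in gut samples gets `{"gut": 1.0}`, and a tip split evenly between left and right palm samples gets `{"left palm": 0.5, "right palm": 0.5}`; each fraction would then be drawn as a segment of the fixed-length bar.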

I imagine these are ranked roughly in order of how useful they'll be (maybe 3 and 2 could be switched around, though). So IMO it makes sense to start with the first type of barplot. (Happily, I think this will also be the easiest of the three to implement :)

Other considerations

ElDeveloper commented 4 years ago

Thanks for breaking this down @fedarko, very helpful. After thinking about this for a little bit, here are some thoughts. I had to think of it in terms of features and samples:

For drawing the bars, I think using shaders will be the most performant solution. I think addressing #214 should help us get started.

In both cases it sounds like we should allow multiple rings of information. In any case, I agree that we should start with the case that's easier to implement and move from there.
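As a rough sketch of what multiple rings plus shader-based drawing might involve in the circular layout, the Python below stacks rings outward from the tree and turns one bar into two triangles, which is the kind of flat vertex list a shader program typically consumes. Everything here is an assumption for illustration (the function names, the `gap` parameter, approximating each bar as a straight-edged quad); it is not how Empress implements this.

```python
import math


def ring_bounds(tree_radius, ring_lengths, gap=0.0):
    """Return an (inner, outer) radius pair per ring, stacked outward.

    `tree_radius` is the radius of the outermost tip; `gap` is optional
    spacing before each ring. Both are hypothetical parameters.
    """
    bounds = []
    inner = tree_radius + gap
    for length in ring_lengths:
        bounds.append((inner, inner + length))
        inner += length + gap
    return bounds


def bar_triangles(angle, half_width, inner, outer):
    """Approximate one bar as a quad and split it into two triangles.

    `angle` is the tip's angle in radians and `half_width` is half the
    angular width of the bar. Returns 12 floats: x, y for 6 vertices.
    """
    a0, a1 = angle - half_width, angle + half_width
    # Corner order: (inner, a0), (inner, a1), (outer, a0), (outer, a1)
    p = [(r * math.cos(a), r * math.sin(a)) for r in (inner, outer) for a in (a0, a1)]
    triangles = [p[0], p[1], p[2], p[1], p[3], p[2]]
    return [coord for vertex in triangles for coord in vertex]
```

For example, `ring_bounds(10.0, [2.0, 3.0], gap=1.0)` places the first ring at radii 11 to 13 and the second at 14 to 17. In the rectangular layout the same idea should be simpler, since each bar is an axis-aligned rectangle placed to the right of the tips.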


I agree 🎩-tip to iTOL and other tools like Anvio, ggtree, FigTree, Topiary Explorer, and so many more ✨