biocore / empress

A fast and scalable phylogenetic tree viewer for microbiome data analysis
BSD 3-Clause "New" or "Revised" License
48 stars 31 forks source link

[WIP] select min/max of color gradient #521

Open kwcantrell opened 3 years ago

kwcantrell commented 3 years ago

This PR address #518 by adding the option to select the min/max values for a color gradient. See below example. Screenshot from 2021-05-28 23-15-32

Abstracting the ColorOptionsHandler, which will make is much easier to implement this feature in the feature metadata color gradient PR (#484), makes up the bulk of this pr (the actual change to was a few lines of code). I still need to add some tests/documents but I figure I post this in case anyone wants to try it out.

@jwdebelius this PR should be fully functional. If you cloned the empress repo you can install it by:

git checkout -b kwcantrell-center-colormaps master
git pull https://github.com/kwcantrell/empress.git center-colormaps
pip install -e .
qiime dev refresh-cache
jwdebelius commented 3 years ago

Thank you for this! I'm excited to work with this. Javascript is a language past where I currently am, so I hope it's okay if I give UI kinda feedback.

The good:

Thus far, a few issues I've run into:

I'm going to continue working, and I may check back with more feedback

Again, thank you!

kwcantrell commented 3 years ago

@jwdebelius thank you for taking an interest and trying it out! Your feedback back is definitely appreciated as it will ultimately lead to a better version.

Its weird that I can set min and max values on categorical data. This maybe seems like something to trigger when you have continuous data, or a continuous colormap.

I agree. We should only give the option to set boundaries when using a continuous color map.

The new version shows all my nodes in the tree. If I'm using this for visualization, or exporting, I'd prefer not to have all my nodes displayed. It didn't do this in the library version and I'm not sure if it's an output issue or what.

Hmm...I am not entirely sure why it is defaulting to show all node circles for you as that is not something I am seeing on my end. Perhaps you branched from an old version of the master branch? You can try to pull from master into your branch for center color maps? Regardless, if you go into the settings panel, you should be able to select 'Hide all node circles' from the 'Show node circles' drop down menu.

fedarko commented 3 years ago

Thank you both!

I think the reason for the node circles appearing is that, as Kalen mentioned, we recently (haven't officially released a new version since) updated EMPress so that by default it shows circles for nodes with only one child (https://github.com/biocore/empress/pull/486). Looking at Justine's figure, I think this explanation makes sense -- normally only a few node circles show up due to this setting, but it looks like this particular tree has a lot of one-child nodes and therefore a lot of node circles showing up. Maybe this is the result of the tree being sheared from an insertion tree? I've seen this happen before in that sorta context.

In any case, this functionality will be super useful to have in -- thanks so much!

mortonjt commented 2 years ago

Hi @kwcantrell do you mind if you could resolve the merge conflicts? I would like to push in some edits into this PR, but I'm unable to get the minimal working prototype up and running ...

kwcantrell commented 2 years ago

@mortonjt I will work on this today!

kwcantrell commented 2 years ago

@mortonjt the conflicts are resolved!

mortonjt commented 2 years ago

Thanks!

On Wed, Feb 9, 2022 at 4:11 PM kwcantrell @.***> wrote:

@mortonjt https://github.com/mortonjt the conflicts are resolved!

— Reply to this email directly, view it on GitHub https://github.com/biocore/empress/pull/521#issuecomment-1034195860, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA75VXLREDA7R7HUFVAPPZLU2LJ7RANCNFSM45X7J37Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

jwdebelius commented 2 years ago

I just tried working off this in qiime2-2022.2 (I think Im on the most recent commit) and I cant set limits with negative values

mortonjt commented 2 years ago

The workaround that we ended up using is to hard code the numerical values as categorical values. IMHO it is actually the preferred way to handle color gradients (people can't tell differences between small color shifts right??)

See the phylogenetic tree at https://www.biorxiv.org/content/10.1101/2022.02.25.482050v1.full.pdf

image

jwdebelius commented 2 years ago

Thanks for the suggestion. Im trying to use this for exploratory analysis of a consistent gradient, which makes setting categories apriori kind of obnoxious.

(My current workaround was to add a constant value to essentially shift the center and then set limits between 0 and 2x constant. But, it's still not a great workaround/UI piece

kwcantrell commented 2 years ago

@jwdebelius I am currently in the process of rewriting a lot of the code base. So, this could be a good opportunity to rethink empress' coloring functionality.

exploratory analysis of a consistent gradient, which makes setting categories apriori kind of obnoxious. Would mind elaborating a bit more on this. I would like to modify empress' coloring functionality since it is currently a bit clunky.

mortonjt commented 2 years ago

I agree that getting centering to work would be great.

But I do want to note that discretizing colors in heatmaps is actually the recommended practice for visualization. If you look at any COVID heatmap, all of the reported journals have discretized colors

This one is from NPR

image

And this one is from Mayo clinic image

So when thinking through these use cases, I wouldn't recommend completely throwing out the idea of discretizing continuously value quantities.

kwcantrell commented 2 years ago

Thats a good idea! We can create a UI to easily enable discretizing continuously valued quantities. I think it would also be nice to upload a json that can set different parameters.

antgonza commented 2 years ago

If we are rethinking coloring (which might also help in Emperor); AFAIK both tools share background design ... what about allowing (via the GUI) to split the values in quartiles or deciles?

kwcantrell commented 2 years ago

Thats kinda what I was thinking. We can add a 'buckets' option were users can define how many discrete regions they want and also define the bounds of the regions.

jwdebelius commented 2 years ago

Thanks @kwcantrell! I know it's hard.

I'm trying to build a heatmap of timeseries, which I'd like to visualize on a consistent colorscale. I filtered my data, fit my model over n timepoints, and now I want a heatmap clustered phylogenetically to see if there are clade-related effects in my temporal response. Some of my timepoints (columns) have a small variance, some have a large one (10x bigger). To make the "heatmap" work, I need to color the bars consistently (same colormap, min, max) to let me visually inspect for trends. (I'm essentially looking for an equivalent to the seaborn cluster map functionality that lets me cluster using a phylogenetic tree.)

This is going to guide next visualization/analysis steps. Because it's a quick check in to decide if I need a more complex analysis, I don't want to bother beyond min and max. Zero-centered data will help me understand my trends. (If I see a pattern, then I'll invest in a tool that's harder to install and explain to my clinical collaborator; if I don't then all our lives will be better if I don't have to utter the word "Isometric".)

Its a more general issue, though. As a reviewer, I'd want a continous, zero-centered color map for this data; my clinical collaborators will still want a zero-centered, continous color map for this data. This is not an unreasonable expectation or application on the part of an end-user; not only that, it worked in a previous example (up above).

I think if you're re-factoring color, the ability to group bars with a consistent colormap and colormap parameters would be really helpful since it's an issue I've had in at least 2 projects recently with different variables and interests.

I would also love the ability to pass a custom colormap (I'm never getting those 14 hours in illustrator back) or have a color map deceleration in my metadata file so that when I pass the columns in, empress already knows how to color them.

jwdebelius commented 2 years ago

I dont think automatic descritization would work for what I want to do, unless it's very small quantities. At which point, its a PITA to program, its a PITA to explain to other parties, and there will be a large number of frustrated people on the QIIME 2 forum, which will ultimately lead to less use and adoption.

antgonza commented 2 years ago

I think a feature like Emperor's animation would be beneficial for @jwdebelius use case. Basically, you load all data as a single dataset (same coloring for all time points) but you just show/hide (in an animation) each time point individually. In other words, like this: https://biocore.github.io/emperor/build/html/tutorials/animations.html but hide/show the points. BTW this reminded me of this feature in SitePainter: http://biocore.github.io/SitePainter/documentation/animation.html.

jwdebelius commented 2 years ago

Thanks for the suggestion @antgonza, but an animation would not be a more appropriate visualization than a static heat map. I want to do a consistent comparison over my continuous gradient at a glance. Animation would be less informative in this case than just displaying the bars side-by-side.

There are other places where it might be a cool feature (like in a paired plot) but that's beyond the issue in this PR around wanting to be able to consistently set positive and negative limits on a continous colormap.

fedarko commented 2 years ago

Thanks everyone for the helpful feedback. I find I agree with @jwdebelius' perspective on this: as I understand it, the eventual rewrite of EMPress' codebase will supersede any solution to this issue we come up with now, but—in the meantime—it would be really nice to give users the ability to explicitly set the boundaries of continuous colormaps. This should be feasible in the short term, our other obligations permitting. (And in the long term, it would be nice to have automatic discretization as an additional option, as @mortonjt et al. suggested: I've created a separate issue to track discussion / progress on this in #556.)

I've blocked out a few hours on Thursday to review #484 and tidy up this PR (#521) to make this possible in EMPress in the short term.


One more note: @jwdebelius, you mentioned that

I would also love the ability to pass a custom colormap (I'm never getting those 14 hours in illustrator back) or have a color map deceleration in my metadata file so that when I pass the columns in, empress already knows how to color them.

What sort of format is this custom colormap in? I am guessing that it's one of the three things below (but if it's something else please let me know):

  1. A custom continuous colormap (e.g. "a gradient with #000066 as the low value and #ee1122 as the high value")
  2. A custom categorical colormap (e.g. an alternative to Classic QIIME Colors)
  3. An explicit mapping of categorical values to colors

I don't think we have an issue open for the first two of these, but it might not be too much extra work to add support for them. We already add "custom" discrete and continuous colormaps internally here, and I suspect that generalizing this to support the user uploading / otherwise defining custom colormap(s) should be feasible.

(The third of these things essentially corresponds to #368, and supporting this would require a pretty large amount of work: see #461.)

jwdebelius commented 2 years ago

Thanks @fedarko. This is the third case: #368, or the json case.

fedarko commented 2 years ago

Alright, this PR should now work like before! @jwdebelius / @mortonjt / anyone else interested, would you mind testing this out?

The changes involved were relatively small, but this PR should now work as expected with negative numbers (the fix was applied in commit 70668f2e874c11ec11d93bef54a14ff768cc5ce3). I also adjusted the formatting of the new error messages (e.g. min value ≥ max value; min and/or max values are missing) a bit.

PR status

Screenshots

For reference -- here are screenshots of replicating Fig. 2C from the EMPress paper before and after applying custom boundaries on the Songbird and ALDEx2 differentials:

Before: (default continuous colormap boundaries) before

After: (I set the Songbird differentials' colormap boundaries to [-2.87, 2.87] and the ALDEx2 differentials' colormap boundaries to [-0.87, 0.87]) after3

ElDeveloper commented 2 years ago

Thanks @kwcantrell and @fedarko, these changes look great!


I'm never getting those 14 hours in illustrator back

LOL 😆

ElDeveloper commented 2 years ago

@fedarko or @kwcantrell any chance you can update this branch to pull the latest changes from #555

jwdebelius commented 2 years ago

Thanks @fedarko and @kwcantrell. The recent commit worked for me and I was able to set a consistent colormap with negative colors

fedarko commented 2 years ago

Thanks both! @ElDeveloper, I've merged the upstream changes from #555 into this PR, so the tests should now pass.

However, I think the tree coloring update buttons being broken (documented here) will still prevent this PR from being merged in -- that issue with this PR will need to be fixed first :| As discussed there, maybe the quickest way forward is merging #484 in, then merging this in on top of that?

ElDeveloper commented 2 years ago

Thanks for the clear explanation @fedarko. #484 has some outstanding changes from @kwcantrell for you to review. Would you be able to have a look at those, or would you rather I look over those changes?

fedarko commented 2 years ago

Thanks @ElDeveloper. Getting back to #484 is on my list, but I don't have time to review everything there right now -- I'll set aside some time this Tuesday to look over that. If you have time to look over the changes, that would be great. Sorry for the delay.