biocore / emperor

Emperor: a tool for the analysis and visualization of large microbial ecology datasets
http://biocore.github.io/emperor/

Continuous color gradients for temporal metadata #722

Open fedarko opened 5 years ago

fedarko commented 5 years ago

Summary

Currently Emperor only allows use of the "Continuous values" color gradients when there are at least two numeric values in the current metadata field.

As a possibly far-out feature request, I think it would be useful to extend Emperor to also support continuous colorization for dates (similar to Vega-Lite's temporal scale type). This would help with looking at, e.g., the collection_timestamp sample metadata field in time-series datasets (like the full Moving Pictures study).

Rough implementation details and small example

I'd imagine this would involve finding the minimum and maximum "date" values in the current metadata category, and then converting each value to a fraction between 0 (the minimum) and 1 (the maximum).

Here's a tiny example dataset:

Sample ID   collection_timestamp
1           2017-01-01
2           2017-01-10 12:34
3           2017-05-06 15:22
4           2016-12-01
5           2017-09-20

We'd set the "minimum" date to Sample 4's date (since no time is specified, I guess we'd use 00:00 as the starting point) and the "maximum" date to Sample 5's date (again, we could use 00:00 here).

Once we have the extreme dates identified, converting intermediate dates to a gradient should be doable via ordinary linear interpolation. This would result in a value (here called gradient_position) that describes the date's position on the "gradient" of the current date range.
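Written out, this interpolation is plain min-max normalization. Here t is a timestamp expressed as a number (e.g. milliseconds since the epoch), and t_min and t_max are the earliest and latest timestamps in the field; the worked example uses Sample 1, which is 31 days after Sample 4, while Sample 5 is 293 days after Sample 4:

$$
\mathrm{gradient\_position}(t) = \frac{t - t_{\min}}{t_{\max} - t_{\min}},
\qquad
\mathrm{gradient\_position}(\text{Sample 1}) = \frac{31\ \text{days}}{293\ \text{days}} \approx 0.1058
$$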

In Python, this could be done with something like:

>>> import datetime
>>> # Starting date
>>> s = datetime.datetime.strptime("2016-12-01", "%Y-%m-%d")
>>> s
datetime.datetime(2016, 12, 1, 0, 0)
>>> # Ending date
>>> e = datetime.datetime.strptime("2017-09-20", "%Y-%m-%d")
>>> e
datetime.datetime(2017, 9, 20, 0, 0)
>>> # An intermediate date (Sample 2 in the table above)
>>> i = datetime.datetime.strptime("2017-01-10 12:34", "%Y-%m-%d %H:%M")
>>> i
datetime.datetime(2017, 1, 10, 12, 34)
>>> # Figure out the gradient_position value of i's date.
>>> # s's gradient_position would be 0 and e's would be 1, as expected.
>>> # Note: .timestamp() treats naive datetimes as local time, so the exact
>>> # digits depend on the machine's time zone.
>>> (i.timestamp() - s.timestamp()) / (e.timestamp() - s.timestamp())
0.13832551083297775

Repeating the process, we'd get the following gradient_position values for every sample:

Sample ID   collection_timestamp   gradient_position
1           2017-01-01             0.1058
2           2017-01-10 12:34       0.1383
3           2017-05-06 15:22       0.5345
4           2016-12-01             0
5           2017-09-20             1
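The gradient_position column above can be recomputed for all samples with a short JavaScript sketch. It assumes timestamps look like "YYYY-MM-DD" or "YYYY-MM-DD HH:MM" and reads them as UTC (via Date.UTC), so the result does not depend on the machine's time zone or daylight saving time; the helper names toUtcMillis and gradientPositions are hypothetical. Because of the UTC choice, the last digit can differ slightly from the table, which was presumably computed in local time: Sample 3 comes out as about 0.5346 rather than 0.5345.

```javascript
// Sketch: recompute gradient_position for the example table.
// Assumption: timestamps are "YYYY-MM-DD" or "YYYY-MM-DD HH:MM", read as UTC.
function toUtcMillis(ts) {
  const m = /^(\d{4})-(\d{2})-(\d{2})(?: (\d{2}):(\d{2}))?$/.exec(ts);
  if (m === null) {
    throw new Error("Unsupported timestamp: " + ts);
  }
  // Missing hour/minute groups are undefined and default to 00:00
  const [, y, mo, d, h = "0", mi = "0"] = m;
  // Date.UTC expects a zero-based month index (0 = January)
  return Date.UTC(Number(y), Number(mo) - 1, Number(d), Number(h), Number(mi));
}

// Linear interpolation: the earliest date maps to 0, the latest to 1
function gradientPositions(timestamps) {
  const times = timestamps.map(toUtcMillis);
  const min = Math.min(...times);
  const max = Math.max(...times);
  return times.map((t) => (t - min) / (max - min));
}

const samples = [
  ["1", "2017-01-01"],
  ["2", "2017-01-10 12:34"],
  ["3", "2017-05-06 15:22"],
  ["4", "2016-12-01"],
  ["5", "2017-09-20"],
];
const positions = gradientPositions(samples.map(([, ts]) => ts));
samples.forEach(([id], k) => console.log(id, positions[k].toFixed(4)));
```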

And colorizing the gradient_position values continuously should be feasible using existing functionality in Emperor.
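To illustrate what "colorizing continuously" means once each sample has a gradient_position, here is a hypothetical helper that blends two "#rrggbb" colors linearly. This is not Emperor's colormap code (Emperor has its own continuous color scales); it is only a sketch of how a position in [0, 1] turns into a color.

```javascript
// Hypothetical helper, NOT Emperor's actual colormap code: blends two
// "#rrggbb" colors linearly, so position 0 gives startHex and 1 gives endHex.
function interpolateColor(startHex, endHex, position) {
  const p = Math.min(1, Math.max(0, position)); // clamp to [0, 1]
  const toRgb = (hex) => [1, 3, 5].map((k) => parseInt(hex.slice(k, k + 2), 16));
  const start = toRgb(startHex);
  const end = toRgb(endHex);
  // Interpolate each of the red, green and blue channels independently
  const rgb = start.map((c, k) => Math.round(c + (end[k] - c) * p));
  return "#" + rgb.map((c) => c.toString(16).padStart(2, "0")).join("");
}

// Blue-to-red ramp
console.log(interpolateColor("#0000ff", "#ff0000", 0));   // #0000ff
console.log(interpolateColor("#0000ff", "#ff0000", 0.5)); // #800080
console.log(interpolateColor("#0000ff", "#ff0000", 1));   // #ff0000
```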

Challenges in actually implementing this

I'm pretty sure that in practice we'll have to worry about a lot of ugly corner cases: date formatting and parsing can be painful, and I'm not familiar enough with JS's date APIs to say how difficult this would be. It may be possible to use another library to do the heavy lifting for us (e.g. Vega, d3-time, or something similar), although of course introducing another dependency might be a hassle.

Also, timestamps where only the date (and not the time) is available will complicate things. Assuming 00:00 as the time is an option, but we should make it clear to the user that that's what's being done.
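One way to handle both concerns is for the parser to report whether a time was actually present, so the UI can tell the user when 00:00 was assumed. The sketch below is hypothetical (parseTimestamp is not an existing Emperor or library function): it accepts only a few ISO-like formats, treats values as UTC, and rejects impossible dates that Date.UTC would otherwise silently roll over.

```javascript
// Hypothetical parser (illustration only): accepts "YYYY-MM-DD",
// "YYYY-MM-DD HH:MM" or "YYYY-MM-DD HH:MM:SS" (a "T" separator also works).
// Returns null for anything else, otherwise { millis, hasTime }.
// Date-only values are ASSUMED to mean 00:00 UTC; hasTime records whether a
// time was actually present so the UI can tell the user about that assumption.
function parseTimestamp(value) {
  const pattern = /^(\d{4})-(\d{2})-(\d{2})(?:[ T](\d{2}):(\d{2})(?::(\d{2}))?)?$/;
  const m = pattern.exec(String(value).trim());
  if (m === null) {
    return null;
  }
  // Groups for hours/minutes/seconds are undefined when absent; use 0
  const [y, mo, d, h, mi, s] = m.slice(1).map((x) => (x === undefined ? 0 : Number(x)));
  if (mo < 1 || mo > 12 || h > 23 || mi > 59 || s > 59) {
    return null;
  }
  const millis = Date.UTC(y, mo - 1, d, h, mi, s);
  // Date.UTC rolls impossible days over (2017-02-30 becomes March 2), so
  // reject any value whose day of month changed.
  if (new Date(millis).getUTCDate() !== d) {
    return null;
  }
  return { millis, hasTime: m[4] !== undefined };
}

for (const v of ["2016-12-01", "2017-01-10 12:34", "2017-02-30", "unknown"]) {
  console.log(v, parseTimestamp(v));
}
```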

Alternative solutions

It'd also be feasible to just do some preprocessing on the metadata to convert timestamp fields to a time_gradient (or similarly named) field equivalent to the gradient_position values described above. This is related to how the Moving Pictures study has a days_since_experiment_start field, which is probably good enough for continuous colorization by timestamp (albeit with less granularity, since it's on the order of days).
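A minimal sketch of that preprocessing, using a hypothetical daysSinceStart helper that is similar in spirit to days_since_experiment_start: each timestamp becomes whole days since the earliest one. It assumes "YYYY-MM-DD[ HH:MM]" values read as UTC and simply drops the time of day, which is exactly the loss of granularity mentioned above (Sample 2's 12:34 disappears).

```javascript
// Sketch of the preprocessing alternative (hypothetical helper name).
// Assumes "YYYY-MM-DD" or "YYYY-MM-DD HH:MM" values, read as UTC; the time
// of day is dropped, so the resulting granularity is whole days.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysSinceStart(timestamps) {
  const dayNumbers = timestamps.map((ts) => {
    const [date] = ts.trim().split(" "); // keep only the date part
    const [y, m, d] = date.split("-").map(Number);
    // Midnight UTC is always a whole number of days after the epoch
    return Date.UTC(y, m - 1, d) / MS_PER_DAY;
  });
  const start = Math.min(...dayNumbers);
  return dayNumbers.map((day) => day - start);
}

// Dates from the gradient_position table above
const days = daysSinceStart([
  "2017-01-01",
  "2017-01-10 12:34",
  "2017-05-06 15:22",
  "2016-12-01",
  "2017-09-20",
]);
console.log(days); // days is [31, 40, 156, 0, 293]
```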

ElDeveloper commented 5 years ago

I like this idea a lot. I would very much prefer for the parsing/conversion to happen at the interface level, i.e. in the UI. I think an element would need to be clicked, and the data would then be converted into a continuous ordinal scale.

As for your concerns, introducing a new dependency should be fine as long as it is a library that's well maintained, documented, and works well for us.

Regarding the option to implement this in Python: I think that would be nice to "prototype" with and play around with for a bit, but in practice it would ultimately be more beneficial to let the user decide "on the fly".

Let me know if you'd like a hand; I'd be excited to help with this!

fedarko commented 5 years ago

Thanks! I agree that doing this at the JS level (on the fly) would be ideal; the Python code was just the simplest way of demonstrating this that I could think of yesterday.

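Looks like the JS equivalent of the interpolation would be something like:

> // Month indices are zero-based: 11 = December, 8 = September, 0 = January
> var s = new Date(2016, 11, 1);
> var e = new Date(2017, 8, 20);
> var i = new Date(2017, 0, 10, 12, 34);
> // As in Python, these Dates are in local time, so the exact digits
> // depend on the machine's time zone
> console.log((i.getTime() - s.getTime()) / (e.getTime() - s.getTime()));
0.13832551083297775

(For some reason, month indices in JS's Date API start at 0 while days of the month start at 1.)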

Of course, in practice we wouldn't have the actual year/month/day/... numerical values to start off with; we'd just have the collection_timestamp field values. It looks like Moment.js would be a good external library for parsing/manipulating these values: it seems well-tested, supports Require.js, and has a ton of parsing functionality that should handle most of the corner cases for us.

Basic workflow with Moment.js

So the user would click some sort of UI element (maybe something like "Continuous temporal values", analogous to the "Continuous values" checkbox), which would try to create Moment objects for all of the values in the currently selected metadata field.

If at least two valid Moment objects were created (which can be checked with Moment's isValid() method), then we can find the maximum and minimum Moment via moment.max() and moment.min(). Then we can convert all of the valid Moment objects to something like milliseconds since the epoch, and colorize samples accordingly.

(I guess samples whose metadata values weren't parseable as Moments would just be colored a constant color. This would be analogous to non-numeric values for the current continuous colorization settings.)
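The workflow above can be sketched end to end as follows. To stay dependency-free, this version parses "YYYY-MM-DD[ HH:MM]" values as UTC with a regex instead of Moment.js; with Moment, parseUtc would roughly become moment(value) plus an isValid() check. The function names parseUtc and temporalGradient are hypothetical, and unparseable values get null so they can be drawn in a single constant color.

```javascript
// Regex-based stand-in for Moment parsing (assumption: UTC, simple formats)
function parseUtc(value) {
  const m = /^(\d{4})-(\d{2})-(\d{2})(?: (\d{2}):(\d{2}))?$/.exec(String(value).trim());
  if (m === null) return null;
  const [y, mo, d, h, mi] = m.slice(1).map((x) => (x === undefined ? 0 : Number(x)));
  if (mo < 1 || mo > 12 || h > 23 || mi > 59) return null;
  const millis = Date.UTC(y, mo - 1, d, h, mi);
  // Date.UTC rolls impossible days over (e.g. 2017-02-30), so reject those
  return new Date(millis).getUTCDate() === d ? millis : null;
}

// Maps sample ID -> metadata value to sample ID -> position in [0, 1].
// Unparseable values get null (to be drawn in a constant color).
// Returns null if there are fewer than two distinct valid dates.
function temporalGradient(metadata) {
  const millis = {};
  for (const [id, value] of Object.entries(metadata)) {
    millis[id] = parseUtc(value);
  }
  const valid = Object.values(millis).filter((t) => t !== null);
  if (valid.length < 2) return null;
  const min = Math.min(...valid);
  const max = Math.max(...valid);
  if (min === max) return null; // no range to interpolate over
  const positions = {};
  for (const [id, t] of Object.entries(millis)) {
    positions[id] = t === null ? null : (t - min) / (max - min);
  }
  return positions;
}

const result = temporalGradient({ a: "2016-12-01", b: "2017-09-20", c: "not recorded" });
console.log(result); // { a: 0, b: 1, c: null }
```

Returning null when there is no usable range mirrors the existing "Continuous values" behavior, which requires at least two numeric values.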

Let me know if this sounds reasonable. I think this would be a nice thing for us to add in sometime in the future, especially since it looks like it shouldn't be too much work (hopefully :).

ElDeveloper commented 5 years ago

Basic workflow with Moment.js

This sounds like a great plan to me. The only missing piece would be making sure that we can test this workflow (or its components). Other than that, everything looks great to me!

Let me know if you would like to chat more about this.