fedarko opened 5 years ago
I like this idea a lot! I would very much prefer for the parsing/conversion to happen at the interface level, i.e. from the UI. I think an element would need to be clicked, and the data would then be converted into a continuous ordinal scale.
As for your concerns, introducing a new dependency should be fine as long as it is a library that's well maintained, well documented, and works well for us.
Regarding the option to implement this in Python: I think that would be nice for prototyping and playing around with for a bit, but in practice it would be more beneficial to let the user decide "on the fly".
Let me know if you'd like a hand; I'd be excited to help with this!
Thanks! I agree that doing this at the JS level (on the fly) would be ideal; the Python code was just the simplest way I could think of yesterday to demonstrate this.
Looks like the JS equivalent of the interpolation would be something like:
> var s = new Date(2016, 11, 1);
> var e = new Date(2017, 8, 20);
> var i = new Date(2017, 0, 10, 12, 34);
> console.log((i.getTime() - s.getTime()) / (e.getTime() - s.getTime()));
0.13832551083297775
(For some reason, months in JS's Date API are numbered from 0, so 11 is December and 8 is September, while days of the month start at 1.)
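Here is the same interpolation as a small reusable function. This is a sketch, and gradientPosition is a hypothetical name. It builds the dates with Date.UTC instead of the local-time Date constructor. The value printed above appears to include a one-hour daylight-saving shift, because December and September fall on different sides of a DST change in many time zones. Working in UTC avoids that time-zone dependence, so the result here is slightly different (about 0.13831).

```javascript
// Hypothetical helper: the fraction of the way from startMs to endMs at which
// tMs falls. All arguments are milliseconds since the epoch.
function gradientPosition(startMs, endMs, tMs) {
  if (endMs === startMs) {
    throw new Error("start and end timestamps must differ");
  }
  return (tMs - startMs) / (endMs - startMs);
}

// Same dates as the console session above, built in UTC instead of local time.
// Months are still 0-based: 11 = December, 0 = January, 8 = September.
const s = Date.UTC(2016, 11, 1);
const e = Date.UTC(2017, 8, 20);
const i = Date.UTC(2017, 0, 10, 12, 34);

console.log(gradientPosition(s, e, i).toFixed(4)); // 0.1383
```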
Of course, in practice we wouldn't have the actual year/month/day/... numerical values to start with; we'd just have the collection_timestamp field values. It looks like Moment.js would be a good external library for parsing and manipulating these values: it seems well tested, supports Require.js, and has a ton of parsing functionality that should handle most of the corner cases for us.
So the user clicks some sort of UI element (maybe something like a "Continuous temporal values" checkbox, analogous to the "Continuous values" checkbox), which would try to create Moment objects for all of the currently selected metadata field values.
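Below is a minimal sketch of that parsing step. It uses the built-in Date.parse as a stand-in for Moment.js so that it runs without dependencies. Date.parse is only specified for ISO 8601 strings, and its handling of other formats is implementation-defined. The sketch therefore treats anything else as unparseable; a library like Moment.js would be the way to accept more formats. parseFieldValues is a hypothetical name.

```javascript
// Only full ISO 8601 dates or date-times are accepted; Date.parse behavior for
// other formats is implementation-defined, so they are treated as unparseable.
const ISO_RE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{1,3})?)?(Z|[+-]\d{2}:\d{2})?)?$/;

// Hypothetical helper: try to parse every value of the selected metadata field.
// Returns one entry per value: milliseconds since the epoch, or null.
function parseFieldValues(values) {
  return values.map((v) => {
    const ms = ISO_RE.test(v) ? Date.parse(v) : NaN;
    return Number.isNaN(ms) ? null : ms;
  });
}

const parsed = parseFieldValues([
  "2016-12-01T00:00:00Z",
  "not recorded",
  "2017-09-20T00:00:00Z",
]);
console.log(parsed[1]); // null
```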
If at least two valid Moment objects were created (we can check via something like this method?), then we can find the maximum and minimum Moment via moment.max() and moment.min(). Then we can convert all of the valid Moment objects to something like milliseconds since the epoch and colorize samples accordingly.
(I guess samples whose metadata values couldn't be parsed as Moments would just be given a constant color, analogous to how non-numeric values are handled by the current continuous colorization settings.)
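The following sketch covers the min/max and normalization step on plain millisecond values. Math.min/Math.max (via reduce) stand in for moment.min()/moment.max() plus the conversion to milliseconds. Unparseable values stay null so those samples can be given the constant color. toGradientPositions is a hypothetical name.

```javascript
// Hypothetical helper: map parsed timestamps (ms since the epoch, or null for
// unparseable values) to gradient positions in [0, 1]. Unparseable values stay
// null so the caller can give those samples a constant color.
// Returns null when there is no usable range (fewer than two distinct dates).
function toGradientPositions(timestamps) {
  const valid = timestamps.filter((ms) => ms !== null);
  if (valid.length < 2) {
    return null;
  }
  // reduce avoids spreading a very large array into Math.min/Math.max.
  const min = valid.reduce((a, b) => Math.min(a, b));
  const max = valid.reduce((a, b) => Math.max(a, b));
  if (min === max) {
    return null;
  }
  return timestamps.map((ms) => (ms === null ? null : (ms - min) / (max - min)));
}

const positions = toGradientPositions([
  Date.UTC(2017, 0, 10, 12, 34),
  null,
  Date.UTC(2016, 11, 1),
  Date.UTC(2017, 8, 20),
]);
console.log(
  positions.map((p) => (p === null ? "constant color" : p.toFixed(4))).join(", ")
);
// 0.1383, constant color, 0.0000, 1.0000
```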
Let me know if this sounds reasonable. I think this would be a nice thing for us to add sometime in the future, especially since it looks like it shouldn't be too much work (hopefully :).
Basic workflow with Moment.js
This sounds like a great plan to me. The only other missing piece is making sure that we can test this workflow (or the individual components in it). Other than that, everything looks great to me!
Let me know if you would like to chat more about this.
Summary
Currently Emperor only allows use of the "Continuous values" color gradients when there are at least two numeric values in the current metadata field.
As a perhaps far-out feature request, I think it would be useful to extend Emperor to also support continuous colorization for dates (similar to Vega-Lite's temporal scale type). This would be useful for looking at, e.g., the collection_timestamp sample metadata field in time-series datasets (like the full Moving Pictures study).
Rough implementation details and small example
I'd imagine this would involve figuring out the minimum and maximum "date" values for the current metadata category, and then converting each intermediate value to a fraction between 0 and 1.
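Written out as a formula, let t_i be the timestamp of sample i, and let t_min and t_max be the earliest and latest timestamps in the field (this notation is introduced here for illustration). The fraction p_i is then the value called gradient_position further down:

$$
p_i = \frac{t_i - t_{\min}}{t_{\max} - t_{\min}}, \qquad t_{\max} > t_{\min}
$$

So the earliest sample maps to 0, the latest maps to 1, and every other sample falls in between.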
Here's a tiny example dataset:
We'd set the "minimum" date to be Sample 4 (since no time is specified, I guess we'd use 0:00 as the starting point), and the "maximum" date to be Sample 5 (again, we could use 0:00 here).
Once we've identified the extreme dates, converting the intermediate dates to a gradient should be doable via ordinary linear interpolation. This would result in a value (here called gradient_position) that describes the date's position on the "gradient" of the current date range.
In Python, this could be done through something like:
Repeating the process, we'd get the following gradient_position values for every sample:
Sample 1: 0.1058
Sample 2: 0.1383
Sample 3: 0.5345
Sample 4: 0
Sample 5: 1
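Turning positions like these into colors could be as simple as the following illustration, although Emperor's existing colormaps would presumably be used instead. The sketch blends two RGB colors linearly by gradient_position and falls back to a constant color for samples that have no position. The endpoint colors, the gray fallback, and positionToColor are all hypothetical choices.

```javascript
// Hypothetical endpoint colors for a two-color gradient, as [r, g, b] in 0-255.
const START_COLOR = [68, 1, 84];
const END_COLOR = [253, 231, 37];
// Hypothetical constant color for samples whose date could not be parsed.
const FALLBACK_HEX = "#808080";

function toHex([r, g, b]) {
  return "#" + [r, g, b].map((c) => c.toString(16).padStart(2, "0")).join("");
}

// Linearly interpolate each RGB channel at position p in [0, 1];
// null means "no position", so the constant fallback color is used.
function positionToColor(p) {
  if (p === null) {
    return FALLBACK_HEX;
  }
  const clamped = Math.min(1, Math.max(0, p));
  const rgb = START_COLOR.map((c, k) => Math.round(c + (END_COLOR[k] - c) * clamped));
  return toHex(rgb);
}

console.log(positionToColor(0)); // #440154
console.log(positionToColor(1)); // #fde725
console.log(positionToColor(null)); // #808080
```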
Colorizing the gradient_position values continuously should then be feasible using existing functionality in Emperor.
Challenges in actually implementing this
I'm pretty sure that in practice we'll have to worry about a lot of ugly corner cases: date formatting and parsing can be a hassle, and I'm not familiar enough with JS's date APIs to say how difficult this would be. It may be possible to use another library to do the heavy lifting for us (e.g. Vega, d3-time, or something similar), although of course introducing another dependency has its own costs.
Also, timestamps where only the date (and not the time) is available will complicate things. Assuming 00:00 as the time is an option, but we should make it clear to the user that that's what's being done.
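On the JS side there is an extra trap here. Per the ECMAScript spec, Date.parse reads a date-only ISO string such as "2016-12-01" as UTC midnight. A date-time string without an offset, such as "2017-01-10T12:34", is read as local time instead. Mixing the two therefore shifts values by the local UTC offset. The sketch below makes the 00:00 assumption explicit, assuming (hypothetically) that all timestamps should be read in UTC. parseTimestamp and its dateOnly flag are illustrative names; the flag is what a UI could use to tell the user that a time was filled in.

```javascript
// Accepts "YYYY-MM-DD" or "YYYY-MM-DDTHH:MM[:SS]" with no time-zone suffix.
const TIMESTAMP_RE = /^(\d{4})-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2}))?)?$/;

// Hypothetical parser: reads every value as UTC so date-only and date-time
// values are directly comparable. Date-only values are assumed to be 00:00
// and flagged (dateOnly: true) so the UI can tell the user about it.
function parseTimestamp(str) {
  const m = TIMESTAMP_RE.exec(str);
  if (m === null) {
    return null;
  }
  const parts = m.slice(1).map((x) => (x === undefined ? undefined : Number(x)));
  const [y, mo, d, h = 0, mi = 0, s = 0] = parts;
  const ms = Date.UTC(y, mo - 1, d, h, mi, s); // months are 0-based
  // Date.UTC silently rolls over out-of-range fields (e.g. month 13),
  // so reject values that do not round-trip.
  const back = new Date(ms);
  if (
    back.getUTCFullYear() !== y ||
    back.getUTCMonth() !== mo - 1 ||
    back.getUTCDate() !== d ||
    back.getUTCHours() !== h ||
    back.getUTCMinutes() !== mi ||
    back.getUTCSeconds() !== s
  ) {
    return null;
  }
  return { ms, dateOnly: m[4] === undefined };
}

const a = parseTimestamp("2016-12-01");
const b = parseTimestamp("2017-01-10T12:34");
console.log(a.dateOnly, b.dateOnly); // true false
console.log(parseTimestamp("2017-13-45")); // null
```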
Alternative solutions
It'd also be feasible to just do some preprocessing on the metadata to convert timestamp fields to a time_gradient field (or whatever we'd call it) that would be equivalent to the gradient_position value described above. This is related to how the Moving Pictures study has a days_since_experiment_start field, which is probably good enough for continuous colorization by timestamp (albeit with less granularity, since it's on the order of days).
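Here is a minimal sketch of such a preprocessing step. It assumes the metadata has already been loaded as an array of row objects, and that timestamps are ISO 8601 strings with an explicit "Z" or offset (other formats are implementation-defined for Date.parse). The function name, the days_since_start column, and the sample rows are all made up for illustration. Rows a and b fall on the same day, so days_since_start cannot tell them apart, while time_gradient keeps their order.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical preprocessing step: add a time_gradient column in [0, 1] to
// metadata rows (plain objects). Timestamps are assumed to be ISO 8601 strings
// with an explicit "Z" or offset; unparseable ones get an empty value.
function addTimeGradient(rows, field = "collection_timestamp") {
  const times = rows.map((row) => {
    const ms = Date.parse(row[field]);
    return Number.isNaN(ms) ? null : ms;
  });
  const valid = times.filter((ms) => ms !== null);
  if (valid.length < 2) {
    throw new Error("need at least two parseable timestamps");
  }
  const min = valid.reduce((x, y) => Math.min(x, y));
  const max = valid.reduce((x, y) => Math.max(x, y));
  if (min === max) {
    throw new Error("all timestamps are identical");
  }
  return rows.map((row, k) => ({
    ...row,
    time_gradient: times[k] === null ? "" : (times[k] - min) / (max - min),
    // Whole days only, similar in spirit to days_since_experiment_start.
    days_since_start:
      times[k] === null ? "" : Math.floor((times[k] - min) / MS_PER_DAY),
  }));
}

const out = addTimeGradient([
  { "#SampleID": "a", collection_timestamp: "2017-01-10T12:34:00Z" },
  { "#SampleID": "b", collection_timestamp: "2017-01-10T18:00:00Z" },
  { "#SampleID": "c", collection_timestamp: "2016-12-01T00:00:00Z" },
  { "#SampleID": "d", collection_timestamp: "2017-09-20T00:00:00Z" },
]);
console.log(out.map((r) => r.days_since_start).join(", ")); // 40, 40, 0, 293
```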