Quansight / omnisci

Explorations on using MapD and Jupyter together.

Vega, Datashader, and Holoviews Collaboration #67

Open saulshanabrook opened 4 years ago

saulshanabrook commented 4 years ago

We had a call a few weeks ago with @jbednar @tonyfast @dharhas @philippjfr to discuss different ways Datashader and HoloViews could be useful to the work we are doing with OmniSci. I was particularly interested in whether all the work that has been put into creating these interactive rasterized geospatial plots (the NYC Taxi example) could be reused for our current work getting interactive Vega visualizations to execute on a Python backend.

My takeaway from the conversation is that Datashader is all about taking some data and rasterizing it. If we want to think of this in terms of transformations on the data, it is like doing a groupby by pixel and then displaying some aggregate.
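To make that "groupby by pixel" framing concrete, here is a conceptual sketch in plain pandas. This is not Datashader's actual implementation (which is heavily optimized), just the idea: bin each point to a pixel coordinate, group by pixel, and aggregate with a count.

```python
import numpy as np
import pandas as pd

# Made-up point data standing in for a large scatterplot dataset.
rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.normal(size=10_000), "y": rng.normal(size=10_000)})

width, height = 100, 100

# Bin each point into an integer pixel index along each axis...
px = ((df["x"] - df["x"].min()) / (df["x"].max() - df["x"].min()) * (width - 1)).astype(int)
py = ((df["y"] - df["y"].min()) / (df["y"].max() - df["y"].min()) * (height - 1)).astype(int)

# ...then "groupby by pixel" and aggregate (here: a count per pixel,
# i.e. the raster image Datashader would hand to the plotting layer).
counts = df.groupby([px, py]).size()
```

The same shape of computation supports other per-pixel aggregates (mean, sum, etc.) by swapping the final reduction.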

And Holoviews, despite its name, is at its core not about viewing data, but about transforming it. The key idea is to maintain enough semantic knowledge about the data as we transform it so that appropriate visualizations are implicit in the data encoding.

So if we think about HoloViews as a way of transforming data, with Datashader being one particularly heavily optimized transform, then we can see where this fits in our current pipeline. What do we currently use for transforming data? We take Vega transforms and map them to Ibis expressions. Instead, we could take Vega transforms and map them to HoloViews calls. HoloViews wouldn't be used on the frontend for visualizing at all; it would just be a backend library that performs the appropriate transforms, which Vega would call out to whenever it needed to transform data. If we wanted to use our existing pipeline directly, we could try to write an Ibis backend for HoloViews. However, there might be too much impedance mismatch between the grammar of Ibis and that of HoloViews, so instead we could write a different Python backend for Vega, one that translates directly to HoloViews instead of Ibis.
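As a hypothetical sketch of what "map Vega transforms to backend calls" could look like, the function below interprets a Vega-style aggregate transform spec, with pandas standing in for the Ibis or HoloViews backend. The function name and the translation are assumptions for illustration, not part of any existing pipeline.

```python
import pandas as pd

def apply_vega_aggregate(df, transform):
    """Interpret a Vega-style "aggregate" transform against a DataFrame.

    `transform` mirrors Vega's aggregate shape: parallel `ops`, `fields`,
    and `as` lists, plus a `groupby` list of column names.
    """
    aggs = {
        out: pd.NamedAgg(column=field, aggfunc=op)
        for field, op, out in zip(transform["fields"], transform["ops"], transform["as"])
    }
    return df.groupby(transform["groupby"], as_index=False).agg(**aggs)

# Toy data and a Vega-like aggregate spec (made-up values).
df = pd.DataFrame({"cat": ["a", "a", "b"], "val": [1, 2, 3]})
spec = {"groupby": ["cat"], "ops": ["mean"], "fields": ["val"], "as": ["avg_val"]}
result = apply_vega_aggregate(df, spec)
```

A real backend would emit Ibis expressions (and hence SQL) or HoloViews operations instead of executing eagerly in pandas, but the shape of the translation is the same.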

What would be the payoff here? Well, users would get to use the Altair API to construct interactive visualizations, and they would get the efficiency Datashader has built in for rasterizing data.

The next steps here would be to explore how vega transforms like groupbys and aggregates could be mapped to datashader. Before that, we should come up with a particular use case for interactive visualization with datashader and holoviews, try to replicate it with altair, and then see how we would map the vega transforms to the holoviews expressions.

Taking a step back, what we are doing here is mapping one domain specific language, Vega transforms, to another, Holoviews operations.

cc @ian-r-rose @domoritz

domoritz commented 4 years ago

That sounds pretty interesting. I don't know enough about datashader and holoviews to say anything smart before I take a closer look at their models.

saulshanabrook commented 4 years ago

I don't know enough about datashader and holoviews to say anything smart before I take a closer look at their models.

I am just starting to look into them, so if I got any of my summary wrong I would appreciate being corrected by any of the authors.

domoritz commented 4 years ago

Could you clarify how you think holoviews could be translated to SQL for omnisci?

saulshanabrook commented 4 years ago

Could you clarify how you think holoviews could be translated to SQL for omnisci?

I don't think it would be. For this to help OmniSci directly, there would have to be an OmniSci backend for Datashader. I am not sure how that would work; possibly as UDFs that run on their server, but I would defer to the Datashader devs on whether they have done any work running against existing databases.

It would be helpful for Datashader's other backends, though. In the example above, it runs off of a Parquet file.

jbednar commented 4 years ago

Based on that very interesting meeting (which has already started to fade from my memory, alas!), I think there were a few things that were clear about how Vega, Datashader, and HoloViews could relate:

  1. Q: Should HoloViews have a Vega plotting backend (adding to the current Matplotlib, Bokeh, and Plotly plotting backends)?

  2. A: Strictly in terms of plotting, no, there does not seem to be any reason to do so. Adding one via Altair seems reasonably straightforward, but the set of plots covered by Altair is already covered by the existing backends. Each of the other existing backends provides unique functionality (Matplotlib offers full SVG export for layouts, Bokeh offers fully supported interactivity, and Plotly offers 3D and a few other unique plot types), but Vega doesn't appear to enlarge this total capability, just provide a slightly different look and feel. Thus I don't currently see any particular reason to spend the effort to develop a Vega plotting backend, for plotting.

  3. A: Even so, there could still be a very good reason to provide a Vega plotting backend, if you think not in terms of generating plots, but in terms of generating a Vega JSON spec and grabbing that before the actual plot is generated. If there were a Vega plotting backend, then people could use the HoloViews API to create Vega JSON specifications that could be passed to any Vega rendering tool, which could fit nicely into the deployment workflows of various organizations. Getting Vega JSON specs out (not just completed plots) does seem like a potentially compelling approach, and is something we could follow up on if there were sufficient interest. But note that such a Vega spec would normally just be for the final plot in some transformation pipeline, e.g. a Datashader-rasterized scatterplot would show up as an image or heatmap plot specification at the Vega level (assuming Vega has such plot types nowadays!).

  4. The plotting backend is about output, but input is also pluggable for HoloViews. HoloViews supports many "data interfaces" (for numpy, pandas, xarray, etc. data sources). Adding Ibis+SQLAlchemy as a data interface would allow HoloViews to work with SQL databases about as transparently as we currently work with the local data sources. We're excited about this possibility, which Philipp has estimated as a 1-2 week job for him to do, but do not currently have any funding for it. With such a data interface, HoloViews users could specify a set of data lazily to be retrieved as needed for plotting, including a Datashader aggregation/reduction step before plotting. Such an interface would work the same with any plotting backend, including generating Vega specs for the final (transformed/rasterized/etc.) data as export if there were a Vega backend as contemplated above. Note that Datashader doesn't necessarily need to know about any of this; HoloViews can prepare the data into a format consumable by Datashader for any supported HoloViews data interface.

  5. As @saulshanabrook is describing above, it is also possible to use HoloViews without any plotting backend at all, just to implement data transformations. Until the actual display, HoloViews objects are always data containers, not plots, and so HoloViews can capture a series of transformations into whatever form you eventually want to display or analyze, without actually ever displaying it. Splitting HoloViews from its plotting has been on our to-do list for several years, and can happen as soon as we're ready to take a couple of weeks dealing with the admin burden of doing it, so we generally just pretend that's already happened. :-) With this approach and an Ibis/SQLAlchemy data interface, we can use HoloViews to start with an SQL data source, then transform it to a different representation (e.g. rasterized with Datashader, or just sliced/sampled/aggregated in general), then read out the data to be consumed by some external system (whether that's about plotting or not).

  6. Finally, what I think @saulshanabrook is really getting at is even more ambitious than all these options: to start with a Vega specification for a set of transformations that Vega could do with small data, but then to use HoloViews+Datashader to implement the actual transformations, taking the result and displaying it with Vega. Doing so depends on item 4 and possibly item 3, plus being able to translate Vega transformations into HoloViews operations (Vega aggregations to HoloViews aggregate(), etc.).

  7. Option 6 is conceivable, if that's what you have in mind, but to me it seems like it would be more straightforward to simply add a Datashader interface to your own system, bypassing HoloViews entirely. In that case, yes, it might be appropriate to add an Ibis/SQLAlchemy (or Omnisci? not sure) data backend to Datashader, then switch to Datashader to implement the aggregation specified in the Vega spec. I'm not sure about this, and have to run now before being able to think about it deeply, but it's certainly worth considering as an alternative.
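The "image or heatmap plot specification at the Vega level" mentioned in item 3 might look something like the hand-written Vega-Lite sketch below: pre-aggregated per-pixel counts rendered as a `rect` (heatmap) mark. The data values are made up, and this is not output from any existing backend, just an illustration of the kind of spec such a backend could emit.

```python
import json

# A minimal Vega-Lite spec for a rasterized result: the backend has
# already reduced the raw points to per-pixel counts, so the spec only
# carries the (small) aggregate, not the original data.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": [
        {"px": 0, "py": 0, "count": 12},
        {"px": 1, "py": 0, "count": 7},
    ]},
    "mark": "rect",
    "encoding": {
        "x": {"field": "px", "type": "ordinal"},
        "y": {"field": "py", "type": "ordinal"},
        "color": {"field": "count", "type": "quantitative"},
    },
}

# Any Vega/Vega-Lite renderer could consume this JSON directly.
vega_json = json.dumps(spec, indent=2)
```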

jbednar commented 4 years ago

Note that for option 6, this recent addition to HoloViews may be relevant, https://github.com/pyviz/holoviews/pull/3967. It supports storing a HoloViews transformation pipeline in a re-playable semi-declarative form. (Only semi-declarative because even though it's a text-based spec, it's really just a recipe for function calls, but at least it's constrained and introspectable and thus potentially mappable between different declarative systems...)
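This is not HoloViews' actual API, but a toy sketch of the "re-playable semi-declarative" idea: the pipeline is stored as data (a recipe of named operations plus arguments) rather than as opaque callables, so it can be introspected, potentially mapped to another declarative system like Vega transforms, or replayed against new or filtered data.

```python
import pandas as pd

# Registry of named operations; each pipeline step is (name, kwargs),
# so the whole recipe is plain introspectable data.
OPS = {
    "filter": lambda df, expr: df.query(expr),
    "aggregate": lambda df, by, col, fn: df.groupby(by, as_index=False)[col].agg(fn),
}

def replay(df, pipeline):
    """Re-apply a recorded pipeline of (op_name, kwargs) steps to df."""
    for name, kwargs in pipeline:
        df = OPS[name](df, **kwargs)
    return df

# Made-up data and a recorded two-step recipe.
df = pd.DataFrame({"cat": ["a", "a", "b"], "val": [1, 5, 3]})
pipeline = [
    ("filter", {"expr": "val > 1"}),
    ("aggregate", {"by": "cat", "col": "val", "fn": "sum"}),
]
out = replay(df, pipeline)
```

Because the recipe is data, replaying it with a different initial `filter` step is exactly the linked-selection use case described below in this thread.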

saulshanabrook commented 4 years ago

It supports storing a HoloViews transformation pipeline in a re-playable semi-declarative form.

Ah great, yeah that is very useful. Just curious, what was the impetus for this addition? Is it a similar use case?

jbednar commented 4 years ago

Pipeline capturing was added to support replaying the data transformations behind a visible plot, specifically in the case of selecting a subset of the data in one plot and wanting that same subset reflected in various other plots derived from the same data. See https://github.com/pyviz/holoviews/pull/3951 . E.g. if you have 6 columns and some Datashaded plots of various dimensions against other dimensions and see something interesting in one specific plot, the original data points leading to that plot are no longer available (having been rasterized away). But you can still select a region of that plot and replay the full pipeline to update each of the other plots to show only the points that fall in that region for those dimensions, without ever having to send the full data down to the browser, and whether or not those dimensions are actually shown in the other plots.

Moreover, linked selections like this can simply be enabled without any user-written callback code; they will simply be available if someone wants their plots to work like that. I think this support will cover many of the reasons that people want to set up a custom dashboard in the first place, with essentially zero code.

But in general, having the full provenance and reproducible recipe for each plot from a source dataset is likely to be valuable for lots of other purposes we haven't even contemplated yet. E.g. I'm hoping it can be extended to cover "drilling down" use cases with almost no coding as well, which is the second big reason people write custom dashboards (after linked selections). @jonmmease can comment on that one...

jbednar commented 4 years ago

Oh, I guess the third big reason people write dashboards has always been covered by HoloViews already, which is to show a plot that shows a slice of a multidimensional dataset, with values for the dimensions not shown in the plot being selected by widgets. That's just always worked in HV but otherwise would require writing widget code, so I tend to forget about that even more common case.

jonmmease commented 4 years ago

I'm hoping it can be extended to cover "drilling down" use cases with almost no coding as well

Yeah, this should be possible for many use cases. Here's a good overview of how Spotfire handles configuring custom drill-down dashboards using their GUI menus (https://www.youtube.com/watch?v=a5FMokQ2CR0). The machinery we'll have in place when https://github.com/pyviz/holoviews/pull/3951 is finished should be a suitable foundation for these more flexible workflows. Of course we'll need to work out a reasonable API for the user to provide the kind of marking/limiting/combining options that Spotfire's menus provide.

jbednar commented 4 years ago

@jonmmease , sounds great!

saulshanabrook commented 4 years ago

Hey datashader folks! Just wanted to point you to this new issue where some discussion of adding rasterization primitives to Vega Lite is taking place: https://github.com/vega/vega-lite/issues/6043

Since you all have a lot of experience designing this kind of API, I would be curious if you have any feedback on the proposal there.