altair-viz / altair-transform

Evaluation of Vega-Lite transforms in Python
MIT License

Using Big Data with Altair #2

Open saulshanabrook opened 5 years ago

saulshanabrook commented 5 years ago

I wanted to reach out, since we have been working on a similar project over at https://github.com/Quansight/jupyterlab-omnisci.

The goal there is to let users create Altair charts and have the heavy lifting transparently executed on a database.

To get a feel for it, you can open the notebooks/Ibis + Altair + Extraction.ipynb notebook in Binder. If you run the cells, the graphs should appear.

We are using Ibis to build up the SQL expression. We are building it to execute on an OmniSci database, but most of the work should translate to any other Ibis backend.

Currently, we rewrite the Vega-Lite spec to take out the transforms and map them to Ibis expressions. In effect, we are implementing a very limited version of what you have here, targeting Ibis instead of pandas and working from the transforms extracted from the Vega-Lite spec.
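To make the extraction step concrete, here is a minimal sketch, not the actual jupyterlab-omnisci code. It assumes the spec is a plain dict (as `chart.to_dict()` would give you), and the helper names `extract_transforms` and `apply_aggregate` are made up for illustration. Pure Python over a list of rows stands in for the Ibis backend, and only the `sum` and `count` ops are handled.

```python
import copy
from collections import defaultdict


def extract_transforms(spec):
    """Return (spec_without_transforms, transforms) without mutating the input."""
    stripped = copy.deepcopy(spec)
    transforms = stripped.pop("transform", [])
    return stripped, transforms


def apply_aggregate(rows, transform):
    """Evaluate one Vega-Lite aggregate transform over a list of dict rows.

    Stand-in for a real backend; only "sum" and "count" are supported here.
    """
    groupby = transform.get("groupby", [])
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[g] for g in groupby)].append(row)

    out = []
    for key, members in groups.items():
        result = dict(zip(groupby, key))
        for agg in transform["aggregate"]:
            if agg["op"] == "sum":
                value = sum(m[agg["field"]] for m in members)
            elif agg["op"] == "count":
                value = len(members)
            else:
                raise NotImplementedError(agg["op"])
            result[agg["as"]] = value
        out.append(result)
    return out


# Illustrative spec and data (invented for this sketch).
spec = {
    "mark": "bar",
    "transform": [
        {
            "aggregate": [{"op": "sum", "field": "delay", "as": "total_delay"}],
            "groupby": ["carrier"],
        }
    ],
    "encoding": {
        "x": {"field": "carrier", "type": "nominal"},
        "y": {"field": "total_delay", "type": "quantitative"},
    },
}
rows = [
    {"carrier": "AA", "delay": 5},
    {"carrier": "AA", "delay": 7},
    {"carrier": "UA", "delay": 3},
]

stripped_spec, transforms = extract_transforms(spec)
aggregated = rows
for t in transforms:
    aggregated = apply_aggregate(aggregated, t)
```

The stripped spec can then be rendered against the already-aggregated rows, so the front end only ever sees the small result.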

However, our next goal is to support interactions, so that after a user interacts, a new query is computed and run. To do this, we are looking to switch from processing the Vega-Lite spec to using the underlying Vega spec or dataflow graph. The idea is to take the initial Altair chart, generate Vega-Lite, convert that to Vega, and then pre-process the Vega spec to turn some of the transforms into a custom transform that runs the query with Ibis back on the kernel. We are tracking that here: https://github.com/Quansight/jupyterlab-omnisci/issues/54
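Roughly, the pre-processing step could look like the sketch below. This is only an illustration of the spec rewriting under stated assumptions: the transform type name `kernelquery`, the `SUPPORTED` set, and the `push_down_transforms` helper are all hypothetical, and the JavaScript side (registering a custom Vega transform that calls back to the kernel) is not shown.

```python
import copy

# Vega transform types this sketch pretends the backend can execute
# (an assumption made for illustration only).
SUPPORTED = {"filter", "aggregate", "bin"}


def push_down_transforms(vega_spec):
    """Replace each dataset's leading run of supported transforms with a single
    hypothetical "kernelquery" transform that carries the originals as payload."""
    spec = copy.deepcopy(vega_spec)
    for dataset in spec.get("data", []):
        transforms = dataset.get("transform", [])
        pushed = []
        for t in transforms:
            if t.get("type") not in SUPPORTED:
                break  # stop at the first transform the backend cannot run
            pushed.append(t)
        if pushed:
            remaining = transforms[len(pushed):]
            custom = {
                "type": "kernelquery",  # hypothetical custom transform name
                "name": dataset["name"],
                "transforms": pushed,
            }
            dataset["transform"] = [custom] + remaining
    return spec


# Illustrative Vega spec fragment (invented for this sketch).
vega_spec = {
    "data": [
        {
            "name": "source_0",
            "transform": [
                {"type": "filter", "expr": "datum.delay > 0"},
                {
                    "type": "aggregate",
                    "groupby": ["carrier"],
                    "ops": ["sum"],
                    "fields": ["delay"],
                    "as": ["total_delay"],
                },
                {"type": "stack", "groupby": ["carrier"], "field": "total_delay"},
            ],
        }
    ]
}

rewritten = push_down_transforms(vega_spec)
```

Only the leading run of supported transforms is pushed down, so anything after the first unsupported transform still runs in the browser, on the output of the query.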

On the Python side, that would involve somehow taking an existing Vega dataflow graph or Vega spec and understanding how those operations map to Ibis expressions. It seems that task shares a lot in common with what you have implemented here.

Like I said, although this work initially targets OmniSci, and their database is particularly suited to computing these types of analytic queries, I hope the general approach will be useful more broadly for using Altair in Python with other data sources on the kernel, like pandas DataFrames or other databases.

I would be happy to collaborate on any part of this that you would like, or to get your feedback on the general approach and hear any thoughts you have on how to support this kind of use case on top of Altair.

Also, thank you for helping to maintain this repo!

It's a treat to be able to use the UX in Altair to create large scale visualizations.

jakevdp commented 5 years ago

Sounds good! I'm glad to hear you're working on that. Have you tackled evaluation of vega expressions yet? That was probably >50% of the effort that went into this repo, and I imagine it would be pretty directly applicable to what you're doing with Ibis. You could use the vegaexpr AST from here, but write an Ibis-focused set of node visitors in place of the pandas-focused node visitors used in this repo.
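To illustrate the "one AST, swappable node visitors" idea, here is a rough sketch. It does not use the vegaexpr parser from this repo: as an assumption for the sake of a self-contained example, Python's built-in `ast` module stands in for it, so it only handles expressions that happen to be valid in both syntaxes (such as `datum.delay * 2 > 10`). The class names are made up, and the SQL output is a simplified string, not what Ibis would produce.

```python
import ast
import operator

# Operator tables for the two back ends (deliberately tiny for this sketch).
PY_OPS = {ast.Gt: operator.gt, ast.Lt: operator.lt,
          ast.Add: operator.add, ast.Mult: operator.mul}
SQL_OPS = {ast.Gt: ">", ast.Lt: "<", ast.Add: "+", ast.Mult: "*"}


class ExprVisitor(ast.NodeVisitor):
    """Shared tree walk; subclasses decide what fields, literals and ops mean."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        return self.literal(node.value)

    def visit_Attribute(self, node):
        # Only `datum.<field>` references are supported.
        if not (isinstance(node.value, ast.Name) and node.value.id == "datum"):
            raise NotImplementedError("only datum.<field> is supported")
        return self.field(node.attr)

    def visit_BinOp(self, node):
        return self.binary(type(node.op), self.visit(node.left), self.visit(node.right))

    def visit_Compare(self, node):
        if len(node.ops) != 1:
            raise NotImplementedError("chained comparisons")
        return self.binary(type(node.ops[0]), self.visit(node.left),
                           self.visit(node.comparators[0]))

    def generic_visit(self, node):
        # Fail loudly on anything this sketch does not handle.
        raise NotImplementedError(type(node).__name__)


class RowEvaluator(ExprVisitor):
    """Evaluate the expression eagerly against one row (a dict)."""

    def __init__(self, datum):
        self.datum = datum

    def literal(self, value):
        return value

    def field(self, name):
        return self.datum[name]

    def binary(self, op, left, right):
        return PY_OPS[op](left, right)


class SqlCompiler(ExprVisitor):
    """Compile the same expression to a SQL-like string instead of evaluating it."""

    def literal(self, value):
        return repr(value)

    def field(self, name):
        return f'"{name}"'

    def binary(self, op, left, right):
        return f"({left} {SQL_OPS[op]} {right})"


tree = ast.parse("datum.delay * 2 > 10", mode="eval")
evaluated = RowEvaluator({"delay": 7}).visit(tree)  # True
compiled = SqlCompiler().visit(tree)                # '(("delay" * 2) > 10)'
```

An Ibis-focused visitor would follow the shape of `SqlCompiler`, returning Ibis column expressions from `field` and combining them in `binary` rather than building strings.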

saulshanabrook commented 5 years ago

> Have you tackled evaluation of vega expressions yet?

Nope!

> That was probably >50% of the effort that went into this repo, and I imagine it would be pretty directly applicable to what you're doing with Ibis. You could use the vegaexpr AST from here, but write an Ibis-focused set of node visitors in place of the pandas-focused node visitors used in this repo.

That's good to know. When we hit that point, I will come back to this to see how I can reuse the work here.

ValdarT commented 4 years ago

Very glad to see there is work being done on pushing the aggregations down to pandas. Perhaps the koalas library would be a relatively simple way of additionally supporting PySpark, and hence larger-than-RAM datasets? Altair is fantastic, and this would be great for creating histograms, heatmaps and the like for summarising big data.

jakevdp commented 4 years ago

Over the past couple of days I pushed a number of big updates to the package. Take a look at the example in the README, which shows how the package can be used to help Altair visualize larger datasets.

saulshanabrook commented 4 years ago

@jakevdp Nice!

We have moved our work to https://github.com/quansight/ibis-vega-transform to make it agnostic to OmniSci. If you start exploring how to do this for interactive charts, we should chat to see if we can collaborate. We ended up writing a custom Vega transform that handles calling back to Python to get new data, and we compile the original Vega spec into a new one that uses that transform.