Quansight / ibis-vega-transform

@vega transforms with @ibis-project expressions
Apache License 2.0
29 stars 7 forks source link

Switch to async trasnform #49

Closed saulshanabrook closed 4 years ago

saulshanabrook commented 4 years ago

This switches to using an async transform, based on the example Jeff added for the load transform.

https://github.com/Quansight/ibis-vega-transform/issues/22#issuecomment-562269409

I have tested this locally and it works with our first example in the performance notebook! The UI is much better, its really smooth clicking between different things and doesn't hold at all between interactions.

TODO

saulshanabrook commented 4 years ago

Wow, this works! I used @jheer's work on the Load transform as a guide. However, I had to add support for cancellation, if the user tries to interact with the graph as it is fetching new data, it should abort the old fetch as soon as possible and throw away the results.

The two "problem" examples now are pretty much workable, from a UI perspective:

2020-01-27 12 59 19 2020-01-27 13 00 22

saulshanabrook commented 4 years ago

Also through the Jaeger tracing here, we can see that a number of transforms were started, but cancelled before executing. They don't have the kernel spans under them. But, once the user stops interacting, they stop being aborted and we get a number of queries that execute:

Screen Shot 2020-01-27 at 1 15 19 PM
saulshanabrook commented 4 years ago

OK I fixed the tracing reporting here.

goanpeca commented 4 years ago

Reviewing!

saulshanabrook commented 4 years ago

Thanks! I noticed something interesting... Take a look at this trace:

Screen Shot 2020-02-26 at 11 49 54 AM

So you see that the browser transforms are happening in parallel. Yay! That's what we want. However you might notice that the actual querying, where it goes back to the kernel, is not. This is because I guess ipython comms are all running in the same thread. Or maybe in different threads and maybe all the ibis execute calls all block the GIL?

I then tried to add some multithreading but couldn't quite get it working.

goanpeca commented 4 years ago

This is because I guess ipython comms are all running in the same thread.Or maybe in different threads and maybe all the ibis execute calls all block the GIL?

Oh dear. Could we ask someone in omsisci about ibis execute calls all block the GIL this assumption?

saulshanabrook commented 4 years ago

Erm, ill do a little more debugging today. Not sure about why it's happening.

saulshanabrook commented 4 years ago

I think this can be merged without the threading logic and I can add that in a follow up.

saulshanabrook commented 4 years ago

OK I added multiprocessing support which resets the client.

Screen Shot 2020-02-27 at 11 21 23 AM
saulshanabrook commented 4 years ago

Eh, it takes too long to open a connection to make opening a new one per query viable:

Screen Shot 2020-02-27 at 11 27 36 AM

So for now I have disabled multiprocessing and we can enable it again if we just create one connection per thread and reuse it.

saulshanabrook commented 4 years ago

Gonna merge this in for now.