Save rendered SigPlot images with the notebook #26

sterre opened 5 years ago

We've talked about this some in person and on Slack. This issue is just trying to capture some of what we've discussed, with no strong organization.

Currently, when a saved notebook is re-opened, SigPlot widgets do not reliably show a rendered image without re-evaluating the generating cell and all dependencies. This is especially vexing in cases like nbviewer or GitHub / GitLab, and is likely a showstopper if the original data is no longer available.

There's some nascent logic in the extension around done and imageOutput that looks like it wants to capture a PNG from SigPlot and save it to the client for rich representation. This seems like a solid approach; the only open question is how to make that PNG repr display at the right time. It's possible that widgets and rich representations don't mix; I'm basing that on a very quick experiment in which I tried to add an HTML representation to the hello world widget.

Libraries like Matplotlib/Seaborn and Bokeh seem to address this by using a Javascript rich representation instead of a bona fide widget. I make this claim based on observing what's saved with a notebook containing figures from each library.

Matplotlib with %matplotlib notebook generates both a Javascript and an image representation. On load, the image is displayed until the cell is re-evaluated.
Matplotlib with %matplotlib inline just generates an image repr.
(Seaborn is a wrapper around Matplotlib, and its save semantics are the same.)
Bokeh generates a Javascript representation that appears to load the required library from a CDN (I believe it can also be inlined), plus a JSON representation that the Javascript code renders.

I thought D3 might be a reasonable analog to SigPlot, so I went looking for some examples of D3 in a notebook. Here's what I found. None of these is as complete as we might like.

It appears that Javascript reprs take precedence over other rich reprs. This may depend on whether the notebook is trusted. If you return None from a _repr_* function, that repr is not used, which could potentially allow us to wait until a PNG is available before rendering it.
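To make the "return None" idea concrete, here's a minimal, self-contained sketch. `mini_format` and `PlotStub` are hypothetical stand-ins, not IPython or jupyter-sigplot APIs; the method-to-MIME mapping mirrors a subset of IPython's `_repr_*_` conventions, and the real formatter does much more (metadata, error handling, `_repr_mimebundle_`). The only point is that a `_repr_png_` returning None keeps `image/png` out of the bundle until the browser has sent an image.

```python
from typing import Optional

# Hypothetical subset of IPython's method-name -> MIME-type conventions.
REPR_METHODS = {
    "_repr_png_": "image/png",
    "_repr_html_": "text/html",
    "_repr_javascript_": "application/javascript",
}


def mini_format(obj) -> dict:
    """Collect every non-None _repr_*_ result into a MIME bundle."""
    bundle = {"text/plain": repr(obj)}
    for method, mime in REPR_METHODS.items():
        fn = getattr(obj, method, None)
        if fn is None:
            continue  # object doesn't define this repr at all
        data = fn()
        if data is not None:  # None means "no repr of this type (yet)"
            bundle[mime] = data
    return bundle


class PlotStub:
    """Hypothetical plot whose PNG arrives later from the browser."""

    def __init__(self) -> None:
        self.png_bytes: Optional[bytes] = None

    def _repr_png_(self) -> Optional[bytes]:
        # Offer no PNG until the client has sent a rendered image.
        return self.png_bytes

    def __repr__(self) -> str:
        return "<PlotStub>"
```

Before the PNG arrives, `mini_format(PlotStub())` contains only `text/plain`; once `png_bytes` is set, the bundle also carries `image/png`.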

Whatever the representation in the saved notebook, it needs to deal gracefully with very large input data. Matplotlib and Bokeh do this by serializing the figure instead of the data. (That inflates the size for small data sets, but saves a lot on larger ones.) A Datashader-style approach may also be relevant.
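As a rough illustration of "serialize the figure, not the data," here's a min/max decimation sketch: it keeps one (min, max) pair per horizontal bin, so a plot about N pixels wide needs roughly 2N numbers no matter how long the input is. This is a plain-Python stand-in for the idea, not Datashader's API or anything SigPlot does today; `minmax_decimate` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def minmax_decimate(samples: Sequence[float], n_bins: int) -> List[Tuple[float, float]]:
    """Reduce samples to at most n_bins (min, max) pairs, one per bin."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    n = len(samples)
    if n == 0:
        return []
    n_bins = min(n_bins, n)  # never more bins than samples
    out = []
    for b in range(n_bins):
        # Integer bin edges: every sample lands in exactly one non-empty bin.
        start = b * n // n_bins
        stop = (b + 1) * n // n_bins
        chunk = samples[start:stop]
        out.append((min(chunk), max(chunk)))
    return out
```

For example, `minmax_decimate([0, 5, -3, 2, 8, 1], 3)` gives `[(0, 5), (-3, 2), (1, 8)]`, and 10,000,000 samples squeezed into 2,000 bins become 4,000 numbers. Keeping min and max (rather than averaging) preserves spikes, which matters for signal plots.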

JupyterLab has a different extension model, and also restricts Javascript content.

@sterre

> Currently, when a saved notebook is re-opened, SigPlot widgets do not reliably show a rendered image without re-evaluating the generating cell and all dependencies.

I'm going to generalize your point for a second to all related use cases and pose them as questions:

1. What should happen when you close a notebook and re-open?
2. What should happen when you export a notebook to HTML?
3. What should happen when you export a notebook to PDF?

Aside from (3), where the answer is to export to PNG in a similar vein to matplotlib, (1) and (2) require more thought.

As a first pass, I think (1) and (2) should also simply export to PNG, which you referenced in the paragraph below. The final path will require more thought: how do you preserve an interactive SigPlot with a large amount of data? Should you bother writing out that multi-MB HTML file? What do you do when the original resource is not available?

> There's some nascent logic in the extension around done and imageOutput that looks like it wants to capture a PNG from SigPlot and save it to the client for rich representation. This seems like a solid approach; the only open question is how to make that PNG repr display at the right time. It's possible that widgets and rich representations don't mix; I'm basing that on a very quick experiment in which I tried to add an HTML representation to the hello world widget.

Yep, I thought our intern had a working PNG export this past summer, but it looks like it might've been lost in a branch or something.
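For the HTML-export case, a saved PNG can travel inside the HTML itself as a `data:` URI, which is roughly what happens when a notebook's `image/png` outputs are exported to HTML. A minimal sketch; `png_to_img_tag` is a hypothetical helper, not part of the extension:

```python
import base64
import html

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_to_img_tag(png: bytes, alt: str = "SigPlot figure") -> str:
    """Embed PNG bytes in a self-contained <img> tag via a data: URI."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: missing PNG signature")
    b64 = base64.b64encode(png).decode("ascii")
    # Escape the alt text so quotes or angle brackets can't break the tag.
    return f'<img src="data:image/png;base64,{b64}" alt="{html.escape(alt)}">'
```

The trade-off is the one raised above: the image is static and base64 adds about a third to its size, but nothing outside the HTML file is needed to view it.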

> This is especially vexing in cases like nbviewer or GitHub / GitLab, and is likely a showstopper if the original data is no longer available.

Let me know if I'm misrepresenting your point here, but I'm reading this as: if a user plots from a websocket or an href, there is an expectation of persistence (even if the original resource is no longer available). (For an href, this should come in the form of a downloaded file.)

If I've interpreted your point correctly, let's take a step back and discuss expected and reasonable use cases of Jupyter Notebook. I have always observed Jupyter Notebook being used either as a "playground" (i.e., an area to do exploratory data analysis or to begin prototyping a capability) or as a pedagogical tutorial builder/interactive documentation. In all of these cases, any data used in the notebook should reside in the notebook environment/directory.

If this is the case, perhaps there's a use case of which I'm unaware. (cc @maihde)

I agree, these are the right questions to ask.

I think the additional use cases that I've observed, and that potential users have expressed to me, include:

Analytic Logs
Reports

In both cases, I notice that a primary activity is reading as opposed to exploratory analysis:

log: Here's what I was looking at yesterday / last week / last month / last year
log: What did that signal look like again?
log/report: Here's what I did and what I found

Even if the data is still available, it's helpful to be able to transparently read a notebook rather than Run Cell / Run All (and potentially deal with any environmental changes). In the case of sharing a notebook, it may be tricky to get at the original data if it resides on the other side of a firewall.

I think the reading use case can apply to pedagogical notebooks too. Although you may well want to recreate the plot if possible, you'd like to know what it should look like when you do. (See https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb as an example of something you might want to read first, run later.)

Even the jupyter-sigplot demo notebook, as rendered by GitHub, is an example of this reader CONOP: if a casual reader could see what SigPlot would look like for the given inputs, they'd be better positioned to determine whether it's a potential fit for their problem. Binder sometimes takes several minutes to load, and one may have a clone of the extension on a network that can't reach your CDN.

The %matplotlib notebook behavior seems to match the "reader" case pretty well: when you open a saved notebook, you can see figures. Then, for other cases, if you re-evaluate the cell, you can interact with the figures (or learn that the data is gone).

Now, Bokeh treats (1) and (2) a little differently, in that its figures tend to be "live" even without the original data available.

SigPlot is more capable for interactive tasks on its own than Matplotlib/Bokeh (which require server-side logic to "do" anything). In the end game, it would be powerful if all that interactivity were available in statically rendered notebooks like the HTML case. I don't have a feel for whether that use case (fully armed and operational SigPlot from a standalone loaded document) is central or fringe. I suspect that full interactivity will end up going along with access to the input data for the playground / exploratory analysis and training cases, and will not be too missed for the simple reading / reporting cases.

The scientific paper may be obsolete, but if the data's not available, a notebook can still be useful, like a whitepaper. If you package the data with your notebook, you get a much richer experience, at the minor cost of re-running the notebook cells.

I wonder also if there are some different classes of interactivity that we might be able to support? Zooming and panning require access to the full input data. CX mode, trace style, abscissa/index, and maybe some scaling could operate on the subset of the data that's in the viewport (including compressed).
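A sketch of the viewport-only class of interactivity: if the saved notebook kept the complex samples currently in view, switching the complex display mode needs nothing else, whereas zooming out past that window is exactly what it can't do. The mode names below are hypothetical and may not match the names SigPlot itself uses.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical mode names; SigPlot's own identifiers may differ.
CX_MODES: Dict[str, Callable[[complex], float]] = {
    "magnitude": abs,
    "phase": cmath.phase,
    "real": lambda z: z.real,
    "imag": lambda z: z.imag,
    # 10*log10(|z|^2) == 20*log10(|z|); the floor avoids log10(0).
    "power_db": lambda z: 10.0 * math.log10(max(abs(z) ** 2, 1e-30)),
}


def apply_cx_mode(viewport: Sequence[complex], mode: str) -> List[float]:
    """Recompute the displayed trace from the stored viewport samples only."""
    try:
        fn = CX_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}") from None
    return [fn(z) for z in viewport]
```

Trace style and abscissa changes would fall in the same bucket: they re-draw the samples already stored rather than asking for new ones.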

> I wonder also if there are some different classes of interactivity that we might be able to support?

@maihde would be able to speak to that best.

Consolidating the imageOutput traitlet issue and the issue about _done having odd behavior into this issue.

I don't know if it's a red herring, but I noticed that there's an option to "save widget state" in the Web notebook's toolbar. It might be worth digging into what this does and seeing whether it could be leveraged to get even an interactive rendered SigPlot saved with the notebook file. (It would be important to investigate the behavior when plotting very large files.)
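If we dig into "save widget state," one quick check is how large the saved state gets with big plots. As far as I know, ipywidgets stores it in the notebook's top-level metadata under `widgets` → `application/vnd.jupyter.widget-state+json` → `state`, keyed by model ID; treat that layout as an assumption to verify against the ipywidgets docs. A minimal sketch (`widget_state_sizes` is hypothetical):

```python
import json
from typing import Any, Dict

WIDGET_STATE_MIME = "application/vnd.jupyter.widget-state+json"


def widget_state_sizes(nb: Dict[str, Any]) -> Dict[str, int]:
    """Map each saved widget model ID to its serialized size in bytes."""
    # Assumed layout: metadata.widgets[WIDGET_STATE_MIME].state[model_id]
    widgets = nb.get("metadata", {}).get("widgets", {})
    state = widgets.get(WIDGET_STATE_MIME, {}).get("state", {})
    return {
        model_id: len(json.dumps(model).encode("utf-8"))
        for model_id, model in state.items()
    }
```

Running it on a notebook loaded with `json.load` after plotting a large file would show whether the array data ends up in the saved state.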

I've been working on getting a PNG to stay with the notebook, the way %matplotlib inline does. I've got that working here. To use it, run plot.inlinePlot(). If we like this approach, we can figure out how we want to fully implement it. It uses IPython.display.Image(), to which I pass the PNG bytes from the Javascript client.
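On the kernel side, the payload from the browser probably arrives as a string. Assuming the client sends the output of the standard `canvas.toDataURL("image/png")` (a `data:image/png;base64,...` string), a sketch of turning it into raw PNG bytes, suitable for something like `IPython.display.Image(data=...)`, might look like this. `png_from_data_url` is a hypothetical helper, not part of the prototype:

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DATA_URL_PREFIX = "data:image/png;base64,"


def png_from_data_url(data_url: str) -> bytes:
    """Decode a canvas-style PNG data URL into raw PNG bytes."""
    if not data_url.startswith(DATA_URL_PREFIX):
        raise ValueError("expected a base64 PNG data URL")
    try:
        # validate=True rejects stray non-base64 characters.
        png = base64.b64decode(data_url[len(DATA_URL_PREFIX):], validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed base64 payload") from exc
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    return png
```

Checking the PNG signature catches the "no data for the PNG" case early, instead of handing empty bytes to the display machinery.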

There are a few other cool display classes available to us here. I wonder if the IPython.display.Javascript() class would give us an interactive plot? Any thoughts?

I think the basic approach is sound: have the client grab a PNG from SigPlot, put it in a traitlet for the server to store, display the PNG on the client. Does the notebook already serialize the PNG in the .ipynb file for us in your prototype? (If not, it may do so if we make it a rich repr instead of an explicit function.)

I think both Image and Javascript can be set as rich reprs. The Image repr is better for printing and reading, and some sharing (all you need is the .ipynb file). So the first serialization to support seems like Image; then users can re-evaluate cells to get a live widget if needed.

Maybe a future enhancement would be to dynamically and automatically replace the image with a live widget, if all the libraries and data are available to a running kernel. Or, like Bokeh, maybe we could save enough code to render a Javascript rich repr.

I still don't really know how widgets and rich reprs interact.
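One experiment for the widgets-vs-rich-reprs question: put a widget view and a PNG in the same MIME bundle. My understanding, which needs testing, is that a frontend with a widget manager renders the widget view, while renderers without one may fall back to another MIME type they support, such as `image/png`. `StaticFallbackPlot` is hypothetical, and the widget-view version numbers are an assumption based on ipywidgets 7:

```python
import base64
from typing import Any, Dict, Optional

WIDGET_VIEW_MIME = "application/vnd.jupyter.widget-view+json"


class StaticFallbackPlot:
    """Hypothetical plot offering a live widget view plus a PNG fallback."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id
        self.png_bytes: Optional[bytes] = None

    def _repr_mimebundle_(self, **kwargs: Any) -> Dict[str, Any]:
        bundle: Dict[str, Any] = {
            "text/plain": f"SigPlot(model_id={self.model_id!r})",
            # Rendered by frontends that have a widget manager.
            WIDGET_VIEW_MIME: {
                "version_major": 2,  # assumption: ipywidgets 7 view spec
                "version_minor": 0,
                "model_id": self.model_id,
            },
        }
        if self.png_bytes is not None:
            # Notebook JSON stores image/png as base64 text; this is the
            # candidate fallback for renderers without a widget manager.
            bundle["image/png"] = base64.b64encode(self.png_bytes).decode("ascii")
        return bundle
```

If the fallback works, the saved .ipynb would carry the PNG alongside the widget reference, which is the "Image first, live widget when available" behavior described above.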

I did not modify the .ipynb file. You need to manually run plot.inlinePlot() after running plot.plot() to get it to save the PNG in the notebook.
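To answer "does the .ipynb actually contain the PNG?" without eyeballing JSON, here's a sketch that lists code cells whose saved outputs include `image/png`. It relies only on the nbformat v4 layout (`cells` → `outputs` → `data`); the helper names are hypothetical:

```python
import json
from typing import Any, Dict, List


def cells_with_saved_png(nb: Dict[str, Any]) -> List[int]:
    """Indices of code cells whose saved outputs include image/png."""
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for out in cell.get("outputs", []):
            # display_data / execute_result outputs keep a MIME "data" dict;
            # stream and error outputs have none, so .get() returns {}.
            if "image/png" in out.get("data", {}):
                hits.append(i)
                break
    return hits


def cells_with_saved_png_in_file(path: str) -> List[int]:
    """Same check, reading a .ipynb file from disk."""
    with open(path, encoding="utf-8") as f:
        return cells_with_saved_png(json.load(f))
```

Running it on a notebook saved after plot.inlinePlot() versus one saved after only plot.plot() would show which cells actually kept their images.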

I've noticed an issue: when you re-run the entire notebook, plot.inlinePlot() fails. It looks like there may be some latency issues with the storePlotBytes() function call. It appears to be triggered before the self.pngBytes traitlet is actually populated, resulting in no data for the PNG.

That makes sense. We have the same race in the other direction with overlay_array and friends, which I believe is why @amatma added the inputs, oldArrays/oldHrefs, plot(), and show_ family of functions. Some of the discussion in the low level Widget tutorial is relevant.

I think the basic idea we currently have implemented is sensible: queue up events that need to happen after render (or overlay, in the case of inlinePlot), then execute them once the preconditions are met.
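A minimal sketch of that queue-until-ready idea, with hypothetical names (`DeferredActions`, `"png_received"`): callbacks registered before a precondition is met are held, then run in order once it is marked ready. I've used callbacks rather than a blocking wait because, as far as I understand it, the kernel processes incoming widget messages only between cell executions, so a cell that blocks waiting for the PNG would never see it arrive.

```python
from typing import Callable, Dict, List


class DeferredActions:
    """Hold callbacks until a named precondition (e.g. 'rendered') is met."""

    def __init__(self) -> None:
        self._ready: Dict[str, bool] = {}
        self._pending: Dict[str, List[Callable[[], None]]] = {}

    def when(self, condition: str, action: Callable[[], None]) -> None:
        if self._ready.get(condition):
            action()  # precondition already satisfied: run immediately
        else:
            self._pending.setdefault(condition, []).append(action)

    def mark_ready(self, condition: str) -> None:
        self._ready[condition] = True
        # Drain in FIFO order so actions run in the order they were requested.
        for action in self._pending.pop(condition, []):
            action()
```

Hypothetically, inlinePlot's display step would be registered with `when("png_received", ...)`, and whatever observes the PNG traitlet would call `mark_ready("png_received")`.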

I'm back to looking into this enhancement. plt.inlinePlot() works again with the recent changes to master, and it is now compatible with Python 3. I'm working out of the issue_26_v3 branch.

The hope is to be able to control the cell that has the plot object when the base64 image finally gets to the Python kernel. @mrecachinas had previously recommended seeing whether we could change the background of that cell to the PNG representation, so that when a notebook is loaded, we have the stored PNG available.

I'm looking to do that with self.displayHandle.update() in the storePlotImage function, but haven't succeeded yet. It works in other cells that only have text, but not in the cell that has a SigPlot object. For example:

    from IPython.display import display

    h = display("hello", display_id=True)  # initially displays hello
    h.update("world")  # replaces `hello` with `world`

This could be relevant: https://github.com/bokeh/jupyter_bokeh/blob/master/src/renderer.ts

@sterre @maihde OK, sorry for the delay. I have this mostly working now. There seems to be some weird race condition still going on though. See the two example notebooks here:

You'll notice some plots are rendered as PNGs, some are not. Still trying to figure that out.

Just noting here that I finally got a chance to play with this.

I see the race condition you're talking about in the example notebooks.

When I run the code in some of my own notebooks, I get a "Memory" object and no "UUID" element, and the images don't save with the notebook.

Still very promising!

@sterre What versions of ipywidgets, jupyter, notebook, traitlets, and ipython are installed?

This was using Anaconda 2019-10 for all infrastructure, with jupyter-sigplot freshly built.

LGSInnovations / jupyter-sigplot
