emacs-jupyter / jupyter

An interface to communicate with Jupyter kernels.
GNU General Public License v3.0

Jupyter and Undo-fu mode make files with images very slow #558

Open JiaweiChenC opened 1 week ago

JiaweiChenC commented 1 week ago

Thank you for the package!

I am using emacs-jupyter with undo-fu mode. I found that after running a code block that generates images, saving the file in Emacs becomes really slow. It turns out to be a problem with the interaction between undo-fu mode and jupyter.

For example, run the following code block several times:

#+begin_src jupyter-python :session test
import numpy as np
import matplotlib.pyplot as plt

# Generate random color images
num_images = 4
image_shape = (1000, 1000, 3)  # Image size with three color channels (RGB)

for _ in range(num_images):
    # Generate random image data
    image_data = np.random.rand(*image_shape)
    # Create a figure and axis
    fig, ax = plt.subplots()
    # Display the image
    ax.imshow(image_data)
    # Remove axis ticks and labels
    ax.axis('off')
    # Show the plot
    plt.show()
#+end_src

After that, the undo-fu-session file will contain a cached copy of the image data.


Does that mean that jupyter first returns all the metadata of the images to the Org file and then generates the image files from it? If so, how can I fix it so that the metadata is not returned like that?

This can also be reproduced with the native undo function, in which case the image data ends up in the variable "buffer-undo-list". Thank you so much for your time!

nnicandro commented 1 week ago

There is a text property that gets added to the Org buffer which contains the Jupyter request object for that source block. That object holds all of the messages that were generated for the request, including the messages that carry the image data, so this is most likely the source of the problem. The property gets overwritten every time you run the source block, and those property changes get stored in buffer-undo-list. See below:

https://github.com/emacs-jupyter/jupyter/blob/f97f4b5d8c83e0b901020f835183dde8a2bf649e/jupyter-org-client.el#L228-L231
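To make the mechanism concrete, here is a toy model in Python (the language of the reproduction snippet above), not the actual Emacs or emacs-jupyter code. ToyBuffer, put_property and run_block are hypothetical names. The only behavior assumed is the one described above: overwriting a property records the previous value in an undo list.

```python
class ToyBuffer:
    """Toy stand-in for a buffer with one text property and an undo list."""

    def __init__(self):
        self.prop = None
        self.undo_list = []

    def put_property(self, value):
        # Like an undo entry for a property change: remember the OLD value
        # so that the change can be reverted later.
        self.undo_list.append(self.prop)
        self.prop = value


def undo_payload_bytes(buf):
    """Total size of image payloads still reachable from the undo list."""
    return sum(len(v["image"]) for v in buf.undo_list if isinstance(v, dict))


def run_block(buf, store_request, runs=5, image_size=1_000_000):
    """Simulate re-running a source block `runs` times."""
    for i in range(runs):
        # Hypothetical request object carrying the image data it received.
        request = {"id": f"req-{i}", "image": bytes(image_size)}
        # Either store the whole request or only its id as the property.
        buf.put_property(request if store_request else request["id"])


full = ToyBuffer()
run_block(full, store_request=True)

ids_only = ToyBuffer()
run_block(ids_only, store_request=False)

print(undo_payload_bytes(full))      # old requests pile up in the undo list
print(undo_payload_bytes(ids_only))  # only short id strings are recorded
```

In this model every re-run leaves the previous request, images included, reachable from the undo list, which is the data that then gets written out when the undo history is saved.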

Currently, the property is there to ensure that a code block doesn't get re-executed while an execution request for that code block is still in progress. See jupyter-org-request-at-point and the code below:

https://github.com/emacs-jupyter/jupyter/blob/f97f4b5d8c83e0b901020f835183dde8a2bf649e/ob-jupyter.el#L535-L536

I didn't know that undo stores those property changes when I wrote the code. We could probably store the jupyter-request-id as the text property instead, and keep a weak hash table (make-hash-table :weakness 'value) whose keys are those ids and whose values are the request objects associated with the Org buffer. This way any live request will have an entry in the hash table, and the table can be queried to get the same behavior there is now.
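A rough sketch of that idea, written in Python rather than Emacs Lisp so it matches the snippet in the report: weakref.WeakValueDictionary plays the role of the weak hash table. Request, register and request_at_point are hypothetical names for illustration, and the sketch assumes that something else (for example the client) holds the only strong reference to a request while it is in flight.

```python
import gc
import weakref


class Request:
    """Hypothetical stand-in for a Jupyter request object and its messages."""

    def __init__(self, request_id):
        self.id = request_id
        self.messages = []


# Rough analogue of (make-hash-table :weakness 'value): an entry disappears
# once nothing else references the request object.
live_requests = weakref.WeakValueDictionary()


def register(request):
    """Track a live request and return the small id to store as the property."""
    live_requests[request.id] = request
    return request.id


def request_at_point(stored_id):
    """Return the live request for this id, or None if it has finished."""
    return live_requests.get(stored_id)


req = Request("abc123")
req.messages.append(bytes(1_000_000))  # pretend this is image data
stored_id = register(req)              # only this short string is stored

# While the request is alive, the lookup finds it, so the block is not re-run.
assert request_at_point(stored_id) is req

# Once the last strong reference goes away, the entry vanishes on its own.
del req
gc.collect()
assert request_at_point(stored_id) is None
```

With this layout the undo history would only ever record short id strings, while the check for an in-progress request keeps working through the table lookup.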