flekschas / jupyter-scatter

Interactive 2D scatter plot widget for Jupyter Lab and Notebook. Scales to millions of points!
https://jupyter-scatter.dev
Apache License 2.0

scatter.data(): points disappear because pointsize depends on zoom #159

Closed abast closed 2 weeks ago

abast commented 3 weeks ago

Create two datasets with very different scales:

import jscatter
import numpy as np
import pandas as pd

data1 = pd.DataFrame({
    "x": np.random.rand(1000),
    "y": np.random.rand(1000),
})
data2 = pd.DataFrame({
    "x": 200*np.random.rand(1000),
    "y": 200*np.random.rand(1000),
})
s0 = jscatter.Scatter(data=data1, x="x", y="y", width=500, height=500)
s0.show()

Now change the dataset:

s0.data(data=data2, animate = True, reset_view = True)

This causes a 'zoom out' to fit the range of data2. As it zooms out, the points become smaller and smaller until, on my screen, they become invisible. I played around with the keyword arguments, but they don't seem to make a difference to the final point size. What I would have expected is that the point size (in terms of pixels) remains fixed.

flekschas commented 3 weeks ago

> This causes a 'zoom out' to fit the range of data2.

Correct. Since you didn't reset the scales, the new data is plotted in the existing space.

> What I would have expected is that the point size (in terms of pixels) remains fixed.

The point size is a function of the view. If you increase the space by a factor of 200 in both dimensions and zoom out to view the entire space, the points must get smaller. It is the same as zooming out by a factor of 200 without changing the data: the points get a lot smaller.
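To put a number on that factor, here is a minimal sketch that compares the x extents of two datasets shaped like data1 and data2 from the report. It uses plain Python lists as stand-ins for the DataFrame columns, and `span` and `zoom_out_factor` are illustrative names, not part of Jupyter Scatter. How exactly the rendered point size shrinks with this factor is decided by regl-scatterplot and is not specified in this thread.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Stand-ins for data1["x"] and data2["x"] from the original report
x1 = [random.random() for _ in range(1000)]
x2 = [200 * random.random() for _ in range(1000)]


def span(values):
    """Width of the interval covered by the values."""
    return max(values) - min(values)


# How much wider the new data is than the space the scales were set up for.
# Fitting data2 into the existing space means zooming out by about this much.
zoom_out_factor = span(x2) / span(x1)
print(f"approximate zoom-out factor: {zoom_out_factor:.0f}")
```

With the scales left untouched, the view has to cover roughly 200 times more space per axis, which is why the points end up so small.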

A simple solution is to reset the scales via reset_scales=True. E.g.:

s0.data(data=data2, reset_scales=True)

https://github.com/user-attachments/assets/453aefb9-5844-40d0-9d81-acfa6ed75378

There's a feature request for regl-scatterplot to allow fixed point sizes: https://github.com/flekschas/regl-scatterplot/issues/169. However, that feature is not yet implemented, so Jupyter Scatter only offers zoom-dependent point sizes at the moment.

abast commented 3 weeks ago

Yes, my colleague Jody Clements (https://github.com/neomorphic) has tested it and it works! Thanks!

Edit: this comment was meant to go here: https://github.com/flekschas/jupyter-scatter/issues/158

flekschas commented 2 weeks ago

Thanks for confirming! I'm closing the ticket then.

abast commented 1 week ago

I would like to come back to this. What is nice about reset_view = True is the animation of the axes, which shows the change in scale. In contrast, reset_scales makes the axes snap to their new values during the animation, so it becomes impossible to tell which points changed their value a lot and which did not. It seems that all the functionality is essentially there, but I struggle to put it together so that updating the data animates both the points and the axes consistently with each other.

To make this more tractable, here is a mock dataset:

import numpy as np
import pandas as pd

# 1000 random points within the unit square
data1 = pd.DataFrame({"x": np.random.rand(1000), "y": np.random.rand(1000)})
# Stretch the left half (x < 0.5) by a factor of 2, anchored at x = 0.5
data2 = data1.copy()
left = data2.x < 0.5
data2.loc[left, "x"] = data2.loc[left, "x"] * 2 - 0.5
# Shrink the right half (x > 0.5) by 5%, anchored at x = 0.5
right = data2.x > 0.5
data2.loc[right, "x"] = (data2.loc[right, "x"] - 0.5) * 0.95 + 0.5

How can I visualize the transition between data1 and data2 in a way that makes it visible that the left half has changed a lot and the right half stayed mostly constant? The 'ideal' animation might look like: (1) zoom to the min-max ranges of data2 and data1 combined, (2) update the dataset to data2 and animate the transition, (3) zoom to the ranges of data2, (4) change the reset_view button functionality to set the view to (3) from now on. Could we discuss ways to do so? Maybe most of the functionality is already present?
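The ranges needed for steps (1) and (3) can be computed independently of the widget. The sketch below does this for the x coordinate only, using a short list in place of data1["x"] and applying the same stretch/shrink rule as the mock dataset. `extent` and `combined_extent` are hypothetical helper names, not Jupyter Scatter functions, and how these ranges are then passed to the scatter plot is left open because the thread does not settle which calls to use.

```python
from typing import Iterable, Tuple


def extent(values: Iterable[float]) -> Tuple[float, float]:
    """Return the (min, max) of one coordinate column."""
    values = list(values)
    return min(values), max(values)


def combined_extent(*columns: Iterable[float]) -> Tuple[float, float]:
    """Return the (min, max) spanning several coordinate columns."""
    lows, highs = zip(*(extent(column) for column in columns))
    return min(lows), max(highs)


# Small stand-in for data1["x"]; the real column would work the same way
x_before = [0.1, 0.4, 0.6, 0.9]
# Same rule as the mock dataset: stretch x < 0.5 by 2, shrink x > 0.5 by 5%
x_after = [
    x * 2 - 0.5 if x < 0.5 else (x - 0.5) * 0.95 + 0.5
    for x in x_before
]

step1_x_range = combined_extent(x_before, x_after)  # view for step (1)
step3_x_range = extent(x_after)                     # view for step (3)
print(step1_x_range, step3_x_range)
```

Both helpers accept any iterable of numbers, so pandas columns such as data1["x"] and data2["x"] can be passed directly, and the same calls on the "y" columns give the vertical ranges.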