flekschas / jupyter-scatter

Interactive 2D scatter plot widget for Jupyter Lab and Notebook. Scales to millions of points!
https://jupyter-scatter.dev
Apache License 2.0

Enable pure JS linking of the view, selection, and hover state #82

Open flekschas opened 1 year ago

flekschas commented 1 year ago

When exporting a notebook to HTML via the following snippet, the resulting HTML file properly renders the scatter plot instance and data, but the view, selection, and hover linking do not work because they currently require a Python kernel. However, a kernel is not necessary: by using jslink(), we can ensure that the linking works with and without a Python kernel. Therefore, we should switch from observe() to jslink().

jupyter nbconvert --execute --to html notebooks/get-started.ipynb

flekschas commented 4 months ago

Having had a quick look, using jslink instead of observe is tricky because regl-scatterplot strictly relies on the data indices (incrementing integers), while jscatter also supports selections via a Pandas index. To link two scatters via Pandas indices, the selection is resolved by querying the Pandas DataFrame by its index values and then looking up the corresponding data indices.

Say we have the following setup.

import jscatter
import pandas as pd

df_one = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'x': [1, 2, 3],
    'y': [1, 2, 3],
})
df_two = pd.DataFrame({
    'id': ['b', 'c', 'a'],
    'x': [1, 2, 3],
    'y': [1, 2, 3],
})

config = {
    'x': 'x',
    'y': 'y',
    'color_by': 'id',
    'size': 20,
    'legend': True
}

scatter_one = jscatter.Scatter(data=df_one, **config)
scatter_two = jscatter.Scatter(data=df_two, **config)

jscatter.link([scatter_one, scatter_two], match_by='id')

Simply linking the selection won't work because 'a' has the data index 0 in the first scatter but the data index 2 in the second scatter. Without access to the dataframe, I'm not sure how to get around this. The only alternative that comes to mind here requires a big refactor where the ID mapping is exposed to the JS kernel to allow client-side only ID matching.
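
To make the index mismatch concrete, here is a minimal sketch in plain Python of the kind of ID mapping that would have to be available client-side. The helpers build_index_mapping and translate_selection are hypothetical names for illustration only; they are not part of jscatter or regl-scatterplot.

```python
# Hypothetical helpers, not part of jscatter: they only illustrate the
# ID-to-data-index translation that a client-side link would need.

def build_index_mapping(source_ids, target_ids):
    """For each data index in the source, return the data index of the
    same ID in the target, or -1 if the ID is absent from the target."""
    target_position = {id_: i for i, id_ in enumerate(target_ids)}
    return [target_position.get(id_, -1) for id_ in source_ids]


def translate_selection(selection, mapping):
    """Translate selected source data indices into target data indices,
    dropping points that have no counterpart in the target."""
    return [mapping[i] for i in selection if mapping[i] >= 0]


# The 'id' columns of df_one and df_two from the setup above
ids_one = ['a', 'b', 'c']
ids_two = ['b', 'c', 'a']

mapping = build_index_mapping(ids_one, ids_two)

# Selecting 'a' (data index 0) in the first scatter must select
# data index 2 in the second scatter.
selection_in_two = translate_selection([0], mapping)
```

With a mapping like this precomputed in Python and shipped to the front end, a selection could in principle be translated without a kernel round trip, at the cost of keeping one such mapping per linked pair of scatters.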

Any thoughts @manzt?

manzt commented 4 months ago

The only alternative that comes to mind here requires a big refactor where the ID mapping is exposed to the JS kernel to allow client-side only ID matching.

Yeah, I'm not sure I can think of a workaround either, unfortunately. jslink is super nice if you can get away with it, but it's usually limited. I wonder if there are some cases where the indexes are the same (e.g., same data but different columns), and in that case jscatter.link could try for jslink?

Maybe that's too convoluted, or not even possible. I'm just wondering if there is a common case that could be supported with jslink (i.e. a jscatter.jslink), throwing an error if it's not possible.

abast commented 2 weeks ago

Would it be possible to check if the dataframe has a range index and only in this case use jslink?

I would be potentially interested in this feature ... I am working on a website complementing a publication such that readers can interactively explore the dataset. To my understanding, currently multiple synchronized plots require a callback to the server, and my concern is that this would degrade performance for users in Europe if the site is hosted in the US. Is this a reasonable concern that might be solved by using jslink?
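
The range-index check suggested above could be sketched roughly as follows. This is only an assumption about what such a check might look like: has_default_range_index and could_link_client_side are hypothetical helper names, not existing jscatter functions, and the sketch additionally assumes that all dataframes must have the same length for their data indices to line up.

```python
import pandas as pd


def has_default_range_index(df: pd.DataFrame) -> bool:
    """True if the index is a RangeIndex counting 0, 1, 2, ..., so that
    the Pandas index values coincide with the positional data indices."""
    index = df.index
    return isinstance(index, pd.RangeIndex) and index.start == 0 and index.step == 1


def could_link_client_side(dfs) -> bool:
    """Hypothetical check: only consider a jslink-style link if every
    dataframe has a default range index and all have the same length."""
    if not dfs:
        return False
    expected_length = len(dfs[0])
    return all(
        has_default_range_index(df) and len(df) == expected_length
        for df in dfs
    )


df_range = pd.DataFrame({'x': [1, 2, 3], 'y': [1, 2, 3]})
df_labeled = pd.DataFrame({'x': [1, 2, 3], 'y': [1, 2, 3]}, index=['a', 'b', 'c'])

same_indices = could_link_client_side([df_range, df_range.copy()])
mixed_indices = could_link_client_side([df_range, df_labeled])
```

Note that a check like this only covers linking by the dataframe index. Linking with match_by on a column, as in the example earlier in this thread, would still need the ID lookup.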

flekschas commented 2 weeks ago

@abast that could work. Do you want to draft how this could be implemented?

jslink would for sure be faster than any server round trip, as it avoids server requests entirely.