Support for updating point metadata without recreating coordinate index

TheStanian commented 7 months ago

[x] Check for duplicate issues.
[x] Describe the feature's goal, motivating use cases, its expected behavior, and impact.
[x] If you are proposing a new API, please explain how the current APIs fail to support your goal.
[x] If you are proposing to change an existing API, please explain if your proposed is backward compatible or incompatible.
[x] If applicable, include screenshots, GIFs, videos, or working examples.

Right now, the only way of updating the point metadata (valueA and valueB, or whichever alias you prefer) is to do a full draw call including coordinate data. I have a specific application in which only the color indices for the points would change, leading to the rebuilding of the KDBush, which is in this case a major unnecessary slowdown (10m+ points). I would like to see a way to change only the point metadata, so the rebuild doesn't need to occur. This could for example take the form of taking the existing state texture management as used in setPoints and exposing that on its own, using the currently available points. Doing so would not break backwards compatibility with anything whatsoever.

Thank you for your consideration and with kind regards,

Stan

flekschas commented 7 months ago

This is a great idea and certainly a feature I'd be happy to add. Technically I think there are a few ways to achieve this and they might all be useful to have on their own:

We could allow users to directly pass the state texture to publicDraw(). This would be useful on its own when one wants to avoid the main thread having to handle state data at all.
We could allow users to pass the kd tree index to publicDraw() (this needs updating of kdush, which I already did in https://github.com/flekschas/regl-scatterplot/pull/143). This would be useful on its own because for static data one could pre-compute the index.
We should update publicDraw() to accept partial point data (i.e., only some components) and then have that function skip unnecessary updates. Internally we can, for instance, update the state texture using regl's buffer.subdata(). The biggest hurdle is that the draw function relies on row-major data (see toArrayOrientedPoints()). Refactoring this would generally be good.

What about the following non-breaking API changes to scatterplot.draw()?

type Draw = (
  data:
    | number[][]
    | Partial<{ x: number[] | Float32Array, y: number: [] | Float32Array, ... }>
    | Partial<DrawData>,
  options: Partial<{
    showPointConnectionsOnce: boolean,
    transition: boolean,
    transitionDuration: number,
    transitionEasing: string,
    preventFilterReset: boolean,
    hover: number,
    select: number[],
    filter: number[],
    zDataType: 'continuous' | 'categorical',
    wDataType: 'continuous' | 'categorical',
    // NEW
    kdbushIndex: ArrayBuffer
  }>
) => Promise<void>;

interface DrawData {
  points: Partial<Points> | Float32Array,
  connections: number[][],
}

interface Points {
  x: number[] | Float32Array,
  x: number[] | Float32Array,
  z: number[] | Float32Array,
  w: number[] | Float32Array,
}

TheStanian commented 7 months ago

In terms of API changes, your Points interface has a double declaration of x, and could maybe be used in your type definition for Draw? Other than that it looks good to me.

I especially like option 3., as I feel leaning into structs of arrays might be worthwhile for further optimisations down the road (e.g. reducing temporary memory allocation by using typed arrays under the hood) while also avoiding to have to expose the casual user to internals (state texture, kd bush index) too much.

flekschas commented 5 months ago

@TheStanian It took a bit but I finally managed to support skipping the spatial index computation. See https://github.com/flekschas/regl-scatterplot/pull/178 for an example.

While I haven't had time to implement drawing of partial data, you can now precompute the spatial index (in a worker or main thread) and pass that index in with the draw call: scatterplot.draw(points, { spatialIndex }). In the PR I'm showing a test I ran in Firefox where updating the Z/W coordinates for one million points is sped up 6x by skipping the re-indexing.

I'm still planning to add drawing of partial data (e.g., scatterplot.draw({ z: [...] })) in which case the spatial indexing can be skipped automatically. But that's for another PR.

flekschas / regl-scatterplot

Support for updating point metadata without recreating coordinate index #174