flekschas / regl-scatterplot

Scalable WebGL-based scatter plot library build with Regl
https://flekschas.github.io/regl-scatterplot/
MIT License
191 stars 24 forks source link

Support for updating point metadata without recreating coordinate index #174

Closed TheStanian closed 4 months ago

TheStanian commented 5 months ago

Right now, the only way of updating the point metadata (valueA and valueB, or whichever alias you prefer) is to do a full draw call including coordinate data. I have a specific application in which only the color indices for the points would change, leading to the rebuilding of the KDBush, which is in this case a major unnecessary slowdown (10m+ points). I would like to see a way to change only the point metadata, so the rebuild doesn't need to occur. This could for example take the form of taking the existing state texture management as used in setPoints and exposing that on its own, using the currently available points. Doing so would not break backwards compatibility with anything whatsoever.

Thank you for your consideration and with kind regards,

Stan

flekschas commented 5 months ago

This is a great idea and certainly a feature I'd be happy to add. Technically I think there are a few ways to achieve this and they might all be useful to have on their own:

  1. We could allow users to directly pass the state texture to publicDraw(). This would be useful on its own when one wants to avoid the main thread having to handle state data at all.
  2. We could allow users to pass the kd tree index to publicDraw() (this needs updating of kdush, which I already did in https://github.com/flekschas/regl-scatterplot/pull/143). This would be useful on its own because for static data one could pre-compute the index.
  3. We should update publicDraw() to accept partial point data (i.e., only some components) and then have that function skip unnecessary updates. Internally we can, for instance, update the state texture using regl's buffer.subdata(). The biggest hurdle is that the draw function relies on row-major data (see toArrayOrientedPoints()). Refactoring this would generally be good.

What about the following non-breaking API changes to scatterplot.draw()?

type Draw = (
  data:
    | number[][]
    | Partial<{ x: number[] | Float32Array, y: number: [] | Float32Array, ... }>
    | Partial<DrawData>,
  options: Partial<{
    showPointConnectionsOnce: boolean,
    transition: boolean,
    transitionDuration: number,
    transitionEasing: string,
    preventFilterReset: boolean,
    hover: number,
    select: number[],
    filter: number[],
    zDataType: 'continuous' | 'categorical',
    wDataType: 'continuous' | 'categorical',
    // NEW
    kdbushIndex: ArrayBuffer
  }>
) => Promise<void>;

interface DrawData {
  points: Partial<Points> | Float32Array,
  connections: number[][],
}

interface Points {
  x: number[] | Float32Array,
  x: number[] | Float32Array,
  z: number[] | Float32Array,
  w: number[] | Float32Array,
}
TheStanian commented 5 months ago

In terms of API changes, your Points interface has a double declaration of x, and could maybe be used in your type definition for Draw? Other than that it looks good to me.

I especially like option 3., as I feel leaning into structs of arrays might be worthwhile for further optimisations down the road (e.g. reducing temporary memory allocation by using typed arrays under the hood) while also avoiding to have to expose the casual user to internals (state texture, kd bush index) too much.

flekschas commented 4 months ago

@TheStanian It took a bit but I finally managed to support skipping the spatial index computation. See https://github.com/flekschas/regl-scatterplot/pull/178 for an example.

While I haven't had time to implement drawing of partial data, you can now precompute the spatial index (in a worker or main thread) and pass that index in with the draw call: scatterplot.draw(points, { spatialIndex }). In the PR I'm showing a test I ran in Firefox where updating the Z/W coordinates for one million points is sped up 6x by skipping the re-indexing.

I'm still planning to add drawing of partial data (e.g., scatterplot.draw({ z: [...] })) in which case the spatial indexing can be skipped automatically. But that's for another PR.