Feature Request: tobuffer should save IDs to samples, not the other way around

balintlaczko commented 5 months ago

Is your feature request related to a problem? Please describe the problem.

When using tobuffer on a fluid.dataset~, you specify a fluid.labelset~ that will contain the dataset ID (as the label) for each sample in the buffer. You can then only query with a buffer sample index to get the corresponding dataset ID. I almost never need it this way, but I always need the opposite: to query with a dataset ID and get the corresponding buffer sample index. It is possible to work around this via dataset/labelset dumping, but that becomes prohibitively slow with large datasets.

Describe the solution you'd like to see.

If tobuffer created a labelset where the dataset ID is the labelset identifier and the buffer sample index is the label, it would be much easier down the line to query which dataset ID got mapped to which index (without excessive loops).
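
To make the requested direction concrete, here is a minimal Python sketch. It is not FluCoMa code: the labelset is modelled as a plain dict, and the IDs and the helper name buffer_index_for are hypothetical, chosen only for illustration.

```python
# Plain-Python model, not the FluCoMa API: a labelset is modelled as a dict.
# Current tobuffer output: buffer sample index -> dataset ID.
index_to_id = {0: "grain-12", 1: "grain-7", 2: "grain-3"}  # hypothetical IDs

# Requested output: dataset ID -> buffer sample index, built in a single pass.
id_to_index = {dataset_id: index for index, dataset_id in index_to_id.items()}


def buffer_index_for(dataset_id):
    """Return the buffer sample index for a dataset ID via one dict lookup."""
    return id_to_index[dataset_id]
```

With the mapping stored in this direction, each query is a single lookup instead of a scan over a dumped labelset.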

Describe alternatives you've considered

An alternative could be that fluid.labelset~ gets an indexof method, which gets the first index (identifier) for a given label. But I don't think this would be as efficient as the above solution.
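
A rough Python sketch of why such a method would likely be slower, assuming it has to scan entries until it finds a match. The function index_of and the dict model of the labelset are assumptions for illustration, not an existing fluid.labelset~ interface.

```python
def index_of(labelset, label):
    """Hypothetical indexof: return the first identifier whose label matches.

    `labelset` is a dict of identifier -> label (insertion-ordered).
    Returns None when no entry carries the label.
    """
    for identifier, stored_label in labelset.items():
        if stored_label == label:
            return identifier
    return None


# Hypothetical data: buffer sample index -> dataset ID.
samples_to_ids = {0: "grain-12", 1: "grain-7", 2: "grain-3"}
```

Each call walks the entries in order, so the cost of a single query grows with the size of the dataset, whereas a labelset keyed by dataset ID would answer the same question with one lookup.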

Another option (if this is not that expensive) would be to export two labelsets: one with <buffer-index> : <dataset identifier> (like now) and another with <dataset identifier> : <buffer-index> (which would be new). This option would be less likely to break backwards compatibility, since the additional labelset would just be the next element in the list of what the dataset~ reports after tobuffer.

Additional context

Normally, other fluid objects (such as kdtree~) will operate on the dataset, and will likely give you back dataset IDs. Example: in fluid.jit.plotter, to efficiently create a mesh from the 2D dataset, I go refer <datasetname> --> tobuffer --> matrix via jit.buffer~. Luckily this is super fast even with millions of points because I get to avoid loops.

But if I am highlighting the dataset elements closest to the mouse pointer using a kdtree~, I get dataset IDs, which I then need to map to buffer indices (which are the same as the matrix indices) to know which points to "highlight". For this it is unavoidable to dump the samples-to-ids labelset at least once, which can cause seconds of hanging with large datasets. It also adds the burden of having to keep two books in sync for the same data relationship.
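
The highlighting step can be sketched in Python as follows, assuming an ID-to-index mapping is already available. The names highlight_mask, neighbour_ids and id_to_index are hypothetical, and plain lists and dicts stand in for the matrix and the labelset.

```python
def highlight_mask(neighbour_ids, id_to_index, num_points):
    """Build a 0/1 mask over buffer (and matrix) indices for the given dataset IDs.

    neighbour_ids: dataset IDs, e.g. as returned by a nearest-neighbour query.
    id_to_index:   dict of dataset ID -> buffer sample index.
    num_points:    number of points in the buffer.
    """
    mask = [0] * num_points
    for dataset_id in neighbour_ids:
        mask[id_to_index[dataset_id]] = 1
    return mask


# Hypothetical data: three points, two of them near the mouse pointer.
id_to_index = {"grain-12": 0, "grain-7": 1, "grain-3": 2}
mask = highlight_mask(["grain-7", "grain-3"], id_to_index, 3)
```

The loop runs only over the handful of neighbours returned by the query, not over the whole dataset, so no full labelset dump is needed per mouse move.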

tremblap commented 5 months ago

Hello

This is an interesting flip. Maybe we could add that option, the same way we can transpose. Let me think of an interface that would not be a problem and would be backward compatible.

balintlaczko commented 5 months ago

The option to flip with an int after the labelset name (like the int after the buffer name) would be great!
