deephaven / deephaven-core

Deephaven Community Core

Aggregations could reap unused key states #5840

Open cpwright opened 3 months ago

cpwright commented 3 months ago

As a user, I want to build aggregations of recent data without consuming memory for states that have since been removed.

When a state loses its last row and is removed from the result, we would move its output position to a free list. We need to take care not to remove a state and add a new one in its place on the same cycle.
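To make the free-list idea concrete, here is a minimal sketch. It is not the engine's actual state manager: `OutputPositionFreeList`, its `String` keys, and the `endCycle()` hook are hypothetical names used only for illustration. Positions freed during a cycle are parked until the cycle ends, so a removal and an add on the same cycle never share an output position.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not Deephaven's aggregation state manager. */
public class OutputPositionFreeList {
    private final Map<String, Integer> keyToPosition = new HashMap<>();
    // Positions that are safe to hand out to new states.
    private final ArrayDeque<Integer> free = new ArrayDeque<>();
    // Positions freed on the current cycle; not reusable until endCycle().
    private final ArrayDeque<Integer> freedThisCycle = new ArrayDeque<>();
    private int nextPosition = 0;

    /** Returns the output position for key, allocating one if needed. */
    public int add(String key) {
        Integer existing = keyToPosition.get(key);
        if (existing != null) {
            return existing;
        }
        // Prefer a reclaimed position; otherwise extend the output.
        int position = free.isEmpty() ? nextPosition++ : free.pop();
        keyToPosition.put(key, position);
        return position;
    }

    /** Removes the state for key and parks its position until the cycle ends. */
    public void remove(String key) {
        Integer position = keyToPosition.remove(key);
        if (position != null) {
            freedThisCycle.push(position);
        }
    }

    /** Called once per update cycle (assumed hook); releases parked positions. */
    public void endCycle() {
        free.addAll(freedThisCycle);
        freedThisCycle.clear();
    }

    /** One past the highest position ever allocated. */
    public int highWaterMark() {
        return nextPosition;
    }
}
```

With this scheme the output never grows past the peak number of simultaneously live states plus the states removed on the current cycle, but the positions in use are not contiguous.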

Instead of leaving empty output positions, we should use a scheme like our incremental rehash credits to shift the output rows down into the unused slots. This would keep our incremental operations from needing to perform shifts linear in output size on a given cycle, but would necessitate additional data movement.
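A rough sketch of the credit-based shifting follows, under assumptions of my own: `CreditCompactor`, the per-cycle `compact(credits)` call, and the choice to fill the lowest hole from the last live row are all hypothetical and are not taken from the existing rehash code. Each cycle moves at most `credits` rows, so the work per cycle is bounded regardless of output size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration only; not Deephaven's incremental rehash code. */
public class CreditCompactor {
    // Output rows by position; null marks an unused slot (a hole).
    private final List<String> rows = new ArrayList<>();
    private final Map<String, Integer> keyToPosition = new HashMap<>();
    private final TreeSet<Integer> holes = new TreeSet<>();

    public int add(String key) {
        rows.add(key);
        keyToPosition.put(key, rows.size() - 1);
        return rows.size() - 1;
    }

    public void remove(String key) {
        Integer position = keyToPosition.remove(key);
        if (position != null) {
            rows.set(position, null);
            holes.add(position);
        }
    }

    /** Moves at most credits rows into holes; returns the number of rows moved. */
    public int compact(int credits) {
        int moved = 0;
        while (true) {
            // Trailing holes are dropped without moving data, so they cost no credit.
            while (!rows.isEmpty() && rows.get(rows.size() - 1) == null) {
                holes.remove(rows.size() - 1);
                rows.remove(rows.size() - 1);
            }
            if (moved >= credits || holes.isEmpty()) {
                return moved;
            }
            // The last row is live here, so every remaining hole lies below it.
            int hole = holes.pollFirst();
            String key = rows.remove(rows.size() - 1);
            rows.set(hole, key);
            keyToPosition.put(key, hole);
            moved++;
        }
    }

    public int size() {
        return rows.size();
    }

    public int positionOf(String key) {
        return keyToPosition.get(key);
    }
}
```

Note that filling holes from the tail reorders the surviving rows. A variant that shifts rows while preserving their relative order would need more moves per reclaimed slot.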

Making this change would be breaking, because we would no longer preserve initial encounter order for reincarnated states.

rcaudy commented 3 months ago

This is likely a performance loss with no gain for some cases (we need to track previous state for key columns, etc., and the code will be more complex). For long-running aggregations where buckets can go away, this may be a significant win in memory usage.

We have some engine tools that rely on states never moving, including AggregationRowLookup (used for tree lookup and data index lookup).

This will need to be configurable, at a minimum, which likely means adding a builder interface to aggBy.