
brimdata / super

An analytics database that puts JSON and relational tables on equal footing

https://zed.brimdata.io/

BSD 3-Clause "New" or "Revised" License


Reduce allocations in runtime/vam/op/summarize #5474

Closed: nwt closed this 4 days ago

nwt commented 1 week ago

Add a sync.Pool for the index slices allocated in superTable.update.
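As a rough illustration of the pooling idea, here is a minimal sketch in Go. It is not the PR's code: `indexPool` and `withIndexes` are hypothetical names, and the slice of group IDs stands in for whatever superTable.update actually iterates over. The pool holds `*[]uint32` so that returning a slice to the pool does not itself allocate.

```go
package main

import (
	"fmt"
	"sync"
)

// indexPool is a hypothetical pool of reusable index slices. It holds
// *[]uint32 rather than []uint32 so that Put does not have to allocate
// to box a slice header into an interface on every call.
var indexPool = sync.Pool{
	New: func() any {
		s := make([]uint32, 0, 1024)
		return &s
	},
}

// withIndexes is a hypothetical stand-in for per-call work in
// superTable.update: it borrows a slice from the pool, fills it with the
// rows that belong to one group, passes it to fn, and returns it.
// fn must not retain idx after it returns, because the slice is reused.
func withIndexes(groupOf []int, group int, fn func(idx []uint32)) {
	p := indexPool.Get().(*[]uint32)
	idx := (*p)[:0]
	for row, g := range groupOf {
		if g == group {
			idx = append(idx, uint32(row))
		}
	}
	fn(idx)
	*p = idx[:0] // keep any capacity growth for the next borrower
	indexPool.Put(p)
}

func main() {
	groupOf := []int{0, 1, 0, 2, 0}
	withIndexes(groupOf, 0, func(idx []uint32) {
		fmt.Println(idx) // [0 2 4]
	})
}
```

Pooling only pays off when the same goroutines repeatedly need similarly sized slices; the contract that `fn` must copy anything it wants to keep is the main hazard of this pattern.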
Change map values to pointers so that updating an existing entry in superTable.update and the various countByString methods goes through the pointer instead of assigning to the string-keyed map. (In such an assignment, the key escapes to the heap even if it is already present in the map.)
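A minimal Go sketch of the pointer-valued map, assuming keys arrive as `[]byte` slices (a hypothetical input shape, not necessarily how the vector runtime stores strings). With `map[string]uint64`, an update like `counts[string(k)]++` is a map assignment, so per the description above the key escapes on every row. With pointer values, an existing key needs only a read-only lookup, for which the Go compiler can skip the `[]byte`-to-string copy, and the map is assigned to only when a key is new.

```go
package main

import "fmt"

// countByString is a hypothetical sketch of a count-by-string aggregation
// whose map values are pointers. Existing keys are updated through the
// pointer after a read-only lookup, so no map assignment happens for them.
func countByString(keys [][]byte) map[string]*uint64 {
	counts := make(map[string]*uint64)
	for _, k := range keys {
		// Read-only lookup: string(k) used directly as the index
		// expression does not need a heap-allocated string.
		if p := counts[string(k)]; p != nil {
			*p++
			continue
		}
		// New entry: the key is copied to the heap and a counter is allocated.
		n := uint64(1)
		counts[string(k)] = &n
	}
	return counts
}

func main() {
	keys := [][]byte{[]byte("push"), []byte("fork"), []byte("push"), []byte("push")}
	counts := countByString(keys)
	fmt.Println(*counts["push"], *counts["fork"]) // 3 1
}
```

The tradeoff is visible in the new-entry branch: every distinct key now costs an extra allocation for its counter, which is only recovered by later updates to that key.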

mccanne commented 1 week ago

This slows down count() by repo.name on the gha dataset by 10%. Is the intention to trade off speed for memory?

nwt commented 1 week ago

Rats. The intention was to improve performance, which it did for count() by event. I'll look into what's happening with count() by repo.name.

nwt commented 4 days ago

Looks like count() by repo.name is slower because repo.name has high cardinality with low counts per value, so there aren't enough updates to existing map entries to amortize the extra cost of allocating the object that each new entry points to.
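To make the amortization argument concrete, this hypothetical Go sketch counts how many rows would create a new map entry (paying for a pointer allocation) versus update an existing one (where pointer values save a map assignment). The two key distributions are synthetic and chosen only for illustration; they are not measurements from the gha dataset.

```go
package main

import (
	"fmt"
	"strconv"
)

// entryStats tallies rows that insert a new key (each allocating a counter
// when map values are pointers) versus rows that update an existing key
// in place through its pointer.
func entryStats(keys []string) (inserts, updates int) {
	counts := make(map[string]*uint64)
	for _, k := range keys {
		if p := counts[k]; p != nil {
			*p++
			updates++
			continue
		}
		n := uint64(1)
		counts[k] = &n
		inserts++
	}
	return inserts, updates
}

// syntheticKeys builds n keys drawn round-robin from `distinct` values.
func syntheticKeys(n, distinct int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = strconv.Itoa(i % distinct)
	}
	return keys
}

func main() {
	// Low cardinality: a few keys, many updates to amortize each allocation.
	ins, upd := entryStats(syntheticKeys(1000, 4))
	fmt.Println("low cardinality:", ins, "inserts,", upd, "updates")
	// prints: low cardinality: 4 inserts, 996 updates

	// High cardinality: mostly new keys, few updates to amortize over.
	ins, upd = entryStats(syntheticKeys(1000, 800))
	fmt.Println("high cardinality:", ins, "inserts,", upd, "updates")
	// prints: high cardinality: 800 inserts, 200 updates
}
```

In the first case each counter allocation is spread over roughly 249 cheap updates; in the second, most rows pay the extra allocation with little to offset it, which matches the slowdown described for count() by repo.name.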