brimdata / super

An analytics database that puts JSON and relational tables on equal footing
https://zed.brimdata.io/

Reduce allocations in runtime/vam/op/summarize #5474

Closed by nwt 4 days ago

nwt commented 1 week ago
mccanne commented 1 week ago

This slows down count() by repo.name on the gha dataset by 10%. Is the intention to trade off speed for memory?

nwt commented 1 week ago

Rats. The intention was to improve performance, which it did for count() by event. I'll look into what's happening with count() by repo.name.

nwt commented 4 days ago

Looks like count() by repo.name goes slower because it has high cardinality with low counts: most rows create a new map entry, so there aren't enough updates to existing entries over which to amortize the extra cost of allocating the objects to which those entries point.
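
The diff is not shown in this thread, so the following is only a minimal Go sketch of the general trade-off described above, not the actual code in runtime/vam/op/summarize. The function names, the sample keys, and the `allocs` counter are all hypothetical. The counter tallies explicit `new` calls in this sketch; it is not a measurement of runtime allocations.

```go
package main

import "fmt"

// countByValue keeps each count directly in the map. Every row costs a map
// lookup plus a map store, but no per-key object is allocated.
func countByValue(keys []string) map[string]int64 {
	counts := make(map[string]int64)
	for _, k := range keys {
		counts[k]++
	}
	return counts
}

// countByPointer keeps a pointer to a separately allocated counter per key.
// A row for an existing key is a lookup plus an in-place increment; a row
// for a new key pays for one allocation. It returns the counts and the
// number of counters allocated (hypothetical bookkeeping for illustration).
func countByPointer(keys []string) (map[string]*int64, int) {
	counts := make(map[string]*int64)
	allocs := 0
	for _, k := range keys {
		p, ok := counts[k]
		if !ok {
			p = new(int64)
			counts[k] = p
			allocs++
		}
		*p++
	}
	return counts, allocs
}

func main() {
	// Low cardinality, high counts: few allocations, many cheap updates.
	lowCard := []string{"push", "push", "watch", "push", "watch", "push"}
	// High cardinality, low counts: nearly one allocation per row.
	highCard := []string{"a/x", "b/y", "c/z", "d/w", "e/v", "a/x"}

	_, lowAllocs := countByPointer(lowCard)
	_, highAllocs := countByPointer(highCard)
	fmt.Printf("low cardinality:  %d rows, %d allocations\n", len(lowCard), lowAllocs)
	fmt.Printf("high cardinality: %d rows, %d allocations\n", len(highCard), highAllocs)
}
```

In the low-cardinality case the allocations are spread over many increments, which matches the speedup reported for count() by event. In the high-cardinality case almost every row pays for an allocation, which is consistent with the slowdown reported for count() by repo.name.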