Closed by nwt 4 days ago
This slows down `count() by repo.name` on the gha dataset by 10%. Is the intention to trade off speed for memory?
Rats. The intention was to improve performance, which it did for `count() by event`. I'll look into what's happening with `count() by repo.name`.
Looks like `count() by repo.name` goes slower because it has high cardinality with low counts: most rows create a new map entry, so there aren't enough updates to existing entries over which to amortize the extra cost of allocating the objects those entries point to.
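To make the amortization argument concrete, here is a minimal sketch rather than the code in this PR. `countByPointer` and `makeKeys` are hypothetical names, and `allocs` just counts the counter objects the function creates. With pointer-valued entries, each distinct key costs one allocation and one map assignment, and every later row for that key only updates through the pointer.

```go
package main

import "fmt"

// countByPointer counts occurrences of each key using pointer-valued map
// entries. It returns the counts and the number of counter objects it created.
// (Hypothetical helper for illustration; not the PR's implementation.)
func countByPointer(keys []string) (map[string]*uint64, int) {
	counts := make(map[string]*uint64)
	allocs := 0
	for _, k := range keys {
		p, ok := counts[k]
		if !ok {
			// First sighting of k: allocate a counter and assign into the map once.
			p = new(uint64)
			counts[k] = p
			allocs++
		}
		// Later sightings update through the pointer with no map assignment.
		*p++
	}
	return counts, allocs
}

// makeKeys returns n keys drawn round-robin from the given number of distinct values.
func makeKeys(n, distinct int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i%distinct)
	}
	return keys
}

func main() {
	const rows = 1000
	for _, distinct := range []int{10, 1000} {
		_, allocs := countByPointer(makeKeys(rows, distinct))
		fmt.Printf("distinct=%d allocs=%d allocs/row=%.2f\n",
			distinct, allocs, float64(allocs)/rows)
	}
}
```

With 10 distinct keys the allocation cost is spread over 100 updates per key (0.01 allocations per row). With 1000 distinct keys every row pays for an allocation (1.00 per row), which is the high-cardinality, low-count case described above.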
Add a sync.Pool for the index slices allocated in superTable.update.
Change map values to pointers to limit assignments to string-keyed maps in superTable.update and various countByString methods. (In these assignments, the key escapes to the heap even if already present in the map.)
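For readers unfamiliar with the pattern, the `sync.Pool` change could look roughly like the sketch below. This is an illustration under assumptions, not the actual `superTable.update` code: `indexPool`, `getIndexes`, `putIndexes`, `countMatches`, the `uint32` element type, and the initial capacity are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// indexPool recycles index slices between calls. It stores *[]uint32 rather
// than []uint32 so that putting a slice back does not allocate a new
// interface box for the slice header. (Hypothetical; names are assumptions.)
var indexPool = sync.Pool{
	New: func() interface{} {
		s := make([]uint32, 0, 1024)
		return &s
	},
}

// getIndexes fetches a slice from the pool and resets its length to zero
// while keeping its capacity.
func getIndexes() *[]uint32 {
	p := indexPool.Get().(*[]uint32)
	*p = (*p)[:0]
	return p
}

// putIndexes returns a slice to the pool. The caller must not use it afterwards.
func putIndexes(p *[]uint32) {
	indexPool.Put(p)
}

// countMatches collects the positions where vals equals want into a pooled
// scratch slice and returns how many positions matched.
func countMatches(vals []string, want string) int {
	p := getIndexes()
	defer putIndexes(p)
	for i, v := range vals {
		if v == want {
			*p = append(*p, uint32(i))
		}
	}
	return len(*p)
}

func main() {
	vals := []string{"push", "fork", "push", "watch"}
	fmt.Println(countMatches(vals, "push"))
	fmt.Println(countMatches(vals, "fork"))
}
```

The pool only pays off when the scratch slice is reused across many calls. The runtime may drop pooled objects at any garbage collection, so the code must not rely on getting a previously used slice back.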