JuliaData / IndexedTables.jl

Flexible tables with ordered indices
https://juliadb.org
MIT License

Simplify grouping iteration mechanism #223

Closed piever closed 5 years ago

piever commented 5 years ago

This uses the new lazygroupmap (name suggestions welcome), a low-level function that abstracts the mechanism shared by the grouping functions groupby and groupreduce. lazygroupmap(f, keys, perm) returns an iterator that applies f to (key, perm, idxs), where key is the key of the group (as a struct), perm is the permutation, and idxs is the range in perm that corresponds to the value key. Both groupby and groupreduce are special cases of this operation.

lazygroupmap is implemented in https://github.com/piever/StructArrays.jl/pull/57/files, so tests will fail until that is merged and tagged. It also takes care of optimizing the comparison (to compute idxs, only the refs of pooled data are compared), so it should have good performance.

EDIT: renamed lazygroupmap to maptiedindices. tiedindices is the iterator that yields each key together with the range in perm corresponding to that key, so I figured maptiedindices is the natural name for this.
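For readers who want to see the shape of the mechanism, here is a minimal Python sketch of the idea described above. It is not the StructArrays implementation: the names `tied_indices`, `map_tied_indices`, `group_by` and `group_reduce` are hypothetical stand-ins, keys are plain values rather than structs, and the refs-only comparison for pooled data is not modeled.

```python
from functools import reduce


def tied_indices(keys, perm):
    """Yield (key, idxs) for each run of equal keys in sorted order.

    `perm` is a permutation that sorts `keys`; `idxs` is a range of
    positions in `perm` (not in `keys`) sharing the same key.
    """
    n = len(perm)
    start = 0
    while start < n:
        key = keys[perm[start]]
        stop = start + 1
        # Extend the run while the next sorted element has the same key.
        while stop < n and keys[perm[stop]] == key:
            stop += 1
        yield key, range(start, stop)
        start = stop


def map_tied_indices(f, keys, perm):
    """Lazily apply f(key, perm, idxs) to every group."""
    for key, idxs in tied_indices(keys, perm):
        yield f(key, perm, idxs)


def sort_perm(keys):
    """Stable permutation that sorts `keys` (illustrative helper)."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def group_by(f, keys, values):
    """Special case: apply f to the list of values in each group."""
    perm = sort_perm(keys)
    return list(map_tied_indices(
        lambda key, p, idxs: (key, f([values[p[i]] for i in idxs])),
        keys, perm))


def group_reduce(op, keys, values):
    """Special case: fold the values in each group with a binary op."""
    perm = sort_perm(keys)
    return list(map_tied_indices(
        lambda key, p, idxs: (key, reduce(op, (values[p[i]] for i in idxs))),
        keys, perm))
```

Both wrappers differ only in the function passed to `map_tied_indices`, which is the sense in which grouping and group-wise reduction are special cases of one operation.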

piever commented 5 years ago

This passes tests now that the StructArrays version is tagged. Will merge in a couple of days if there are no objections.

EDIT: ~actually there is still a little bit of performance tuning to do before merging.~ Performance is fixed: the iteration is now exactly the same as before, but the common part has been factored out and moved to StructArrays, where it is also needed to compute sortperm efficiently.