brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

fix bug in streaming summarize #4529

Open · mccanne opened this issue 1 year ago

mccanne commented 1 year ago

When streaming results out of a summarize group-by, the current logic looks only at the first group-by key and assumes it is the key the input is sorted by. That assumption is arbitrary, since the sort order can correspond to any of the group-by keys. Fix the code to find which group-by key matches the sort order and base the comparison on that key.
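A minimal Go sketch of the proposed fix, under stated assumptions: records are modeled as a hypothetical `Record` map from field name to `int64`, the input is sorted ascending on a single field, and `sortKeyIndex` and `groupComplete` are illustrative names, not functions from the super codebase. Instead of always comparing on the first group-by key, the sketch first locates the group-by key that matches the sort key and then uses that key's value to decide whether a buffered group can be emitted.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for an input value: field name -> int64.
type Record map[string]int64

// sortKeyIndex returns the index of the group-by key that matches the
// input sort key, or -1 if none does. The buggy logic effectively
// assumed this index was always 0.
func sortKeyIndex(groupBy []string, sortKey string) int {
	for i, k := range groupBy {
		if k == sortKey {
			return i
		}
	}
	return -1
}

// groupComplete reports whether a buffered group, identified by its
// group-by key values, can be emitted once the input has advanced to rec.
// Assuming ascending input, a group is complete when its sort-key value
// is strictly less than the sort-key value of the incoming record.
func groupComplete(groupVals []int64, idx int, sortKey string, rec Record) bool {
	if idx < 0 {
		// No group-by key follows the sort order, so nothing can be
		// emitted before the end of input.
		return false
	}
	return groupVals[idx] < rec[sortKey]
}

func main() {
	// "by host, ts" with input sorted by ts, the second group-by key.
	groupBy := []string{"host", "ts"}
	idx := sortKeyIndex(groupBy, "ts")
	group := []int64{7, 100} // host=7, ts=100
	next := Record{"host": 3, "ts": 101}
	fmt.Println(idx, groupComplete(group, idx, "ts", next)) // prints "1 true"
}
```

With the buggy choice of index 0, the same check would compare `host` values (7 < 3 is false) and keep holding a group whose `ts` run has already ended.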

mccanne commented 1 year ago

We should also clarify in the docs for summarize that when one of the group-by keys matches the input sort order, the output order is preserved, unlike other summarize operations. Or should this be a flag to summarize, since the overhead of maintaining order might not always be desired?
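To illustrate why the output order survives when a group-by key matches the sort order, here is a minimal Go sketch of a streaming `count() by host, ts` over input sorted by `ts`. It is not the super implementation: `groupKey`, `Row`, and `streamCount` are hypothetical names, the two group-by keys are simplified to integers, and the index of the sort key among the group-by keys is assumed to be already known.

```go
package main

import (
	"fmt"
	"sort"
)

// groupKey holds the values of two group-by keys (hypothetical
// simplification). Arrays are comparable, so it works as a map key.
type groupKey [2]int64

// Row is one output row of a hypothetical "count() by k0, k1".
type Row struct {
	Key   groupKey
	Count int
}

// streamCount aggregates count() by two keys over input sorted on key
// sortIdx. Whenever the sort-key value changes, every buffered group is
// complete and is emitted, so output follows the input order.
func streamCount(input []groupKey, sortIdx int) []Row {
	other := 1 - sortIdx
	counts := map[groupKey]int{}
	var out []Row
	// flush emits every buffered group. It runs only when the sort key
	// changes or at end of input, so all buffered groups share one
	// finished sort-key value.
	flush := func() {
		keys := make([]groupKey, 0, len(counts))
		for k := range counts {
			keys = append(keys, k)
		}
		// Map iteration order is random; break ties by the other key so
		// the output is deterministic.
		sort.Slice(keys, func(i, j int) bool { return keys[i][other] < keys[j][other] })
		for _, k := range keys {
			out = append(out, Row{Key: k, Count: counts[k]})
			delete(counts, k)
		}
	}
	for i, k := range input {
		if i > 0 && k[sortIdx] != input[i-1][sortIdx] {
			flush()
		}
		counts[k]++
	}
	flush() // end of input: emit whatever is left
	return out
}

func main() {
	// Input sorted by ts, which is group-by key 1 in "by host, ts".
	in := []groupKey{{2, 10}, {1, 10}, {2, 10}, {1, 20}, {3, 30}}
	for _, r := range streamCount(in, 1) {
		fmt.Println(r.Key, r.Count)
	}
}
```

Because groups are emitted as each sort-key run ends, the output comes out in `ts` order at the cost of one flush per distinct sort-key value. A summarize that ignores order could instead keep a single hash table and emit everything at end of input; that is the trade-off a flag would expose.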