apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.5k stars 1.29k forks source link

[multistage] MV column support in Multi Stage #10658

Open xiangfu0 opened 1 year ago

xiangfu0 commented 1 year ago

Support the implementation of array based aggregation/(group by/distinct) support for multi-stage query engine.

Current V1 distinct or group by on multi-value column will break each array into individual values, then perform the operation.

siddharthteotia commented 1 year ago

Discussed with @xiangfu0 .

I'd like @vvivekiyer to pick this up since he has been contributing some bug fixes and enhancements to multi stage. He can get going with this right away dedicatedly and work with us here on design / discussion etc as needed.

vvivekiyer commented 1 year ago

Thanks @siddharthteotia. Sure, I can pick it up.

vvivekiyer commented 1 year ago

I have a draft implementation for SUM_MV here. I can improve on it and add support for all other aggregations in a similar fashion. To support GROUP BY, I am thinking of writing an RelOptRule to modify the return type of the project in the leaf stage to be a primitive type instead of array.

But before I go ahead with making the entire changes, let’s discuss how we’d like to see the evolution of MV columns. I see that there are two possible options (please let me know if there are more):

  1. Provide backward compatibility support for MV columns in the V2 Engine (for all existing operations). Create a new datatype for ARRAY. This will allow us to provide standard SQL operations (like postgres) on ARRAY columns. We can optionally decide to phase out MV column functionality in favor of standard ARRAYs (because MV functionality can be achieved with a combination of ARRAYs/UNNEST and subqueries). As the end state, Pinot will have support for two datatypes:

    • MV (which behaves the same way as today. Additionally working for Joins/Subqueries and other complex SQL).
    • ARRAY (which behaves similar to Postgres).
  2. Do not introduce an ARRAY datatype. But equate/evolve our support of MV columns to achieve all the support that ARRAYs have.

I’m inclined with doing (1) where we onboard all future customers to use ARRAYs (when the functionality is complete) and slowly phase out MV column support.

Also, currently, we unnest MV columns when applying aggregation functions and group by on MV columns. Do we want to retain the behavior as such when executing all sorts of complex SQL (Joins, Subqueries, etc)?

cc: @xiangfu0 @walterddr @siddharthteotia @somandal

somandal commented 1 year ago

Thanks @vvivekiyer I'll take a look at this in more detail soon. Just wanted to bring up what we discussed offline:

Do not introduce an ARRAY datatype. But equate/evolve our support of MV columns to achieve all the support that ARRAYs have.

I do have some concerns with this. MV today doesn't have a very clear guideline on whether it's really like an ARRAY or a SET or something else. For example (as we discussed offline), today when the ForwardIndexHandler disables the forward index for an MV column and wants to reconstruct it later on, the following guarantees cannot be met:

Fixing the reordering issue is not really possible as we'd need to store some information in the segment about the order of data and this will take away the benefits of disabling the forward index in the first place. As for duplicate handling, we can potentially fix this by storing the frequency when the forward index is disabled, but this is still a non-trivial change and will need to be fixed.

Though we don't reorder MV rows (as far as I can tell) anywhere else in the code, these semantics aren't ingrained into Pinot. We may need to make significant changes (at least to our handling of the forward index) to make this work, or make the call that if someone disables forward index and re-enables it, they may lose all form of array semantics.

vvivekiyer commented 1 year ago

@xiangfu0 and @walterddr I have created a doc outlining the future plan for MV columns on Multistage. Please take a look and provide feedback. I will update the document with the implementation details once we get alignment on the overall direction for MV support.

cc: @somandal @siddharthteotia @jasperjiaguo

siddharthteotia commented 1 year ago

Hey @xiangfu0 - gentle ping to take a look. We can also meet to iterate faster.

siddharthteotia commented 1 year ago

hey @xiangfu0 / @walterddr - can we please meet soon to discuss the path ahead ? Want to unblock and make progress on this.

kishoreg commented 1 year ago

this is probably not needed if we implement the idea described here - https://github.com/apache/pinot/issues/10745 right?

siddharthteotia commented 1 year ago

Yes let's prioritize fixing the aggregation function work. @vvivekiyer and @somandal will pick it up

But both are needed imo

xiangfu0 commented 1 year ago

MV column is modeled as an array in multi-stage engine, in order to follow v1 filter/groupby behavior, users need to use the function arrayToMV to bridge the data type conversion in leaf stage.

https://github.com/apache/pinot/pull/11117