MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com
Other
5.72k stars 466 forks source link

storage/sources/loadgen: inconsistent monotonicity information between sources and subsources #20436

Open vmarcos opened 1 year ago

vmarcos commented 1 year ago

What version of Materialize are you using?

v0.61.0-dev (9cefb13f7)

What is the issue?

The monotonicity information associated to a source should propagate to its subsources, but it presently does not: https://github.com/MaterializeInc/materialize/blob/2fb423ad0dbfd3affa9359088d97170c673e6d5a/src/adapter/src/coord/dataflows.rs#L528-L537

Slack refs: 1, 2.

The issue is problematic with monotonic sources, for which we then fail to produce incremental view maintenance plans taking advantage of monotonicity. To illustrate this behavior, recall that the auction load generator is monotonic. Now, consider the following statements:

CREATE SOURCE auction FROM LOAD GENERATOR AUCTION FOR ALL TABLES WITH (SIZE '1');

CREATE VIEW max_bid_by_auction AS
SELECT DISTINCT ON (id) id, amount, bid_id
FROM auctions,
     LATERAL (SELECT id AS bid_id, amount
              FROM bids
              WHERE auction_id = auctions.id AND
                    bid_time <= end_time
              ORDER BY amount DESC
              LIMIT 1)
ORDER BY id;

EXPLAIN PHYSICAL PLAN FOR SELECT * FROM max_bid_by_auction;
EXPLAIN PHYSICAL PLAN FOR VIEW max_bid_by_auction;

The outcome of executing the above statements is:

materialize=> CREATE SOURCE auction FROM LOAD GENERATOR AUCTION FOR ALL TABLES WITH (SIZE '1');

CREATE VIEW max_bid_by_auction AS
SELECT DISTINCT ON (id) id, amount, bid_id
FROM auctions,
     LATERAL (SELECT id AS bid_id, amount
              FROM bids
              WHERE auction_id = auctions.id AND
                    bid_time <= end_time
              ORDER BY amount DESC
              LIMIT 1)
ORDER BY id;

EXPLAIN PHYSICAL PLAN FOR SELECT * FROM max_bid_by_auction;
EXPLAIN PHYSICAL PLAN FOR VIEW max_bid_by_auction;
CREATE SOURCE
Time: 595.540 ms
CREATE VIEW
Time: 132.544 ms
                                         Physical Plan                                         
-----------------------------------------------------------------------------------------------
 Explained Query:                                                                             +
   TopK::MonotonicTop1 group_by=[#0] must_consolidate                                         +
     Mfp                                                                                      +
       project=(#0, #3, #2)                                                                   +
       TopK::MonotonicTop1 group_by=[#0, #1] order_by=[#3 desc nulls_first] must_consolidate  +
         Join::Linear                                                                         +
           linear_stage[0]                                                                    +
             closure                                                                          +
               project=(#0..=#3)                                                              +
               filter=((#4 <= #1))                                                            +
             lookup={ relation=1, key=[#1] }                                                  +
             stream={ key=[#0], thinning=(#1) }                                               +
           source={ relation=0, key=[#0] }                                                    +
           ArrangeBy                                                                          +
             raw=true                                                                         +
             arrangements[0]={ key=[#0], permutation=id, thinning=(#1) }                      +
             Get::Collection materialize.public.auctions                                      +
               raw=true                                                                       +
           ArrangeBy                                                                          +
             raw=true                                                                         +
             arrangements[0]={ key=[#1], permutation={#0: #1, #1: #0}, thinning=(#0, #2, #3) }+
             Get::Collection materialize.public.bids                                          +
               raw=true                                                                       +
                                                                                              +
 Source materialize.public.auctions                                                           +
   project=(#0, #3)                                                                           +
 Source materialize.public.bids                                                               +
   project=(#0, #2..=#4)                                                                      +

(1 row)

Time: 14.290 ms
                                         Physical Plan                                         
-----------------------------------------------------------------------------------------------
 materialize.public.max_bid_by_auction:                                                       +
   TopK::Basic group_by=[#0] limit=1                                                          +
     Mfp                                                                                      +
       project=(#0, #3, #2)                                                                   +
       TopK::Basic group_by=[#0, #1] order_by=[#3 desc nulls_first] limit=1                   +
         Join::Linear                                                                         +
           linear_stage[0]                                                                    +
             closure                                                                          +
               project=(#0..=#3)                                                              +
               filter=((#4 <= #1))                                                            +
             lookup={ relation=1, key=[#1] }                                                  +
             stream={ key=[#0], thinning=(#1) }                                               +
           source={ relation=0, key=[#0] }                                                    +
           ArrangeBy                                                                          +
             raw=true                                                                         +
             arrangements[0]={ key=[#0], permutation=id, thinning=(#1) }                      +
             Get::Collection materialize.public.auctions                                      +
               raw=true                                                                       +
           ArrangeBy                                                                          +
             raw=true                                                                         +
             arrangements[0]={ key=[#1], permutation={#0: #1, #1: #0}, thinning=(#0, #2, #3) }+
             Get::Collection materialize.public.bids                                          +
               raw=true                                                                       +
                                                                                              +
 Source materialize.public.auctions                                                           +
   project=(#0, #3)                                                                           +
 Source materialize.public.bids                                                               +
   project=(#0, #2..=#4)                                                                      +

(1 row)

Time: 11.624 ms

Above, we see that the EXPLAIN PHYSICAL PLAN FOR VIEW variant fails to recognize the monotonicity of the subsources and produce a monotonic incremental view maintenance plan. The plan for a one-shot SELECT can always exploit monotonicity, by contrast, but must_consolidate could in the case of a source that is already monotonic be turned off (see #18732).

dseisun-materialize commented 1 year ago

@petrosagg help me validate the impact - would this impact kafka APPEND ONLY sources? I would assume it doesn't given the only subsource is the progress subsource, but wanted to validate. Also for postgres given that none of those should be monotonic that shouldn't be an issue either correct?

petrosagg commented 1 year ago

@dseisun-materialize Your understanding is correct, neither kafka nor postgres are impacted because the former doesn't have subsources and the latter is not monotonic. We do have loadgen sources however that produce append-only subsources and those are impacted

dseisun-materialize commented 1 year ago

@vmarcos for my edification what's the impact of not being able to detect if something is monotonic? I'm guessing we can't perform certain optimizations?

vmarcos commented 1 year ago

@dseisun-materialize Yes, missing out on optimizations is the main impact. Especially for patterns like min/max/top-k, we'd need to build by default large stacks of arrangements (potentially allocating a ton more memory and/or spending more time) for incremental maintenance. After we turn on enable_monotonic_oneshot_selects in prod, the impact on one-shot SELECTs of not knowing monotonicity will be smaller, but still not zero.

So I guess that if there are people self-onboarding out there and trying out the auction load generator (because that's the one we point them to for getting started), they could be seeing more expensive plans than they'd need to.