MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com
Other
5.68k stars 459 forks source link

Bad arrangement reuse for indexes with different `RETAIN HISTORY` #27192

Open ggevay opened 2 months ago

ggevay commented 2 months ago

What version of Materialize are you using?

593adeec056c2c423d07ff346042cb51dc750daf

What is the issue?

materialize=> 
create table t(x int);

create index i1 on t(x) WITH (RETAIN HISTORY = FOR '1 sec');

create view v1 as
select * from t;

create index i2 on v1(x) WITH (RETAIN HISTORY = FOR '10 sec');

explain physical plan for index i1;
explain physical plan for index i2;

CREATE TABLE
CREATE INDEX
CREATE VIEW
CREATE INDEX
                         Physical Plan                         
---------------------------------------------------------------
 materialize.public.i1:                                       +
   ArrangeBy                                                  +
     raw=true                                                 +
     arrangements[0]={ key=[#0], permutation=id, thinning=() }+
     types=[integer?]                                         +
     Get::PassArrangements materialize.public.t               +
       raw=true                                               +
                                                              +
 Target cluster: quickstart                                   +

(1 row)

                         Physical Plan                         
---------------------------------------------------------------
 materialize.public.i2:                                       +
   Get::PassArrangements materialize.public.v1                +
     raw=false                                                +
     arrangements[0]={ key=[#0], permutation=id, thinning=() }+
     types=[integer?]                                         +
                                                              +
 materialize.public.v1:                                       +
   Get::PassArrangements materialize.public.t                 +
     raw=false                                                +
     arrangements[0]={ key=[#0], permutation=id, thinning=() }+
     types=[integer?]                                         +
                                                              +
 Used Indexes:                                                +
   - materialize.public.i1 (*** full scan ***)                +
                                                              +
 Target cluster: quickstart                                   +

(1 row)

The arrangements of i1 and i2 are the same arrangement, which is not good, because there is a different RETAIN HISTORY setting on them.

There is also a second bug here, which is that MIR EXPLAIN shows that a new arrangement is created, contradicting the physical plan:

materialize=>
explain index i1;
explain index i2;
            Optimized Plan            
--------------------------------------
 materialize.public.i1:              +
   ArrangeBy keys=[[#0]]             +
     ReadStorage materialize.public.t+
                                     +
 Target cluster: quickstart          +

(1 row)

                    Optimized Plan                    
------------------------------------------------------
 materialize.public.i2:                              +
   ArrangeBy keys=[[#0]]                             +
     ReadGlobalFromSameDataflow materialize.public.v1+
                                                     +
 materialize.public.v1:                              +
   ReadIndex on=t i1=[*** full scan ***]             +
                                                     +
 Used Indexes:                                       +
   - materialize.public.i1 (*** full scan ***)       +
                                                     +
 Target cluster: quickstart                          +

(1 row)

There is an ugly workaround where we add a dummy filter in the view:

materialize=> 
create table t(x int);

create index i1 on t(x) WITH (RETAIN HISTORY = FOR '1 sec');

create view v1 as
select * from t where x+1+1 = x+2;

create index i2 on v1(x) WITH (RETAIN HISTORY = FOR '10 sec');

explain physical plan for index i1;
explain physical plan for index i2;

CREATE TABLE
CREATE INDEX
CREATE VIEW
CREATE INDEX
                         Physical Plan                         
---------------------------------------------------------------
 materialize.public.i1:                                       +
   ArrangeBy                                                  +
     raw=true                                                 +
     arrangements[0]={ key=[#0], permutation=id, thinning=() }+
     types=[integer?]                                         +
     Get::PassArrangements materialize.public.t               +
       raw=true                                               +
                                                              +
 Target cluster: quickstart                                   +

(1 row)

                         Physical Plan                         
---------------------------------------------------------------
 materialize.public.i2:                                       +
   ArrangeBy                                                  +
     raw=true                                                 +
     arrangements[0]={ key=[#0], permutation=id, thinning=() }+
     types=[integer]                                          +
     Get::PassArrangements materialize.public.v1              +
       raw=true                                               +
                                                              +
 materialize.public.v1:                                       +
   Get::Arrangement materialize.public.t                      +
     filter=(((#0 + 2) = ((#0 + 1) + 1)))                     +
     key=#0                                                   +
     raw=false                                                +
     arrangements[0]={ key=[#0], permutation=id, thinning=() }+
     types=[integer?]                                         +
                                                              +
 Used Indexes:                                                +
   - materialize.public.i1 (*** full scan ***)                +
                                                              +
 Target cluster: quickstart                                   +

(1 row)

cc @SangJunBak, @chaas, @lfest

aalexandrov commented 2 months ago

There is also a second bug here, which is that MIR EXPLAIN shows that a new arrangement is created, contradicting the physical plan:

I think this is a by-product of ArrangeBy in MIR meaning "ensure that the output is arranged". We recently discussed in person that it might be better to hide ArrangeBy operators if the underlying Get has an AccessStrategy::Index or AccessStrategy::SameDataflow and the ArrangeBy only requests arrangements on keys already provided by the input. I guess this will solve the issue here.

aalexandrov commented 2 months ago

The arrangements of i1 and i2 are the same arrangement, which is not good, because there is a different RETAIN HISTORY setting on them.

Even if we fix the original issue one open question that remains is what to do in the presence of ALTER INDEX RETAIN HISTORY:

materialize=> 
create table t(x int);

create index i1 on t(x) WITH (RETAIN HISTORY = FOR '10 sec');

create view v1 as
select * from t;

create index i2 on v1(x) WITH (RETAIN HISTORY = FOR '5 sec'); -- can use i1 as an input
alter index i1 SET (RETAIN HISTORY = FOR '1 sec'); -- effectively this will always be 5 seconds because of i2

Update: After reading through some more discussion items it seems that changing the retention window of a dependency for an index dataflow should not affect the retention index exported by that dataflow. See #27019.