cmu-db / optd

CMU-DB's Cascades optimizer framework
https://cmu-db.github.io/optd/
MIT License

cost-model: group-by column ref property + cost model design? #202

Open skyzh opened 1 month ago

skyzh commented 1 month ago

Currently, the logical property of an aggregation's group-by column looks like this:

select v1 from t1 group by v1;

Agg group=v1 <- schema=[v1], column_ref=[v1]
  Scan t1

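But group by can actually change the distribution of the column: the output has one row per distinct value of v1, no matter how skewed v1 is in t1. So we should probably set the column ref to derived, or find another way to represent the change. If a later join refers to this column, the cost model should treat it differently from the base-table column.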
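To make the distribution change concrete, here is a minimal sketch in plain Rust. It does not use optd's API; `frequencies` and `group_by` are hypothetical helpers, and the sample data is made up for illustration. It emulates `select v1 from t1 group by v1` on an in-memory column and compares value frequencies before and after.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: count how many times each value occurs in a column.
fn frequencies(column: &[i64]) -> HashMap<i64, usize> {
    let mut freq = HashMap::new();
    for &v in column {
        *freq.entry(v).or_insert(0) += 1;
    }
    freq
}

/// Hypothetical helper emulating `select v1 from t1 group by v1`:
/// the output holds exactly one row per distinct input value (sorted here
/// only to make the result deterministic).
fn group_by(column: &[i64]) -> Vec<i64> {
    column
        .iter()
        .copied()
        .collect::<BTreeSet<i64>>()
        .into_iter()
        .collect()
}

/// Compare the maximum value frequency before and after the group by.
/// Returns `(max_before, max_after)`.
fn max_frequency_before_and_after(column: &[i64]) -> (usize, usize) {
    let before = frequencies(column).values().copied().max().unwrap_or(0);
    let grouped = group_by(column);
    let after = frequencies(&grouped).values().copied().max().unwrap_or(0);
    (before, after)
}
```

For a skewed input such as `[1, 1, 1, 1, 2, 2, 3]`, the value 1 occurs four times in the base column but only once after grouping, so statistics collected on `t1.v1` no longer describe the aggregation's output column.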

jurplel commented 1 month ago

> so probably we should set it to derived

What do you mean? It looks like the property just stores the group-by column. I'm not sure I follow where the distribution of a column is stored here.

skyzh commented 1 month ago

I think it's probably better to store it as Distinct(v1) in the column ref logical property, so that the cost model can take that information into account.
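A rough sketch of what that could look like, assuming a simplified column-ref type. The `ColumnRef` enum, `ColumnStats` struct, and `avg_rows_per_value` function below are all hypothetical and are not optd's actual definitions; they only illustrate how a `Distinct` marker would let a cost model stop reusing base-table statistics.

```rust
/// Hypothetical column-ref logical property (not optd's real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ColumnRef {
    /// Column passed through unchanged from a base table.
    BaseTable { table: String, col_idx: usize },
    /// Column produced by a group by over the inner column:
    /// each distinct value appears exactly once.
    Distinct(Box<ColumnRef>),
    /// Column whose origin the optimizer cannot trace.
    Derived,
}

/// Hypothetical base-table statistics for a single column.
struct ColumnStats {
    row_cnt: f64,
    ndv: f64, // number of distinct values
}

/// Estimate how many rows share one value of the column.
/// A join cost model could multiply the probe side's row count by this.
fn avg_rows_per_value(col: &ColumnRef, base: &ColumnStats) -> Option<f64> {
    match col {
        // Base-table column: use the collected statistics.
        ColumnRef::BaseTable { .. } => Some(base.row_cnt / base.ndv.max(1.0)),
        // Group-by output: exactly one row per value, regardless of skew.
        ColumnRef::Distinct(_) => Some(1.0),
        // Unknown origin: let the caller fall back to a default guess.
        ColumnRef::Derived => None,
    }
}
```

With 1000 rows and 10 distinct values in `t1.v1`, the base-table reference yields 100 rows per value while `Distinct(v1)` yields 1, which is the difference a later join on this column would need to see.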