datafuselabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.64k stars 726 forks source link

Feature: functional dependency optimization #7438

Open leiysky opened 2 years ago

leiysky commented 2 years ago

A functional dependency is a relationship between two attributes in the same database.

If an attribute determines another attribute, we will call it a determinant, and the determined attribute is functionally dependent on the determinant.

We will denote the relationship with $X \rightarrow Y$ if $X$ is a determinant of $Y$, where both $X$ and $Y$ can be a set of attributes.

For example, given a SQL:

select a, a+1 as b from t;

Here the attribute a is a determinant of b because b is derived from a. Thus we can get a functional dependency $(a) \rightarrow (b)$.

Functional dependency is helpful in query optimization, with it we can eliminate outer joins, eliminate useless DISTINCT, eliminate cross joins and etc.

Reference wiki and the thesis for more details.

The functional dependencies can be derived from relational operators, so it's possible to encapsulate them into RelationalProperty.

AngleNet commented 1 year ago

@leiysky HI. I would like to take this issue. Is there any one working on this? I swear I will finish this on my own ~~~ :(

leiysky commented 1 year ago

@leiysky HI. I would like to take this issue. Is there any one working on this? I swear I will finish this on my own ~~~ :(

We have a plan to implement this after #6547 is done.

You are still welcome to take the issue, but I suggest you make a proposal first so we can review it and give some advice since this is an important infrastructure for the optimizer.

AngleNet commented 1 year ago

Great. I will write a proposal on this.

BohuTANG commented 1 year ago

Can this function be optimized? https://github.com/datafuselabs/databend/issues/7468

AngleNet commented 1 year ago

Can this function be optimized?

7468

I think so. I will consider it inside the proposal.

andylokandy commented 1 year ago

7468

This is about CSE optimization. Is it included in functional dependency optimization?

AngleNet commented 1 year ago

7468

This is about CSE optimization. Is it included in functional dependency optimization?

HI. I think we could do this via functional dependency. What do you mean by CSE optimization?

andylokandy commented 1 year ago

CSE is abbr of Common Subexpression Elimination. Let's explain with an example: Given query SELECT max(a+1), (a+1)/2 FROM t1, the CSE optimizer can rewrite it into SELECT max(b), b/2 FROM (SELECT a+1 AS b FROM t1). The actual implementation usually is adding a column for the sub-expression and then hiding it in the top projection, without constructing such sub-query.

I'm not familiar with functional dependency, but from the very limited knowledge I've learned from the wiki, functional dependency is discovered from the relation between data of different columns and can be broken during updates on data.

AngleNet commented 1 year ago

@andylokandy @BohuTANG Sorry. After some investigations, I think I was wrong about doing CSE optimization (the duplicate json_parse elimination) via functional dependency analysis. It's basically another problem to solve. For CSE optimization, I think we could build a common sub-expression collector and rewrite project list or predicate conditions in a select-where-from block via discovered CSEs. The basic idea of the CSE collector is to rewrite each expression in a canonical form which uniquely identify a set of semantically equal expressions, search CSE as visiting it. If the leaf node of an expression is a column during canonical form rewriting, rewrite it to its equivalent column with smallest column id according to an equality inferrer which could be used to infer whether two columns are equal.

leiysky commented 1 year ago

@AngleNet I've read your proposal, but would you mind putting the document into a GitHub issue so we can review it without login Tencent account?

AngleNet commented 1 year ago

@AngleNet I've read your proposal, but would you mind putting the document into a GitHub issue so we can review it without login Tencent account?

Good. I will put it in a issue later.

AngleNet commented 1 year ago

@AngleNet I've read your proposal, but would you mind putting the document into a GitHub issue so we can review it without login Tencent account?

Good. I will put it in a issue later.

The draft doc is #7693