cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars 623 forks source link

Duplicate groups generated in the optimizer #1354

Open chenboy opened 6 years ago

chenboy commented 6 years ago

As it was reported by @GustavoAngulo , the assertion in https://github.com/cmu-db/peloton/blob/master/src/optimizer/memo.cpp#L46 is not correct because we may generate duplicate groups with different group ID in the optimizer. This is because now we do not generate all logically equivalent group at once. One possible scenario is we generate ((A join B) join C) join D by one transformation rule and (A join B) join (C join D) by another rule, currently there's no way to detect these two operator trees are equivalent so they'll be assigned different GroupID, then we'll have duplicate groups. Later this assertion will fail due to two groups having the same operator tree generated by transformation rules.

@GustavoAngulo Could you provide the SQL trace that breaks the assertion?

GustavoAngulo commented 6 years ago

You should be able to recreate it with any three way join, or possibly a two way join. I thin running planner_equality_test should cause it. I'm seeing though that the changes from #1245 never made it in, idk why that happened. In which case to trigger the failure you will need to make the fix here

chenboy commented 6 years ago

I found that ORCA may be solving duplicated groups problem by merging them, which I think is the correct way to do it.

https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CMemo.cpp#L605

hhttps://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CGroup.cpp#L1010

linmagit commented 6 years ago

Interesting. Do you know how Columbia did it? Or they actually didn't handle it...

On Tue, May 22, 2018 at 9:54 AM, Bowei Chen notifications@github.com wrote:

I found that ORCA may be solving duplicated groups problem by merging them, which I think is the correct way to do it.

https://github.com/greenplum-db/gporca/blob/master/ libgpopt/src/search/CMemo.cpp#L605

https://github.com/greenplum-db/gporca/blob/master/ libgpopt/src/search/CMemo.cpp#L605 https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/search/CGroup.cpp#L1010

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/1354#issuecomment-391063607, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzae-9s51_ley3FIlm027-iqNcDK7ks5t1EK8gaJpZM4T5Ww3 .

chenboy commented 6 years ago

I didn't see them handle this problem in their codebase, at least they're not merging groups.

@malin1993ml Do you think it's OK to do it in the similar way as ORCA does it?

linmagit commented 6 years ago

I think it’s okay. There’re trade-offs with it, but I don’t have a better approach right now. I think it’s reasonable to proceed with that way. On Tue, May 22, 2018 at 16:28 Bowei Chen notifications@github.com wrote:

I didn't see them handle this problem in their codebase, at least they're not merging groups.

@malin1993ml https://github.com/malin1993ml Do you think it's OK to do it in the similar way as ORCA does it?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/1354#issuecomment-391172905, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzbyS2YozIyfl3vHBeGZfXLmC3Uzcks5t1J8wgaJpZM4T5Ww3 .