cmu-db / noisepage

Self-Driving Database Management System from Carnegie Mellon University
https://noise.page
MIT License
1.74k stars 502 forks source link

Improper usage of ConvertExprCVNodes OR incorrect documentation #1312

Open jkosh44 opened 3 years ago

jkosh44 commented 3 years ago

Bug Report

Summary

The documentation for ConvertExprCVNodes can be found below. It specifically says "This function should always be called before calling EvaluateExpression" https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/include/parser/expression_util.h#L196-L220

However this rule is not actually followed in practice, as the following examples show https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L500-L502 https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L560-L562 https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L592-L595 https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L624-L627 https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L669-L671 https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L715-L717 https://github.com/cmu-db/noisepage/blob/0cab922d427d28b31f51477b7aeef9d5a24a9dbf/src/optimizer/plan_generator.cpp#L756-L758

With the current usage there's the possibility that some work done by EvaluateExpression is thrown out by ConvertExprCVNodes. For example I came across this during #1310. When evaluating the having clause EvaluateExpression will evaluate all children expressions, including evaluating the column value children of the aggregates. Then when ConvertExprCVNodes is called, the aggregates are directly converted into tuple values and their column value children are thrown out.

I'm not that familiar with this part of the DB so I don't know if it's an issue with the usage or an issue with the documentation.

Environment

N/A

Steps to Reproduce

N/A

Expected Behavior

N/A

Actual Behavior

N/A

jkosh44 commented 3 years ago

Just as a test I switched all the usages of ConvertExprCVNodes to be before EvaluateExpression in this branch: https://github.com/jkosh44/noisepage/tree/fix-ConvertExprCVNodes-usage and nothing seems to be broken. Though I'd like someone with more experience in the optimizer to weigh in before opening a PR. (https://github.com/jkosh44/noisepage/commit/3428d87b1433b1ada8e349b39f6d984ef419bd7a)

xinzhu-cai commented 3 years ago

The statement "This function should always be called before calling EvaluateExpression" in the documentation for ConvertExprCVNodes is correct, especially when the expression is from a SELECT clause. This ensures the correctness of the return value type in EvaluateExpression. However, since return types ofjoin predicates and having predicate are always boolean, calling EvaluateExpression before ConvertExprCVNodes in these cases actually generates the same results. I think we can switch all the usages of ConvertExprCVNodes to be before EvaluateExpression to be consistent.

jkosh44 commented 3 years ago

@xinzhu-cai So just to clarify, you're saying that ConvertExprCVNodes should normally come before EvaluateExpression, but when the return type is boolean then the order doesn't matter?