apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.48k stars 1.01k forks source link

Some In/Exists Subqueries will generate wrong PhysicalPlan #5265

Open ygf11 opened 1 year ago

ygf11 commented 1 year ago

Describe the bug

Datafusion can't execute non-correlated subquery now, like in/exists. So these queries should return NotImplemented("Physical plan does not support logical expression In/Exists error. But currently the filter will be pushdown to the TableScan.

> explain select * from t1 where exists(select 1 from t2 where t2.t2_id > 0);
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                        |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: t1.t1_id, t1.t1_name, t1.t1_int                                                                                                                 |
|               |   TableScan: t1 projection=[t1_id, t1_name, t1_int], full_filters=[EXISTS (<subquery>)]                                                                     |
|               |     Subquery:                                                                                                                                               |
|               |       Projection: Int64(1)                                                                                                                                  |
|               |         Filter: CAST(t2.t2_id AS Int64) > Int64(0)                                                                                                          |
|               |           TableScan: t2                                                                                                                                     |
| physical_plan | ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_int@2 as t1_int]                                                                           |
|               |   CsvExec: files={1 group: [[home/work/tools/datafusion-test-data/join-context/t1.csv]]}, has_header=false, limit=None, projection=[t1_id, t1_name, t1_int] |
|               |                                                                                                                                                             |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

For the above query, the subquery filter will be lost in the final physical_plan.

To Reproduce As above.

Expected behavior The query should return NotImplemented("Physical plan does not support logical expression In/Exists error.

Additional context Add any other context about the problem here.

mingmwang commented 1 year ago

I remember the filters will be added to wrong side. I can work on this and provide a fix.