apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.33k stars 1.2k forks source link

Order by is ignored #13483

Open simonvandel opened 2 days ago

simonvandel commented 2 days ago

Describe the bug

In the following query

EXPLAIN SELECT 
    CASE 
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
    END AS a
FROM (
    SELECT 'name1' AS name
    UNION ALL
    SELECT 'name2'
)
ORDER BY a DESC;

the order of a is not matching the ORDER BY a DESC.

The resulting plan looks like this:

+---------------+-------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                              |
+---------------+-------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Sort: a DESC NULLS FIRST                                                                                          |
|               |   Projection: CASE WHEN name = Utf8("name1") THEN Float64(0) WHEN name = Utf8("name2") THEN Float64(0.5) END AS a |
|               |     Union                                                                                                         |
|               |       Projection: Utf8("name1") AS name                                                                           |
|               |         EmptyRelation                                                                                             |
|               |       Projection: Utf8("name2") AS name                                                                           |
|               |         EmptyRelation                                                                                             |
| physical_plan | CoalescePartitionsExec                                                                                            |
|               |   ProjectionExec: expr=[CASE WHEN name@0 = name1 THEN 0 WHEN name@0 = name2 THEN 0.5 END as a]                    |
|               |     UnionExec                                                                                                     |
|               |       ProjectionExec: expr=[name1 as name]                                                                        |
|               |         PlaceholderRowExec                                                                                        |
|               |       ProjectionExec: expr=[name2 as name]                                                                        |
|               |         PlaceholderRowExec                                                                                        |
|               |                                                                                                                   |
+---------------+-------------------------------------------------------------------------------------------------------------------+

As can be seen the Sort from the logical plan gets transformed into a CoalescePartitionsExec in the physical plan. However, the docs for CoalescePartitionsExec https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_partitions/struct.CoalescePartitionsExec.html says that "No guarantees are made about the order of the resulting partition."

To Reproduce

In datafusion-cli v 43:

EXPLAIN SELECT 
    CASE 
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
    END AS a
FROM (
    SELECT 'name1' AS name
    UNION ALL
    SELECT 'name2'
)
ORDER BY a DESC;

Expected behavior

Resulting order is consistent

Additional context

Running

EXPLAIN VERBOSE SELECT 
    CASE 
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
    END AS a
FROM (
    SELECT 'name1' AS name
    UNION ALL
    SELECT 'name2'
)
ORDER BY a DESC;

shows that the EnforceSorting pass seems to remove the SortExec

Omega359 commented 2 days ago

I wonder if this is related to changes to push down sorts, specifically https://github.com/apache/datafusion/pull/11875 ?

alamb commented 1 day ago

PR from @akurmustafa up for reivew: