databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.85k stars 750 forks source link

bug: Incorrect sql logic test case in subquery.test #16412

Closed forsaken628 closed 1 month ago

forsaken628 commented 2 months ago

Search before asking

Version

bbd7357c703840262e6ff4234ad6485b32c5ae67

What's Wrong?

This test case in tests/sqllogictests/suites/query/subquery.test is incorrect

Not sorted correctly, the correct result should be

1 10 CA
1 20 CA
1 30 CA
2 40 TX
2 50 TX
2 60 TX
3 NULL NULL
4 70 WY
4 80 WY
5 NULL NULL
6 90 WA

How to Reproduce?

No response

Are you willing to submit PR?

dosubot[bot] commented 2 months ago

Hello @forsaken628! I'm here to help you with any bugs, questions, or contributions you have for the project. Let me know how I can assist you!

To fix the incorrect SQL logic test case in subquery.test to produce the correct sorted result, you need to ensure that the ORDER BY clause is correctly applied to the subquery results. Here is the corrected test case:

query IT
SELECT * FROM c WHERE bill IN (SELECT ship FROM o WHERE o.c_id=c.c_id) ORDER BY c_id, bill
----
1 10 CA
1 20 CA
1 30 CA
2 40 TX
2 50 TX
2 60 TX
3 NULL NULL
4 70 WY
4 80 WY
5 NULL NULL
6 90 WA

This ensures that the results are sorted by c_id and then by bill in ascending order, which matches the expected output [1].

To continue talking to Dosu, mention @dosu.

forsaken628 commented 2 months ago
explain optimized SELECT
    c.c_id,
    o.o_id,
    (
        SELECT max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END)
        FROM c AS c2, o AS o2
        WHERE c2.c_id=o2.c_id AND c2.c_id=c.c_id
    )
FROM c
LEFT JOIN o
ON c.c_id=o.c_id
ORDER BY c.c_id, o.o_id;
EvalScalar
├── scalars: [c.c_id (#0) AS (#0), o.o_id (#2) AS (#2), scalar_subquery_11 (#11) AS (#12)]
└── Join(Left)
    ├── build keys: [c_id (#5)]
    ├── probe keys: [c_id (#0)]
    ├── other filters: []
    ├── Sort
    │   ├── sort keys: [default.c.c_id (#0) ASC, default.o.o_id (#2) ASC]
    │   ├── limit: [NONE]
    │   └── Join(Right)
    │       ├── build keys: [c.c_id (#0)]
    │       ├── probe keys: [o.c_id (#3)]
    │       ├── other filters: []
    │       ├── Scan
    │       │   ├── table: default.o
    │       │   ├── filters: []
    │       │   ├── order by: []
    │       │   └── limit: NONE
    │       └── Scan
    │           ├── table: default.c
    │           ├── filters: []
    │           ├── order by: []
    │           └── limit: NONE
    └── EvalScalar
        ├── scalars: [c_id (#5) AS (#5), max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END) (#11) AS (#11)]
        └── Aggregate(Final)
            ├── group items: [subquery_5 (#5)]
            ├── aggregate functions: [max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END) (#11)]
            └── Aggregate(Partial)
                ├── group items: [subquery_5 (#5)]
                ├── aggregate functions: [max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END) (#11)]
                └── EvalScalar
                    ├── scalars: [c_id (#5) AS (#5), if(gt(c2.bill (#6), o2.ship (#9)), c2.bill (#6), o2.ship (#9)) AS (#10), o2.c_id (#8) AS (#13)]
                    └── Join(Inner)
                        ├── build keys: [c_id (#5)]
                        ├── probe keys: [o2.c_id (#8)]
                        ├── other filters: []
                        ├── Scan
                        │   ├── table: default.o
                        │   ├── filters: []
                        │   ├── order by: []
                        │   └── limit: NONE
                        └── Scan
                            ├── table: default.c
                            ├── filters: []
                            ├── order by: []
                            └── limit: NONE

If we add an additional sorting term we can get the correct result

explain optimized SELECT
    c.c_id,
    o.o_id,
    (
        SELECT max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END)
        FROM c AS c2, o AS o2
        WHERE c2.c_id=o2.c_id AND c2.c_id=c.c_id
    ) max
FROM c
LEFT JOIN o
ON c.c_id=o.c_id
ORDER BY c.c_id, o.o_id,max;
EvalScalar
├── scalars: [c.c_id (#0) AS (#0), o.o_id (#2) AS (#2)]
└── Sort
    ├── sort keys: [default.c.c_id (#0) ASC, default.o.o_id (#2) ASC, derived.max (#12) ASC]
    ├── limit: [NONE]
    └── EvalScalar
        ├── scalars: [scalar_subquery_11 (#11) AS (#12)]
        └── Join(Left)
            ├── build keys: [c_id (#5)]
            ├── probe keys: [c_id (#0)]
            ├── other filters: []
            ├── Join(Right)
            │   ├── build keys: [c.c_id (#0)]
            │   ├── probe keys: [o.c_id (#3)]
            │   ├── other filters: []
            │   ├── Scan
            │   │   ├── table: default.o
            │   │   ├── filters: []
            │   │   ├── order by: []
            │   │   └── limit: NONE
            │   └── Scan
            │       ├── table: default.c
            │       ├── filters: []
            │       ├── order by: []
            │       └── limit: NONE
            └── EvalScalar
                ├── scalars: [c_id (#5) AS (#5), max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END) (#11) AS (#11)]
                └── Aggregate(Final)
                    ├── group items: [subquery_5 (#5)]
                    ├── aggregate functions: [max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END) (#11)]
                    └── Aggregate(Partial)
                        ├── group items: [subquery_5 (#5)]
                        ├── aggregate functions: [max(CASE WHEN c2.bill > o2.ship THEN c2.bill ELSE o2.ship END) (#11)]
                        └── EvalScalar
                            ├── scalars: [c_id (#5) AS (#5), if(gt(c2.bill (#6), o2.ship (#9)), c2.bill (#6), o2.ship (#9)) AS (#10), o2.c_id (#8) AS (#13)]
                            └── Join(Inner)
                                ├── build keys: [c_id (#5)]
                                ├── probe keys: [o2.c_id (#8)]
                                ├── other filters: []
                                ├── Scan
                                │   ├── table: default.o
                                │   ├── filters: []
                                │   ├── order by: []
                                │   └── limit: NONE
                                └── Scan
                                    ├── table: default.c
                                    ├── filters: []
                                    ├── order by: []
                                    └── limit: NONE