Closed jayzhan211 closed 2 months ago
I think this might be caused by join reorder, execute the following two queries in DataFusion
DataFusion CLI v36.0.0
❯ create table a1 as values
([1,2,3], 1),
([4,5,6], 2);
❯ create table a2 as values
(1),
(2);
❯ create table b as values (10), (20), (30);
❯ select * from a1,b;
+-----------+---------+---------+
| column1 | column2 | column1 |
+-----------+---------+---------+
| [1, 2, 3] | 1 | 10 |
| [4, 5, 6] | 2 | 10 |
| [1, 2, 3] | 1 | 20 |
| [4, 5, 6] | 2 | 20 |
| [1, 2, 3] | 1 | 30 |
| [4, 5, 6] | 2 | 30 |
+-----------+---------+---------+
6 rows in set. Query took 0.006 seconds.
❯ select * from a2,b;
+---------+---------+
| column1 | column1 |
+---------+---------+
| 1 | 10 |
| 1 | 20 |
| 1 | 30 |
| 2 | 10 |
| 2 | 20 |
| 2 | 30 |
+---------+---------+
6 rows in set. Query took 0.006 seconds.
The result of query select * from a2,b
is the same to DuckDB, but query select * from a1,b
is different.
This is because query select * from a1,b
has been reordered, and we can confirm it with explain verbose
.
Differentiate the left and right tables by the number of rows in statistics.
This reordering behavior is triggered by the bytes size of tables. https://github.com/apache/arrow-datafusion/blob/ca37ce37933f7874d404364cb8c23438baceb46d/datafusion/core/src/physical_optimizer/join_selection.rs#L73-L77
Because we need to load the left side into memory, this behavior should be reasonable.
Additionally, I've also noticed that there is a TODO here, which might be helpful.
// TODO: In PrestoSQL, the optimizer flips join sides only if one side is much smaller than the other by more than SIZE_DIFFERENCE_THRESHOLD times, by default is is 8 times.
take
I took a look at Postgres cross join behavior, it looks like the order of the result is not deterministic either (meaning it feel free to decide which table is small and should be materialized inside memory, and the order of the output will be based on this decision)
Example: create table_a with 3 rows table_b and table_b_small with the same schema, table_b with 10k rows and table_b_small with 2 rows
CREATE TABLE table_a (
id SERIAL PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE table_b (
id SERIAL PRIMARY KEY,
description VARCHAR(50)
);
CREATE TABLE table_b_small (
id SERIAL PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO table_a (name) VALUES
('Alice'),
('Bob'),
('Charlie');
INSERT INTO table_b_small (description) VALUES
('Description 1'),
('Description 2');
DO $$
BEGIN
FOR i IN 1..10000 LOOP
INSERT INTO table_b (description)
VALUES ('Description ' || i);
END LOOP;
END $$;
postgres=# select a.name, b.description from table_a as a, table_b as b;
name | description
---------+-------------------
Alice | Description 1
Bob | Description 1
Charlie | Description 1
Alice | Description 2
Bob | Description 2
Charlie | Description 2
postgres=# select a.name, b.description from table_a as a, table_b_small as b;
name | description
---------+---------------
Alice | Description 1
Alice | Description 2
Bob | Description 1
Bob | Description 2
Charlie | Description 1
Charlie | Description 2
Query plans
postgres=# explain select a.name, b.description from table_a as a, table_b as b;
QUERY PLAN
--------------------------------------------------------------------------
Nested Loop (cost=0.00..67694.27 rows=5401080 width=134)
-> Seq Scan on table_b b (cost=0.00..164.02 rows=10002 width=16)
-> Materialize (cost=0.00..18.10 rows=540 width=118)
-> Seq Scan on table_a a (cost=0.00..15.40 rows=540 width=118)
(4 rows)
postgres=# explain select a.name, b.description from table_a as a, table_b_small as b;
QUERY PLAN
--------------------------------------------------------------------------------
Nested Loop (cost=0.00..3677.15 rows=291600 width=236)
-> Seq Scan on table_a a (cost=0.00..15.40 rows=540 width=118)
-> Materialize (cost=0.00..18.10 rows=540 width=118)
-> Seq Scan on table_b_small b (cost=0.00..15.40 rows=540 width=118)
(4 rows)
I took a look at Postgres cross join behavior, it looks like the order of the result is not deterministic either (meaning it feel free to decide which table is small and should be materialized inside memory, and the order of the output will be based on this decision)
@duongcongtoai Yes. Order itself is not an issue. A relation is an unordered set. "Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples."
and as stated in the PostgreSQL documentation:
If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on.
Let's close this issue for now; we can reopen it if there are any other problems.
The syntax is like
select * from unnest(), unnest(), table;
Similar for Duckdb, a is [1,2,3] and [10, 20]
In datafusion, we have the similar result but the ordering is different. I think we should understand the reason of the different ordering in cross join.
We can see that the ordering is different only for UnnestExec, it implies that crossjoin is probably doing the different things for UnnestExec and MemoryExec
_Originally posted by @jayzhan211 in https://github.com/apache/arrow-datafusion/pull/9355#discussion_r1505255606_