apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.18k stars 1.17k forks source link

Failed to project when items contain expr and Cast(expr) #4478

Open ygf11 opened 1 year ago

ygf11 commented 1 year ago

Describe the bug When I want to project column and the result of its type coercion, datafusion will throw error:

❯ select a, cast(a as Int) from test0;
Plan("Projections require unique expression names but the expression \"test0.a\" at position 0 and \"CAST(test0.a AS Int32)\" at position 3 have the same name. Consider aliasing (\"AS\") one of them.")

To Reproduce Run projection with expression and its cast-wrapped expression.

Expected behavior

Additional context

HaoYang670 commented 1 year ago

This is because the cast expression is ignored in the column name, which is an expected behaviour, I guess.

ygf11 commented 1 year ago

Yes, cast expression is ignored in the column name.

I think the other reason is our projection does not support duplicated items which seems PostgreSQL and spark both support.

spark

spark-sql> select a,a,a from test0;
a   a   a
1   1   1
Time taken: 0.134 seconds, Fetched 1 row(s)

PostgreSQL

> select c0,c0,c0,c0 from test0 limit 3;
  c0   |  c0   |  c0   |  c0
-------+-------+-------+-------
       |       |       |
 19470 | 19470 | 19470 | 19470
 22111 | 22111 | 22111 | 22111
(3 rows)
(1 row)

datafusion

❯ select c0, c0, c0 from test0;
SchemaError(FieldNotFound { field: Column { relation: None, name: "c0" }, valid_fields: Some([Column { relation: Some("test0"), name: "a" }, Column { relation: Some("test0"), name: "b" }, Column { relation: Some("test0"), name: "c" }]) })
mingmwang commented 1 year ago

This is because the cast expression is ignored in the column name, which is an expected behaviour, I guess.

Why it is the expected behavior ?

mingmwang commented 1 year ago

I think SparkSQL does not have a consistent behavior. See example below:

SparkSQL 3.x Thrift Server JDBC

CREATE TEMPORARY TABLE t1 (t1_id INT,t1_name String);
select t1_id, cast(t1_id as decimal(10, 0)), t1_id + cast(t1_id as decimal(10, 0)) from t1;

Result:

+-----+-----+-------------------------------------------------------------------+ | t1_id | t1_id | (CAST(t1_id AS DECIMAL(10,0)) + CAST(t1_id AS DECIMAL(10,0))) | +-----+-----+-------------------------------------------------------------------+

+-----+-----+-------------------------------------------------------------------+