apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.91k stars 1.12k forks source link

Better name display for CAST #10274

Open yyy1000 opened 4 months ago

yyy1000 commented 4 months ago

Describe the bug

The type name for CAST is not correct when using datafusion-cli

To Reproduce

DataFusion CLI v37.1.0

SELECT CAST(2 AS INTEGER); +----------+ | Int64(2) | +----------+ | 2 | +----------+ 1 row(s) fetched. Elapsed 0.027 seconds.

SELECT CAST(2 AS DOUBLE); +----------+ | Int64(2) | +----------+ | 2.0 | +----------+ 1 row(s) fetched. Elapsed 0.005 seconds.

SELECT arrow_typeof(CAST(2 AS DOUBLE)); +------------------------+ | arrow_typeof(Int64(2)) | +------------------------+ | Float64 | +------------------------+ 1 row(s) fetched. Elapsed 0.008 seconds.

Expected behavior

No response

Additional context

I'm finding what's wrong with it. 🤔

jayzhan211 commented 4 months ago

The result seems correct to me.

+----------+ | Int64(2) | +----------+ | 2.0 | +----------+

i64(2) is the name, not the type.

query RT
SELECT CAST(2 AS DOUBLE), arrow_typeof(CAST(2 AS DOUBLE));
----
2 Float64
yyy1000 commented 4 months ago

i64(2) is the name, not the type.

Yes, I realized that. I was thinking whether we should use a better name to display, for example CAST(2 AS INTEGER) in the case but not Int64(2) , like it in duckdb. What do you think? @jayzhan211

SELECT CAST(2 as DOUBLE);
┌───────────────────┐
│ CAST(2 AS DOUBLE) │
│      double       │
├───────────────────┤
│               2.0 │
└───────────────────┘
jayzhan211 commented 4 months ago

Strongly agree with a better name!

alamb commented 4 months ago

Yes, I realized that. I was thinking whether we should use a better name to display, for example CAST(2 AS INTEGER) in the case but not Int64(2) , like it in duckdb. What do you think? @jayzhan211

I agree formatting Int64(2) as 2 would certainly be nicer

I am not sure about removing the CAST

jayzhan211 commented 4 months ago

Yes, I realized that. I was thinking whether we should use a better name to display, for example CAST(2 AS INTEGER) in the case but not Int64(2) , like it in duckdb. What do you think? @jayzhan211

I agree formatting Int64(2) as 2 would certainly be nicer

I am not sure about removing the CAST

I think you are mistaken

Current display is like, which is unclear why i64 type has a result like float, because we do not show the casting

+----------+
| Int64(2) |
+----------+
| 2.0 |
+----------+

Expect display is something like

+----------+
| Cast(Int64(2) as Float)|
+----------+
| 2.0 |
+----------+
alamb commented 4 months ago

Expect display is something like

I agree @jayzhan211 -- sorry for misreading this. I agree that If the user ran

select CAST(2 AS Float)`

seeing this makes sense

+----------+
| Cast(Int64(2) as Float)|
+----------+
| 2.0 |
+----------+

However, if the user wrote something like this

select col = 2

I would expect the result to be something like (even if col was a float column and coercsion applied casts)

+----------+
| col = Int64(2)|
+----------+
| 2.0 |
+----------+
jayzhan211 commented 4 months ago

Expect display is something like

I agree @jayzhan211 -- sorry for misreading this. I agree that If the user ran

select CAST(2 AS Float)`

seeing this makes sense

+----------+
| Cast(Int64(2) as Float)|
+----------+
| 2.0 |
+----------+

However, if the user wrote something like this

select col = 2

I would expect the result to be something like (even if col was a float column and coercsion applied casts)

+----------+
| col = Int64(2)|
+----------+
| 2.0 |
+----------+

It looks good to me because we can infer the type from col.

jayzhan211 commented 2 months ago

Note: I would like to know if it is possible to have one single function name for logical and physical expression. It is also quite related to the issue here.

jayzhan211 commented 2 months ago

@yyy1000 Are you still working on this?

yyy1000 commented 2 months ago

@yyy1000 Are you still working on this?

No, currently having my internship so I don't have buffer on this. 🥲