apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.99k stars 1.14k forks source link

Ability to inspect type of an expression #12272

Open findepi opened 4 weeks ago

findepi commented 4 weeks ago

Is your feature request related to a problem or challenge?

I as a user want to be able to inspect type of expressions. Example usage is to inspect different sides of the comparison (column vs expression) to make sure there is no implicit cast preventing or limiting predicate pushdown. The type inspection should also help me write a CAST.

Describe the solution you'd like

A function similar to arrow_typeof which reveals the SQL type of an expression.

Describe alternatives you've considered

Using arrow_typeof function. Teturn result of that function cannot be used in a CAST:

DataFusion CLI v41.0.0
> SELECT arrow_typeof('a');
+-------------------------+
| arrow_typeof(Utf8("a")) |
+-------------------------+
| Utf8                    |
+-------------------------+
1 row(s) fetched.
Elapsed 0.015 seconds.

> SELECT CAST('a' AS Utf8);
This feature is not implemented: Unsupported SQL type Custom(ObjectName([Ident { value: "Utf8", quote_style: None }]), [])

Additional context

None

jayzhan211 commented 3 weeks ago

select arrow_cast('a', 'Utf8')

Is this what you want?

findepi commented 3 weeks ago

Good question. Maybe. Is arrow_cast(val, arrow_typeof(val2)) guaranteed to be equivalent to CAST(val, <SQL type of val2>)? Also, from SQL user perspective, arrow is an implementation detail, so it could be preferred to be able to write the desired cast in familiar terms (ie using CAST rather than arrow_cast).

jayzhan211 commented 3 weeks ago

Is arrow_cast(val, arrow_typeof(val2)) guaranteed to be equivalent to CAST(val, )?

Yes

arrow is an implementation detail

Maybe datafusion_cast is the better name? Since we currently have arrow's type, this is not only implementation detail but also the type system in datafusion

If we are to implement a new casting feature, I would also prefer syntax like cast(a as b). But since it is already widely used, I think there is no good reason to change the syntax.

Another choice is to support cast(a as b) as additions, but I personally don't like to have more than one equivalent things exist, easy to cause confusion or mess. And, this syntax is similar to postgres pg_typeof which is also a widely used syntax. I suggest we keep the syntax as it is

findepi commented 3 weeks ago

The CAST syntax is already supported by datafusion:

> SELECT CAST('123' AS bigint) / 10 AS n, CAST(00345 AS varchar(10)) AS s;
+----+-----+
| n  | s   |
+----+-----+
| 12 | 345 |
+----+-----+

What i am looking for with this issue, is the ability to check what is the SQL type of an expression to be used in such CAST (bigint or varchar(10) in the example above).

jayzhan211 commented 3 weeks ago

What do you mean SQL type? I guess you are referring to something else in my mind

A function similar to arrow_typeof which reveals the SQL type of an expression.

Do you have a example of your function?

findepi commented 3 weeks ago

What do you mean SQL type?

e.g. varchar(n), bigint. As in SELECT CAST('123' AS bigint) which is a valid expression accepted by DF.

Do you have a example of your function?

jayzhan211 commented 3 weeks ago

I see. I agree we might need another similar function than arrow_typeof. I guess this would be trivial if we have logical type #11513