apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Default generated column name is confusing on casts #4723

Open comphead opened 1 year ago

comphead commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do. There have been a number of discussions noting that generated column names are confusing, hard to read, and hard to reuse in outer queries. Apart from that, they report the wrong type when a cast happens: in the example below, the column name suggests the value is Utf8, but it is in fact Date32.

❯ select date '2000-01-01', arrow_typeof(date '2000-01-01');
+--------------------+---------------------------------+
| Utf8("2000-01-01") | arrowtypeof(Utf8("2000-01-01")) |
+--------------------+---------------------------------+
| 2000-01-01         | Date32                          |
+--------------------+---------------------------------+

There were some proposals for a column naming standard in https://github.com/apache/arrow-datafusion/issues/3990, and @andygrove highlighted part of the problem in https://github.com/apache/arrow-datafusion/issues/3722
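To make the mismatch concrete, here is a toy model of how a name derived only from the pre-cast literal can disagree with the expression's actual type. This is a sketch, not DataFusion's code: the `Literal` and `Cast` classes and all three functions are hypothetical, and `name_with_cast` is just one possible alternative, not a decided convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    # Hypothetical stand-in for a typed literal expression.
    data_type: str
    value: str


@dataclass(frozen=True)
class Cast:
    # Hypothetical stand-in for a cast wrapping a literal.
    expr: Literal
    to_type: str


def result_type(expr):
    """Type the expression actually evaluates to."""
    return expr.to_type if isinstance(expr, Cast) else expr.data_type


def name_ignoring_cast(expr):
    """Name built from the innermost literal only, so the cast is invisible."""
    inner = expr.expr if isinstance(expr, Cast) else expr
    return f'{inner.data_type}("{inner.value}")'


def name_with_cast(expr):
    """One possible alternative (an assumption): keep the cast in the name."""
    if isinstance(expr, Cast):
        return f"CAST({name_ignoring_cast(expr.expr)} AS {expr.to_type})"
    return name_ignoring_cast(expr)


# Assumed shape of `date '2000-01-01'`: a Utf8 literal cast to Date32.
date_expr = Cast(Literal("Utf8", "2000-01-01"), "Date32")

print(name_ignoring_cast(date_expr))  # Utf8("2000-01-01")
print(result_type(date_expr))         # Date32
print(name_with_cast(date_expr))      # CAST(Utf8("2000-01-01") AS Date32)
```

The first two lines printed reproduce the disagreement in the query output above: the name says Utf8 while the type is Date32.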

Describe the solution you'd like Generated column names shouldn't be confusing; in particular, a name shouldn't suggest a type different from the column's actual type.

comphead commented 1 year ago

@alamb I already worked on this in https://github.com/apache/arrow-datafusion/issues/3722 and would like to continue if we can decide on a column naming convention.