Open alamb opened 1 year ago
@melgenek I wonder if you have any thoughts on this idea?
As far as I can tell, sqllogictest in general, and sqllogictest-rs
do not support column name checks right now.
There are some ways to extend sqllogictest-rs
to support columns. It seems that to add native support for column names one would need to do the following:
1) add a colnames
column check support to the query
clause to optionally switch the check on/off, where off is the default. Column names could be the first row in the text representation
query II rowsort colnames
select 1 as one, 2 as two;
----
one two
1 2
2) update DBOutput and RecordOutput to have a field column_names
.
3) update implementations of the sqllogictest::AsyncDB
to return column names along with types and results
4) introduce a validator for column names similar to the type and result validator. A default implementation would probably be a no-op implementation to prevent behavior changes for library users other than Datafusion. A real implementation would likely just lowecase names and compare strings.
On the other hand, one could make comparisons work on the Datafusion side.
It seems that the EXPLAIN
statement already gives the alias names for projections, so it is possible to run the same query twice: once with EXPLAIN, once without. This way both names and values are checked. Of course, there is a lot more information than just names in an EXPLAIN output.
Another way is to create a more powerful version of an arrow_typeof
. For example, Databricks has a DESCRIBE QUERY and Snowflake a DESCRIBE RESULT that show the expected output metadata in format similar to
+---------+------------+
|col_name |data_type |
+---------+------------+
|one | int |
|two | bigint |
+---------+------------+
This way both specific Arrow types and names could be checked with one query.
Thank you @melgenek -- this is excellent advice.
We already have a version of this (though this also looks at the data) with the DESCRIBE dataframe method... Maybe we could connect it into SQL
https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.describe
Is your feature request related to a problem or challenge?
As we port our tests to sqllogictests https://github.com/apache/arrow-datafusion/issues/4460 it is important to retain test coverage
One thing that was noticed is that sqllogictest tests doesn't cover cases if the column name is not expected.
So for example, it will show the following outputs as the same
to
Describe the solution you'd like
It would be nice to have some way (perhaps optionally) that verified the column names as well
Describe alternatives you've considered
No response
Additional context
@comphead brought thus up on https://github.com/apache/arrow-datafusion/pull/6329#pullrequestreview-1425450768