apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

`array[]` / `make_array()` will error if the types can not be coerced to a common type #3170

Open alamb opened 1 year ago

alamb commented 1 year ago

Describe the bug: `make_array()` will error if the types cannot be coerced to a common type, as they would be in postgres and spark

To Reproduce

Run this query (see query_array_scalar_coerce test added in https://github.com/apache/arrow-datafusion/pull/3122)

SELECT [1, 2, '3']

This results in an error:

Error during planning: Coercion from [Int64, Int64, Utf8] to the signature VariadicEqual failed.

Expected behavior: I expect this to return the same result as postgres, which coerces all the arguments to int (it tries to coerce all arguments to the same type as the first argument, which in this case is 1 and thus int):

alamb=# select array[1, 2, '3'];
  array  
---------
 {1,2,3}
(1 row)

Additional context: Found in https://github.com/apache/arrow-datafusion/pull/3122

alamb commented 1 year ago

I actually think in general the coercion rules for uniform should attempt to coerce all arguments to the same type as the first argument
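The rule proposed above can be sketched as follows. This is a minimal, self-contained model, not DataFusion's actual coercion code: the `DataType` variants, the `can_cast` table, and `coerce_to_first` are all hypothetical names chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

// Hypothetical castability table: string literals can be parsed as
// numbers, and Int64 widens to Float64.
fn can_cast(from: DataType, to: DataType) -> bool {
    use DataType::*;
    match (from, to) {
        (a, b) if a == b => true,
        (Utf8, Int64) | (Utf8, Float64) => true,
        (Int64, Float64) => true,
        _ => false,
    }
}

/// Coerce every argument to the type of the FIRST argument, or error.
fn coerce_to_first(arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
    let target = *arg_types
        .first()
        .ok_or_else(|| "array[] requires at least one element".to_string())?;
    for &t in arg_types {
        if !can_cast(t, target) {
            return Err(format!("Coercion from {:?} to {:?} failed", t, target));
        }
    }
    Ok(vec![target; arg_types.len()])
}

fn main() {
    use DataType::*;
    // SELECT [1, 2, '3']: everything coerces to the first type, Int64.
    assert_eq!(coerce_to_first(&[Int64, Int64, Utf8]), Ok(vec![Int64; 3]));
    // With a string first, the Int64 cannot be cast under this table,
    // so planning fails rather than silently picking another type.
    assert!(coerce_to_first(&[Utf8, Int64]).is_err());
}
```

Under this rule the first element drives the cast, so `[1, 2, '3']` and `['3', 1, 2]` can legitimately resolve to different element types (or errors), which matches the postgres-style behavior described in the issue.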

comphead commented 1 year ago

@alamb this issue description conflicts with #3123. Which one should be done?

alamb commented 1 year ago

@comphead I think https://github.com/apache/arrow-datafusion/issues/3123 and this ticket are different

This ticket is about the input types of the arguments to array[]

https://github.com/apache/arrow-datafusion/issues/3123 is about the output type produced by the array[] function
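The distinction drawn above can be made concrete with a small sketch: input coercion picks a single element type for the arguments (this ticket), while the return type wraps that element type in a list (#3123). The names and `DataType` variants here are illustrative, not DataFusion's API.

```rust
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<DataType>),
}

// This ticket: what types must the arguments of array[] be coerced to?
fn coerced_input_types(args: &[DataType], target: DataType) -> Vec<DataType> {
    vec![target; args.len()]
}

// Issue #3123: what type does the array[] expression itself produce?
fn return_type(element: DataType) -> DataType {
    DataType::List(Box::new(element))
}

fn main() {
    use DataType::*;
    // SELECT [1, 2, '3'] with Int64 chosen as the element type:
    let coerced = coerced_input_types(&[Int64, Int64, Utf8], Int64);
    assert_eq!(coerced, vec![Int64, Int64, Int64]);
    // The expression's own type is a list of that element type.
    assert_eq!(return_type(Int64), List(Box::new(Int64)));
}
```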

comphead commented 1 year ago

@alamb I checked 3 options

DataFusion CLI v11.0.0
❯ select make_array(1,2,'3');
+----------------------------------------+
| makearray(Int64(1),Int64(2),Utf8("3")) |
+----------------------------------------+
| [1, 2, 3]                              |
+----------------------------------------+
1 row in set. Query took 0.006 seconds.
❯ select [1, 2, '3'];
NotImplemented("Arrays with different types are not supported: {Int64, Utf8}")
❯ select array[1,2,'3'];
NotImplemented("Arrays with different types are not supported: {Utf8, Int64}") 

Here I can see inconsistent behaviour. I suppose all 3 should try to coerce to the first datatype and fail if it's not possible?

alamb commented 1 year ago

here I can see inconsistent behaviour. I suppose all 3 should try to coerce to the first datatype and fail if it's not possible?

I think that would make sense -- thank you for looking into this @comphead

comphead commented 1 year ago

@alamb it appears we have 2 code bases for supporting arrays:

All those constructions do the same work, but they can get out of sync, as they are now. Do we need to support both branches?

alamb commented 1 year ago

All those constructions do the same work, but they can get out of sync, as they are now. Do we need to support both branches?

I am sorry for the delay in responding. The short answer is I don't really know what the desired / correct behavior is here -- I think the array[] syntax is from postgres and make_array() (formerly array()) is from Spark

comphead commented 1 year ago

Yes, we can support all 3 of them. My concern is that there are 2 different codebases, and no easy way to find a common denominator for them. I can make all 3 functions work the same way, but in the future those codebases can fall out of sync again, leading to make_array working differently compared to array or [].

alamb commented 1 year ago

🤔

comphead commented 1 year ago

@alamb one more thing I found: make_array works on coerced expressions, and the coercion doesn't consider the first element as the driving cast element. Rather, it finds a common type among all array values. Are you OK with the current behaviour, or would you like to change it to make the first datatype the target type?
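The current behaviour described above (find a common type among all values, with no special role for the first element) can be sketched like this. Again, the `DataType` variants and the `common_type` promotion lattice are hypothetical stand-ins, not DataFusion's actual rules.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

// Hypothetical promotion lattice: numbers widen Int64 -> Float64; a
// string mixed with anything non-string has no common type here.
fn common_type(a: DataType, b: DataType) -> Option<DataType> {
    use DataType::*;
    match (a, b) {
        (x, y) if x == y => Some(x),
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        _ => None,
    }
}

/// Fold pairwise to the common supertype of ALL element types;
/// order of the elements does not matter.
fn coerce_common(types: &[DataType]) -> Option<DataType> {
    let (&first, rest) = types.split_first()?;
    rest.iter().try_fold(first, |acc, &t| common_type(acc, t))
}

fn main() {
    use DataType::*;
    // [1, 2.0, 3] promotes everything to Float64.
    assert_eq!(coerce_common(&[Int64, Float64, Int64]), Some(Float64));
    // [1, 2, '3'] has no common supertype under this lattice, matching
    // the "Arrays with different types are not supported" error above.
    assert_eq!(coerce_common(&[Int64, Int64, Utf8]), None);
}
```

The trade-off between the two rules is visible here: common-supertype coercion is symmetric in the arguments, while first-argument-driven coercion (the postgres-style proposal earlier in the thread) is not.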

alamb commented 1 year ago

Are you OK with the current behaviour, or would you like to change it to make the first datatype the target type?

Ideally we would follow some well-established model (like postgres), but if that is too challenging I think we can keep the existing behavior too