Open alamb opened 1 year ago
I actually think in general the coercion rules for uniform
should attempt to coerce all arguments to the same type as the first argument
@alamb this issue description confronts with #3123 Which one should be done?
@comphead I think https://github.com/apache/arrow-datafusion/issues/3123 and this ticket are different
This ticket is about the input types of the arguments to array[]
https://github.com/apache/arrow-datafusion/issues/3123 is about the output type produced by the array[]
function
@alamb I checked 3 options
DataFusion CLI v11.0.0
❯ select make_array(1,2,'3');
+----------------------------------------+
| makearray(Int64(1),Int64(2),Utf8("3")) |
+----------------------------------------+
| [1, 2, 3] |
+----------------------------------------+
1 row in set. Query took 0.006 seconds.
❯ select [1, 2, '3'];
NotImplemented("Arrays with different types are not supported: {Int64, Utf8}")
❯ select array[1,2,'3'];
NotImplemented("Arrays with different types are not supported: {Utf8, Int64}")
here I can see nonconsistent behaviuor. I suppose all 3 should try coerce to first datatype and fail if its not possible?
here I can see nonconsistent behaviuor. I suppose all 3 should try coerce to first datatype and fail if its not possible?
I think that would make sense -- thank you for looking into this @comphead
@alamb it appears we have 2 code bases for supporting arrays:
make_array(x, y)
functionarray[x, y]
or [x, y]
constructions.All those constructions do the same work, but they can be out of sync like now. do we need to support all 2 branches?
All those constructions do the same work, but they can be out of sync like now. do we need to support all 2 branches?
I am sorry for the delay in response. The short answer is I don't really know what the desired / correct behavior is here -- I think array[]
syntax is from postgres and make_array()
(formerly array()
) is from Spark
Yes, we can support all 3 of them. My concern there are 2 different codebases, and no easy way to find a common denominator for them. I can make all 3 functions works the same way, but in future those codebases can be out of sync again, leading make_array
works differently comparing to array
or []
.
🤔
@alamb one more thing I found
make_array
works on coerce expressions, and coerce doesn' consider first element as driving cast element. Rather it finds common type among all array values. Are you ok to with current behaviuor or you would like to change it to make first datatype as target type?
Are you ok to with current behaviuor or you would like to change it to make first datatype as target type?
Ideally we would follow some well established model (like posgres) but if that is too challenging I think we can keep the existing behavior too
Describe the bug
make_array()
will error if the types can not be coerced to a common type, as in postgres and sparkTo Reproduce
Run this query (see
query_array_scalar_coerce
test added in https://github.com/apache/arrow-datafusion/pull/3122)This results in an error:
Expected behavior I expect this to return like postgres where it has coerced all the arguments to int (tried to coerce all arguments to the same type as the first, which in this case is
1
and thusint
Additional context Found in https://github.com/apache/arrow-datafusion/pull/3122