apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Failed to cast `[]` to `FixedSizeList(1, Null)` #9158

Open Weijun-H opened 9 months ago

Weijun-H commented 9 months ago

Describe the bug

DataFusion CLI v35.0.0
❯ select arrow_cast(make_array(), 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0

❯ select arrow_cast([], 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0
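As the error message indicates, casting to FixedSizeList(n, _) requires every list value to hold exactly n elements, so an empty list (length 0) cannot become a FixedSizeList(1, Null). The std-only Rust sketch below models only that length rule. It is not arrow-rs's cast kernel, and check_fixed_size is a hypothetical helper that takes the length of each list row.

```rust
/// Simplified model (hypothetical, not arrow-rs code): a list column can be
/// treated as FixedSizeList(size, _) only if every row has exactly `size`
/// elements. `lengths` holds the length of each list row.
fn check_fixed_size(lengths: &[usize], size: usize) -> Result<(), String> {
    for (idx, &len) in lengths.iter().enumerate() {
        if len != size {
            // Message format mirrors the CLI error shown above.
            return Err(format!(
                "Cannot cast to FixedSizeList({size}): value at index {idx} has length {len}"
            ));
        }
    }
    Ok(())
}

fn main() {
    // make_array() / []: one row holding an empty list (length 0).
    // Prints: Err("Cannot cast to FixedSizeList(1): value at index 0 has length 0")
    println!("{:?}", check_fixed_size(&[0], 1));

    // [null]: one row holding a one-element list.
    // Prints: Ok(())
    println!("{:?}", check_fixed_size(&[1], 1));
}
```

In this simplified model a size-0 target would accept an empty list; whether the real cast path handles FixedSizeList(0, _) is a separate question from the length rule shown here.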

To Reproduce

No response

Expected behavior

No response

Additional context

No response

r3stl355 commented 9 months ago

I wonder if that's meant to work. The following works:

select arrow_cast([null], 'FixedSizeList(1, Null)');

However, if you wanted a zero-sized list, shouldn't it be

select arrow_cast([], 'FixedSizeList(0, Null)');

However, that throws the following error:

thread 'main' panicked at arrow-datafusion/datafusion/common/src/scalar.rs:3184:5:
assertion `left == right` failed
  left: 0
  right: 1

alamb commented 9 months ago

Panicking is definitely not good.

r3stl355 commented 9 months ago

I'll see what I can do.

r3stl355 commented 9 months ago

take

r3stl355 commented 9 months ago

I've done some digging but did not find an easy fix, only a few options, listed below. Happy to follow up, but I need a decision on which fix to attempt.

Since select arrow_cast([null], 'FixedSizeList(1, Null)'); works, it's logical to use FixedSizeList(0, Null) when casting an empty array (select arrow_cast([], 'FixedSizeList(0, Null)');). However, that doesn't work because of the following:

https://github.com/r3stl355/arrow-datafusion/blob/3b355c798a3258f118016b33f26c5a55fed36220/datafusion/common/src/scalar/mod.rs#L231

The possible fix options are:

  1. Raise an error if 0 is used as the size of a cast target type (e.g. FixedSizeList(0, Null))
  2. Try to convert FixedSizeList(FieldRef, 0) to FixedSizeList(FieldRef, 1) before calling cast_with_options, but (a) this feels really wrong and (b) it may still not work
  3. Raise an issue in Arrow asking it to return a non-empty array when cast_with_options is called with FixedSizeList(FieldRef, 0). I'll do some digging there to see if that's possible, e.g. whether FixedSizeListArray<0>[NullArray(0),] would be a valid type

Lastly, this error happens when displaying the result but not when applying some other functions to it. For example, the following works, though it's the only function I tested it with:

select arrow_typeof(arrow_cast([], 'FixedSizeList(0, Null)'));

jayzhan211 commented 9 months ago

I prefer option 1. I think a FixedSizeList with length 0 is the same as an empty list, and I don't think there is any useful case where we need to cast an empty list to FixedSizeList(0, type). Return an exec_error when casting to FixedSizeList(0, <any type>); we just need to avoid panicking on this cast.
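Option 1 amounts to validating the target type before the cast runs and returning an error value instead of reaching the assertion that panics. The std-only Rust sketch below illustrates that shape under stated assumptions: CastTarget and validate_cast_target are hypothetical stand-ins, not DataFusion or arrow APIs (real code would match on arrow's DataType::FixedSizeList(FieldRef, i32) and use DataFusion's own error type), and a plain String is used as the error here.

```rust
/// Hypothetical, simplified stand-in for a cast target type.
#[derive(Debug)]
enum CastTarget {
    FixedSizeList { size: i32 },
    Other,
}

/// Sketch of option 1: reject a zero-sized FixedSizeList target up front
/// with an error value, so callers never hit a panicking assertion later.
fn validate_cast_target(target: &CastTarget) -> Result<(), String> {
    match target {
        CastTarget::FixedSizeList { size: 0 } => {
            Err("Cannot cast to FixedSizeList with size 0".to_string())
        }
        _ => Ok(()),
    }
}

fn main() {
    let targets = [
        CastTarget::FixedSizeList { size: 0 },
        CastTarget::FixedSizeList { size: 1 },
        CastTarget::Other,
    ];
    for t in &targets {
        // Each caller gets a Result it can propagate; nothing panics.
        println!("{:?} -> {:?}", t, validate_cast_target(t));
    }
}
```

The design choice is simply to fail early with a recoverable error; the exact message and error variant in a real fix would follow DataFusion's conventions.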

r3stl355 commented 9 months ago

@Weijun-H was there any specific reason you were trying to achieve this (i.e. select arrow_cast(make_array(), 'FixedSizeList(1, Null)');)?

Weijun-H commented 9 months ago

There is no particular use case for now; I am working on #9108, which reminded me of this case. I also vote for the first solution, which is more reasonable.

r3stl355 commented 8 months ago

I've unassigned myself from this issue as I don't have much bandwidth at the moment, so maybe someone else is willing to implement the changes. If nobody does, I'll come back to this in 2-3 weeks.

r3stl355 commented 7 months ago

Looks like this is still open; happy to resume if no one else is working on it.