Open SemyonSinchenko opened 1 week ago
As I was able to realize, array_insert
does not supported in datafusion. Is the list.rs
a good place to have an implementation of ArrayInsert
and PhysicalExpr
for it?
@andygrove Sorry for tagging but I have questions about the ticket (array_insert
).
array_insert
was added in spark 3.4, so all the 3.3.x tests are obviously failed. I checked and it looks like the EoL for 3.3 is about the end of 2024. Technically I think I can try to workaround tests in 3.3.x by reflection API, my question is mostly should I do it due to soon EoL of the 3.3.x?array_insert
is not supported in DataFusion. I made an implementation (and it looks like it works, except negative indices and corner cases). Is the list.rs
a good place for it? Or should I move my code somewhere else?Int32
for position argument, is it OK if I will support only int32 too? In theory, other types can be supported too, but I'm still trying to realize how to achieve it and it may become complex...Thanks in advance! That is my first serious attempt to contribute to the project, so sorry If I'm annoying.
Thanks, @SemyonSinchenko. I think it's fine to skip the test for Spark 3.3. I plan on reviewing this PR in more detail tomorrow, but it looks good from an initial read.
Attention: Patch coverage is 59.09091%
with 9 lines
in your changes missing coverage. Please review.
Project coverage is 34.27%. Comparing base (
845b654
) to head (8b58d8d
). Report is 18 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
.../scala/org/apache/comet/serde/QueryPlanSerde.scala | 59.09% | 7 Missing and 2 partials :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This PR is ready for the review. The only failed check failed due to the internal GHA error:
GitHub Actions has encountered an internal error when running your job.
I am not sure if this should be done together with this PR but it would some add "free" tests. Spark introduced with 3.5 array_prepend which it implements with array_insert and starting 4.0 it also implements array_append with array_insert. If you want you can copy the array_append tests and replace array_append with array_prepend for Spark 3.5+ and enable the array_append tests for spark 4.0. You can also ignore this comment if you do not agree or if it leads to unrelated errors.
Thanks for comments and suggestions!
This PR is ready for the review again.
What were changed from the last round of the review:
array_prepend
(spark 3.5+) and enabled the test for array_append
(spark 4.0+);null
;These tests pointed my attention to the uncovered parts of the code that I fixed and I also added couple of additional tests to the native
part. I revisited how spark do array_insert and it is more tricky than I realized at the first glance. I added few additional comments to the native part of the implementation and fixed the behavior.
At the moment this PR is tested in multiple ways:
array_insert
under the hood (NULLS, different data types, etc.);So, it looks to me, that all the possible cases are covered and the behavior is the same like in spark.
I am not sure if this should be done together with this PR but it would some add "free" tests. Spark introduced with 3.5 array_prepend which it implements with array_insert and starting 4.0 it also implements array_append with array_insert. If you want you can copy the array_append tests and replace array_append with array_prepend for Spark 3.5+ and enable the array_append tests for spark 4.0. You can also ignore this comment if you do not agree or if it leads to unrelated errors.
Done!
Which issue does this PR close?
Related to #1042
array_insert: SELECT array_insert(array(1, 2, 3, 4), 5, 5)
Rationale for this change
As described in #1042
What changes are included in this PR?
QueryPlanSerde.scala
: I added an additional case for the array insert;expr.proto
: I added a new message for theArrayInsert
;planner.rs
: I added a case for the array_insert;list.rs
:ArrayInsert
struct;PhysicalExpr
,Display
andPartialExpr
for it;fn array_insert
How are these changes tested?
At the moment I added a simple tests for
fn array_insert
and a test forQueryPlanSerde
.