apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Add `ListViewArray` and `LargeListViewArray` #5375

Open alamb opened 6 months ago

alamb commented 6 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Recently, two new types were added to the Arrow format that make it more suitable for certain operations on lists.

Specifically, when filtering or taking from List data, creating a new ListArray or LargeListArray requires copying the underlying lists to a new, packed buffer. The "ListView" layout was designed to remove this limitation and was recently added to the Arrow spec.
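To make the difference concrete, here is a minimal sketch in plain Rust. `SimpleList` and `SimpleListView` are hypothetical stand-ins invented for this illustration, not arrow-rs types, and child values are simplified to `i32`. It assumes the layouts as described in the Arrow spec: a list stores n + 1 contiguous offsets, while a list view stores one offset and one size per element.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the classic list layout (not an arrow-rs type):
/// n + 1 increasing offsets into a packed `values` buffer.
struct SimpleList {
    offsets: Vec<i32>,
    values: Vec<i32>,
}

/// Hypothetical stand-in for the list-view layout (not an arrow-rs type):
/// one (offset, size) pair per element, pointing into a shared `values` buffer.
struct SimpleListView {
    offsets: Vec<i32>,
    sizes: Vec<i32>,
    values: Arc<Vec<i32>>,
}

impl SimpleList {
    /// Taking from a list has to rebuild a packed values buffer,
    /// because element i must end exactly where element i + 1 starts.
    fn take(&self, indices: &[usize]) -> SimpleList {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for &i in indices {
            let start = self.offsets[i] as usize;
            let end = self.offsets[i + 1] as usize;
            values.extend_from_slice(&self.values[start..end]);
            offsets.push(values.len() as i32);
        }
        SimpleList { offsets, values }
    }
}

impl SimpleListView {
    /// Taking from a list view only gathers the selected offsets and sizes;
    /// the child values buffer is shared, not copied.
    fn take(&self, indices: &[usize]) -> SimpleListView {
        SimpleListView {
            offsets: indices.iter().map(|&i| self.offsets[i]).collect(),
            sizes: indices.iter().map(|&i| self.sizes[i]).collect(),
            values: Arc::clone(&self.values),
        }
    }

    /// Returns the child values of element `i`.
    fn get(&self, i: usize) -> &[i32] {
        let start = self.offsets[i] as usize;
        let end = start + self.sizes[i] as usize;
        &self.values[start..end]
    }
}
```

For the lists `[1, 2]`, `[3]`, `[4, 5, 6]`, taking indices `[2, 0]` from the list view yields offsets `[3, 0]` and sizes `[3, 2]` over the same values buffer, whereas the list version has to write a new values buffer `[4, 5, 6, 1, 2]`.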

Describe the solution you'd like

I would like to implement ListViewArray and LargeListViewArray following the spec: https://arrow.apache.org/docs/format/Columnar.html#listview-layout
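As a rough illustration of what "following the spec" could involve, the sketch below checks the bounds that, as I read the linked layout description, every list-view element must satisfy: the offset and offset + size both have to fall within the child array. `validate_list_view` is a hypothetical helper written for this comment, not an existing arrow-rs function; it uses 32-bit offsets as in ListView (LargeListView would use `i64`).

```rust
/// Hypothetical helper (not an arrow-rs API): checks the list-view bounds
/// for 32-bit offsets and sizes against a child array of `child_len` values.
fn validate_list_view(offsets: &[i32], sizes: &[i32], child_len: usize) -> Result<(), String> {
    // The layout stores exactly one offset and one size per element.
    if offsets.len() != sizes.len() {
        return Err(format!(
            "offsets has {} entries but sizes has {}",
            offsets.len(),
            sizes.len()
        ));
    }
    for (i, (&offset, &size)) in offsets.iter().zip(sizes).enumerate() {
        if offset < 0 || size < 0 {
            return Err(format!("element {}: negative offset or size", i));
        }
        // Widen to i64 so that offset + size cannot overflow i32.
        let end = offset as i64 + size as i64;
        if end > child_len as i64 {
            return Err(format!(
                "element {}: range {}..{} exceeds child length {}",
                i, offset, end, child_len
            ));
        }
    }
    // Unlike the list layout, offsets are not required to be sorted,
    // so no ordering check is performed here.
    Ok(())
}
```

Note that there is deliberately no monotonicity check: out-of-order offsets such as `[3, 0]` are valid here, which is what lets take and filter reuse the child buffer.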

Initially, I would suggest we get the basic types in place:

Then as follow on PRs, add support to key kernels:

Describe alternatives you've considered

Additional context

This is similar in spirit to the StringViewArray and BinaryViewArray described in https://github.com/apache/arrow-rs/issues/5374

Tasks:

Kikkon commented 6 months ago

Hi @alamb

I'm interested in the progress of this issue regarding the addition of ListViewArray and LargeListViewArray to arrow-rs. I've been following the discussion and would like to know if there have been any updates or if there's anything specific I can assist with to help move this forward.

alamb commented 6 months ago

Hi @Kikkon -- that is great news. 🙏

There have been discussions related to implementing StringView and BinaryView as part of https://github.com/apache/arrow-rs/issues/5374 and I expect that work to begin shortly (within a month maybe?).

I don't know of any similar work afoot for ListViewArray, but I suspect we could follow a very similar pattern to https://github.com/apache/arrow-rs/issues/5374 (e.g. implement the basic array structure first, followed by cast, take, and filter kernels).

Does that make sense?

Kikkon commented 5 months ago

Hi @alamb, I have already created a preliminary issue, https://github.com/apache/arrow-rs/issues/5492, and a pull request, https://github.com/apache/arrow-rs/pull/5493. If time permits, I can also take on the subsequent tasks.

alamb commented 5 months ago

Thank you @Kikkon -- that sounds awesome

Kikkon commented 5 months ago

@alamb Since #5492 has already been merged, I've created an issue #5501 to keep track of the progress. Is there anything that needs to be added?

alamb commented 5 months ago

Thanks @Kikkon

I added #5501 to the top of this ticket. In terms of next steps, I think the list of items we are finding on https://github.com/apache/arrow-rs/issues/5374 is worth looking at (e.g. IPC, support for filter/take, etc.). We'll keep that list updated.