apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Add a feature to allow creating Buffers from Vec with custom allocators (allocator_api) #3960

Open jhorstmann opened 1 year ago

jhorstmann commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In #3920 and #3917, support was added to create buffers from standard Rust vectors. The currently unstable allocator_api feature extends Vec to support custom allocators, using functions such as new_in. In our product we use such custom allocators to track the memory usage of individual queries. I'd like to add a similarly named feature to arrow-rs that generalizes the Buffer::from_vec and MutableBuffer::from_vec functions to vectors with custom allocators.
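To make the tracking idea concrete, here is a minimal sketch of an allocator wrapper that counts live bytes. It is only an illustration under stated assumptions: `TrackingAllocator` and `observe_scratch` are hypothetical names, not arrow-rs APIs, and the sketch implements the stable `GlobalAlloc` trait so that it compiles without nightly. With allocator_api, the same counting logic would presumably sit behind the unstable `Allocator` trait and be passed to `Vec::new_in`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-query allocator: forwards every request to the system
/// allocator and keeps a running count of the bytes currently allocated.
pub struct TrackingAllocator {
    live_bytes: AtomicUsize,
}

impl TrackingAllocator {
    pub const fn new() -> Self {
        Self { live_bytes: AtomicUsize::new(0) }
    }

    /// Bytes allocated through this allocator that have not been freed yet.
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        // Only count allocations that actually succeeded.
        if !ptr.is_null() {
            self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

/// Allocates and frees one scratch block of `size` bytes through `alloc`.
/// Returns the live-byte count seen while the block was allocated and the
/// count after it was freed.
pub fn observe_scratch(alloc: &TrackingAllocator, size: usize) -> (usize, usize) {
    assert!(size > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::from_size_align(size, 8).expect("valid layout");
    unsafe {
        let ptr = alloc.alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");
        let during = alloc.live_bytes();
        alloc.dealloc(ptr, layout);
        (during, alloc.live_bytes())
    }
}
```

One `TrackingAllocator` per query would give per-query numbers, which is why the request is for `Buffer` and `MutableBuffer` to accept vectors that carry their own allocator rather than relying on a single global one.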

Describe the solution you'd like

Describe alternatives you've considered

Something similar can be achieved using Buffer::from_custom_allocation but requires unsafe and dealing with pointers.

Additional context

alamb commented 3 days ago

In our product we are using such custom allocators to track the memory usage of individual queries. I'd like to add a similarly named feature to arrow-rs which would generalize the Buffer::from_vec and MutableBuffer::from_vec functions.

As @tustvold, @waynexia and I are discovering on https://github.com/apache/arrow-rs/pull/6336, adding the APIs to Buffer and MutableBuffer is just the start. To really achieve the use case, I think we would need to preserve the allocation information through all the various kernels / transformations that arrow-rs provides.

Also, @haohuaijin offers another potential use case for this feature, which is to "accurately track total memory used by multiple arrow arrays that may share the same underlying Buffers (e.g. that were sliced, etc)", in https://github.com/apache/arrow-rs/issues/6439
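
To illustrate why shared buffers make memory accounting tricky, here is a small stdlib-only sketch. `ArrayView`, `naive_memory` and `deduplicated_memory` are hypothetical stand-ins, not arrow-rs types or functions. The sketch assumes arrays hold reference-counted views into shared bytes and deduplicates by pointer identity, which is one possible approach and not necessarily the one discussed in the linked issue.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an array: an (offset, len) view into
/// reference-counted bytes that other views may share.
pub struct ArrayView {
    data: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl ArrayView {
    pub fn new(data: Arc<Vec<u8>>) -> Self {
        let len = data.len();
        Self { data, offset: 0, len }
    }

    /// Zero-copy slice: the new view shares the same underlying bytes.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.data[self.offset..self.offset + self.len]
    }
}

/// Sums the size of every view's backing allocation, counting a shared
/// allocation once per view that references it.
pub fn naive_memory(views: &[ArrayView]) -> usize {
    views.iter().map(|v| v.data.len()).sum()
}

/// Sums the size of each distinct backing allocation exactly once,
/// using the `Arc` pointer as its identity.
pub fn deduplicated_memory(views: &[ArrayView]) -> usize {
    let mut seen = HashSet::new();
    views
        .iter()
        .filter(|v| seen.insert(Arc::as_ptr(&v.data)))
        .map(|v| v.data.len())
        .sum()
}
```

For a 1024-byte array, a slice of it, and an unrelated 256-byte array, the naive sum counts the shared allocation twice while the deduplicated sum counts it once. An allocator that tracks its own usage would sidestep this bookkeeping, since each allocation is recorded when it is made rather than when it is referenced.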