apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C#] Need bigger ArrowBuffer size for large data #38086

Open Platob opened 12 months ago

Platob commented 12 months ago

Describe the enhancement requested

ArrowBuffer is currently based on ReadOnlyMemory<byte>, whose Length is an int32.

To scale to larger data (byte length > int32.MaxValue), Arrow buffers will need to handle 64-bit byte lengths, maybe with a struct wrapping nested memory buffers.

public readonly partial struct ArrowBuffer : IEquatable<ArrowBuffer>, IDisposable
{
    private readonly IMemoryOwner<byte> _memoryOwner;
    private readonly ReadOnlyMemory<byte> _memory;
    // ...
}
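
To make the limit concrete, here is a rough sketch of the arithmetic. `BufferLimits` and its members are hypothetical names used only for illustration and are not part of Apache.Arrow. The only fact relied on is that `ReadOnlyMemory<byte>.Length` is an `int`.

```csharp
using System;

// Hypothetical helper, for illustration only (not an Apache.Arrow API).
public static class BufferLimits
{
    // ReadOnlyMemory<byte>.Length is an int, so a single buffer
    // cannot describe more than int.MaxValue bytes.
    public const long MaxBufferBytes = int.MaxValue;

    // True if `elementCount` fixed-width values fit in one buffer.
    // `checked` turns a silent 64-bit overflow into an exception.
    public static bool FitsInReadOnlyMemory(long elementCount, int bytesPerElement)
        => checked(elementCount * bytesPerElement) <= MaxBufferBytes;

    // Largest number of fixed-width values a single buffer can hold.
    public static long MaxElements(int bytesPerElement)
        => MaxBufferBytes / bytesPerElement;
}
```

For example, `BufferLimits.FitsInReadOnlyMemory(300_000_000, 8)` is false: 300 million 8-byte values need 2.4 billion bytes, which exceeds the 2,147,483,647-byte cap.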

Component(s)

C#

CurtHagenlocher commented 6 months ago

We can't chop the buffers up into a collection of smaller buffers without losing in-memory interoperability via the C API. I think to make this work, we would need to have implementations of LargeMemory<T>, LargeReadOnlyMemory<T>, LargeSpan<T>, LargeReadOnlySpan<T>, LargeMemoryManager<T> and ILargeMemoryOwner<T> which could be used to wrap native allocations and allow safe managed access to them. As I suggested in https://github.com/dotnet/runtime/issues/12221, nothing stops us from defining these types in the C# Arrow implementation itself, though having them in the standard runtime is probably better for code sharing and reuse.
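
As a very rough sketch of the underlying idea (one contiguous native allocation addressed with a 64-bit length), something like the following could work. `LargeNativeBuffer` is a hypothetical name and is not one of the Large* types listed above. It is a class with a byte indexer rather than a span-like ref struct, and it only uses existing `Marshal` calls so that it compiles without unsafe code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type, for illustration only. It is not part of
// Apache.Arrow or the .NET runtime.
public sealed class LargeNativeBuffer : IDisposable
{
    private readonly IntPtr _pointer;
    private bool _disposed;

    // 64-bit length, unlike ReadOnlyMemory<byte>.Length (an int).
    public long Length { get; }

    public LargeNativeBuffer(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        // AllocHGlobal(IntPtr) takes a pointer-sized byte count, so on a
        // 64-bit process the block can exceed int.MaxValue bytes. The block
        // is contiguous, so its address could be handed to native code.
        _pointer = Marshal.AllocHGlobal(new IntPtr(length));
    }

    // Bounds-checked access using a long index.
    public byte this[long index]
    {
        get => Marshal.ReadByte(AddressOf(index));
        set => Marshal.WriteByte(AddressOf(index), value);
    }

    private IntPtr AddressOf(long index)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(LargeNativeBuffer));
        // The unsigned comparison rejects negative and too-large indexes.
        if ((ulong)index >= (ulong)Length) throw new IndexOutOfRangeException();
        return new IntPtr(_pointer.ToInt64() + index);
    }

    public void Dispose()
    {
        if (_disposed) return;
        Marshal.FreeHGlobal(_pointer);
        _disposed = true;
    }
}
```

A real design would presumably expose span-like slicing rather than per-byte reads, which is where the Large* types above would come in. This only shows that the 64-bit addressing itself does not require splitting the allocation.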

adamreeve commented 1 month ago

Hi @CurtHagenlocher, we might have an intern joining us in the G-Research open source team soon and I thought this could be a useful project for them to work on. Before I propose it as an idea, I just wanted to check that you haven't already started work on this, and would be happy to accept a contribution for this?

CurtHagenlocher commented 1 month ago

@adamreeve Overall, I think that would be great. I've played with it a little but only by starting to implement Large variations of Span, Memory, etc. I'm also trying to start a conversation with some .NET folks about what they think makes sense, and trying to understand what's possible in terms of the derivative works I've created.