Open lidavidm opened 1 year ago
@zeroshade for your consideration
Another possible solution might be to leverage jemalloc or https://pkg.go.dev/github.com/dgraph-io/ristretto/z#Allocator as a way to amortize some of the FFI cost by allocating larger chunks at a time, or something to that effect.
That said, because the allocator is defined as an interface, anyone could easily create their own allocator if they prefer. We can keep the simple Malloc-based one you created for now, and if someone wants a better allocator, they can create/contribute one.
That said, I agree with adding a flag for buffers to denote them as safe to export but that would require modifying the Allocator interface so we can know whether or not to set that flag.
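To make the flag idea concrete, here is a rough sketch of how it might look. Everything in it is hypothetical: the `Allocator` interface is a simplified stand-in, `ExportSafe` is an assumed extension rather than an existing method, and `cAllocator` only pretends to be malloc-backed so the example runs without cgo.

```go
package main

import (
	"errors"
	"fmt"
)

// Allocator is a simplified, hypothetical stand-in for the real allocator interface.
type Allocator interface {
	Allocate(size int) []byte
	// ExportSafe is an assumed extension: it reports whether memory from
	// this allocator lives outside the Go heap.
	ExportSafe() bool
}

// goAllocator hands out ordinary Go-heap memory.
type goAllocator struct{}

func (goAllocator) Allocate(size int) []byte { return make([]byte, size) }
func (goAllocator) ExportSafe() bool         { return false }

// cAllocator stands in for a malloc-backed allocator. It uses make here
// only so that the sketch runs without cgo.
type cAllocator struct{}

func (cAllocator) Allocate(size int) []byte { return make([]byte, size) }
func (cAllocator) ExportSafe() bool         { return true }

// Buffer is a hypothetical buffer that remembers whether it may be exported.
type Buffer struct {
	data       []byte
	exportSafe bool
}

// NewBuffer copies the allocator's export-safety into the buffer's flag.
func NewBuffer(mem Allocator, size int) *Buffer {
	return &Buffer{data: mem.Allocate(size), exportSafe: mem.ExportSafe()}
}

// ErrNotExportable is returned when a Go-heap buffer reaches the export path.
var ErrNotExportable = errors.New("buffer is not safe to export to C")

// Export returns the bytes to hand to C, or an error if the flag is unset.
// A copying fallback could replace the error branch.
func Export(b *Buffer) ([]byte, error) {
	if !b.exportSafe {
		return nil, ErrNotExportable
	}
	return b.data, nil
}

func main() {
	_, err := Export(NewBuffer(goAllocator{}, 8))
	fmt.Println("go-heap buffer:", err)

	data, err := Export(NewBuffer(cAllocator{}, 8))
	fmt.Println("c-backed buffer:", len(data), err)
}
```

The point is only that the allocator decides the flag at allocation time, so the export path can check it without knowing which allocator was used.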
> Another possible solution might be to leverage jemalloc or https://pkg.go.dev/github.com/dgraph-io/ristretto/z#Allocator as a way to amortize some of the FFI cost by allocating larger chunks at a time, or something to that effect.
Just implement malloc in go :) Allocate a big chunk via C.malloc, then slice it up and hand out buffers as needed (allocating more chunks as needed)
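A minimal sketch of that chunking idea, under some loud assumptions: `chunkAllocator` and its fields are made-up names, the `newChunk` hook is where a `C.malloc` wrapper would go in a real cgo build (here it is plain `make` so the example runs anywhere), and freeing and alignment are ignored entirely.

```go
package main

import "fmt"

// chunkAllocator carves small buffers out of large chunks so that the
// expensive chunk allocation (one FFI call in a cgo build) is amortized.
type chunkAllocator struct {
	chunkSize int
	newChunk  func(n int) []byte // assumed hook; would wrap C.malloc under cgo
	current   []byte             // unused tail of the most recent chunk
	chunks    int                // number of chunk allocations so far
}

// Allocate returns n bytes, grabbing a fresh chunk only when the current
// one cannot satisfy the request.
func (a *chunkAllocator) Allocate(n int) []byte {
	if n > len(a.current) {
		size := a.chunkSize
		if n > size {
			size = n // oversized requests get a dedicated chunk
		}
		a.current = a.newChunk(size)
		a.chunks++
	}
	// Cap the slice at n so an append cannot spill into a neighbouring buffer.
	buf := a.current[:n:n]
	a.current = a.current[n:]
	return buf
}

func main() {
	a := &chunkAllocator{
		chunkSize: 1024,
		newChunk:  func(n int) []byte { return make([]byte, n) },
	}
	for i := 0; i < 10; i++ {
		a.Allocate(100)
	}
	fmt.Println("chunk allocations after ten 100-byte buffers:", a.chunks)
}
```

A real version would also need to track when a chunk can be returned to C, which is most of the work in writing a malloc.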
At least in Java, Netty's allocator (which we use in Arrow) pools allocations based on size classes to avoid having to allocate new buffers
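For illustration, size-class pooling could be sketched roughly like this. It is not Netty's algorithm, just the general idea: round each request up to a power-of-two class and reuse returned buffers of that class. The names (`sizeClassPool`, `Get`, `Put`) and the 64-byte minimum class are assumptions, and `newBuf` stands in for whatever does the real allocation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sizeClass rounds n up to the next power of two, with a 64-byte minimum.
func sizeClass(n int) int {
	if n <= 64 {
		return 64
	}
	return 1 << bits.Len(uint(n-1))
}

// sizeClassPool keeps a free list of buffers per size class.
type sizeClassPool struct {
	free   map[int][][]byte
	newBuf func(n int) []byte // assumed hook for the underlying allocation
	misses int                // how often a fresh buffer had to be allocated
}

func newSizeClassPool(newBuf func(n int) []byte) *sizeClassPool {
	return &sizeClassPool{free: make(map[int][][]byte), newBuf: newBuf}
}

// Get returns a buffer of length n, reusing a pooled one when possible.
func (p *sizeClassPool) Get(n int) []byte {
	class := sizeClass(n)
	if list := p.free[class]; len(list) > 0 {
		buf := list[len(list)-1]
		p.free[class] = list[:len(list)-1]
		return buf[:n]
	}
	p.misses++
	return p.newBuf(class)[:n]
}

// Put returns a buffer obtained from Get to its class's free list.
func (p *sizeClassPool) Put(buf []byte) {
	class := cap(buf)
	p.free[class] = append(p.free[class], buf[:class])
}

func main() {
	p := newSizeClassPool(func(n int) []byte { return make([]byte, n) })
	b := p.Get(100)
	p.Put(b)
	p.Get(120) // same class as the 100-byte request, so it is reused
	fmt.Println("fresh allocations:", p.misses)
}
```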
Lol, I mean, yes I'd prefer to avoid implementing malloc in go, hence my suggestion of us just leveraging jemalloc or something directly haha. Or some other way to minimize the FFI. That said, I'm fine with just a simple malloc-based allocator for now and if anyone needs better performance we can re-assess or they can use the allocator interface and build their own and then (hopefully) contribute it.
Fair enough!
Describe the bug, including details regarding any error messages, version, and platform.
cgo requires that C memory cannot contain persistent pointers to Go memory:
But we do exactly this in the C Data Interface:
https://github.com/apache/arrow/blob/4f1d255f3dc57457e5c78d98c4b76fc523cf9961/go/arrow/cdata/cdata_exports.go#L392
We want to do this because we do not want to copy data. However, technically, we can't. And if we enable the runtime checks for this via `GODEBUG=cgocheck=2`, we indeed get a crash.

Now, in practice, this is OK, because separately, we keep the Go memory alive by storing the `ArrayData` in the C Data Interface `private_data` via a `cgo.Handle`. (Well, assuming Go's GC is non-moving.) But technically, this is wrong.

What are some solutions?

- `CgoArrowAllocator`, which allocates buffers via `libarrow`. Then we're storing pointers to C-allocated memory in C-allocated structures. But this gives you a dependency on `libarrow`.
- A `CgoAllocator` that allocates via `malloc`. This still means we have to pay FFI overhead, which is not ideal, but is unavoidable. (See apache/arrow#33901.)

Both of these require you to remember to do the right thing; if you forget to use the right allocator, you'll still be in violation of the rules. So on top of that, we could add a field to `Buffer` that indicates whether the buffer is safe to export. Then the cgo-based allocators can set this flag, and during exporting, we can error (or possibly copy) if the buffer is not safe for export.

Component(s)
Go