apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

ADBC FFI types and possible abstraction #3540

Open wjones127 opened 1 year ago

wjones127 commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Not sure if this should live here or in apache/arrow-adbc. It probably depends on the release process we want.

It would be nice to have a library crate that contains the struct definitions for FFI structs and the error codes as an enum.
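As a rough illustration of what such a crate might expose, here is a minimal sketch. The type names `AdbcStatusCode` and `FfiAdbcError` and the helper `from_raw` are hypothetical, only a subset of status codes is shown, and the numeric values and field layout are assumed to mirror `adbc.h`; they should be checked against the header before relying on them.

```rust
use std::os::raw::c_char;

/// ADBC status codes mirrored as a Rust enum (subset only).
/// Assumption: the numeric values match the constants in `adbc.h`.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AdbcStatusCode {
    Ok = 0,
    Unknown = 1,
    NotImplemented = 2,
    NotFound = 3,
    AlreadyExists = 4,
    InvalidArgument = 5,
}

impl AdbcStatusCode {
    /// Convert a raw status byte received over FFI into the enum.
    /// Returns `None` for values this sketch does not cover.
    pub fn from_raw(raw: u8) -> Option<Self> {
        match raw {
            0 => Some(Self::Ok),
            1 => Some(Self::Unknown),
            2 => Some(Self::NotImplemented),
            3 => Some(Self::NotFound),
            4 => Some(Self::AlreadyExists),
            5 => Some(Self::InvalidArgument),
            _ => None,
        }
    }
}

/// C-compatible error struct (hypothetical name, simplified layout).
/// Assumption: field order and types follow `struct AdbcError` in `adbc.h`.
#[repr(C)]
pub struct FfiAdbcError {
    /// Error message owned by the producer of the error.
    pub message: *mut c_char,
    /// Vendor-specific error code.
    pub vendor_code: i32,
    /// Five-character SQLSTATE code.
    pub sqlstate: [c_char; 5],
    /// Callback that frees the resources held by this error, if any.
    pub release: Option<unsafe extern "C" fn(*mut FfiAdbcError)>,
}
```

Using `#[repr(u8)]` keeps the enum one byte wide, so it can be cast to the raw status value with `as u8`, while `from_raw` avoids transmuting an unknown byte into the enum.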

Additionally, it may be possible to create a module with a macro and traits that allows a developer to implement the ADBC API without having to interact directly with the FFI structs or write unsafe code. I'm prototyping such an abstraction right now.
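To make the idea concrete, here is a minimal sketch of how a trait plus a generated shim could keep driver code safe. Everything in it is hypothetical and simplified: the `Connection` trait, `DriverError`, `connection_set_option`, and `DummyConnection` are illustrative names, the shim takes a pointer to the Rust type directly rather than the real ADBC connection struct, the raw status values are assumed to follow `adbc.h`, and panics are not caught at the boundary.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical safe error type returned by driver authors.
#[derive(Debug, PartialEq)]
pub enum DriverError {
    NotImplemented,
    InvalidArgument(String),
}

impl DriverError {
    /// Map the error to a raw status byte (values assumed from `adbc.h`).
    pub fn status_code(&self) -> u8 {
        match self {
            DriverError::NotImplemented => 2,
            DriverError::InvalidArgument(_) => 5,
        }
    }
}

/// Hypothetical trait that a driver implements in safe Rust.
pub trait Connection {
    fn set_option(&mut self, key: &str, value: &str) -> Result<(), DriverError>;
}

/// Shim of the kind a macro could generate for each driver type.
/// All pointer handling and the only `unsafe` code live here.
pub unsafe extern "C" fn connection_set_option<C: Connection>(
    conn: *mut C,
    key: *const c_char,
    value: *const c_char,
) -> u8 {
    if conn.is_null() || key.is_null() || value.is_null() {
        return 5; // invalid argument
    }
    // Caller guarantees both strings are valid and NUL-terminated.
    let key = unsafe { CStr::from_ptr(key) }.to_str();
    let value = unsafe { CStr::from_ptr(value) }.to_str();
    let (key, value) = match (key, value) {
        (Ok(k), Ok(v)) => (k, v),
        _ => return 5, // strings were not valid UTF-8
    };
    // Caller guarantees `conn` points to a live, exclusively borrowed `C`.
    match unsafe { &mut *conn }.set_option(key, value) {
        Ok(()) => 0,
        Err(e) => e.status_code(),
    }
}

/// Example driver written entirely in safe Rust.
pub struct DummyConnection {
    pub read_only: bool,
}

impl Connection for DummyConnection {
    fn set_option(&mut self, key: &str, value: &str) -> Result<(), DriverError> {
        match (key, value) {
            ("read_only", "true") => {
                self.read_only = true;
                Ok(())
            }
            ("read_only", "false") => {
                self.read_only = false;
                Ok(())
            }
            ("read_only", other) => Err(DriverError::InvalidArgument(other.to_string())),
            _ => Err(DriverError::NotImplemented),
        }
    }
}
```

The driver author only writes the `impl Connection` block; the C-facing function is generic over the driver type, so a macro would only need to instantiate and export it.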

Describe the solution you'd like

Describe alternatives you've considered

Additional context

wjones127 commented 1 year ago

I will have a draft PR up in the next few days.

alamb commented 1 year ago

I think adding ADBC FFI as a crate (like arrow-flight) would be a reasonable idea.

wjones127 commented 1 year ago

I'm actually having second thoughts about putting this in the arrow-rs repo.

Having it here has some benefits:

But if we put it in arrow-adbc:

@alamb What are your thoughts on these tradeoffs? Am I missing anything?

alamb commented 1 year ago

> @alamb What are your thoughts on these tradeoffs? Am I missing anything?

I think you have identified the tradeoffs.

FWIW I don't think there is any technical reason you couldn't feature gate arrow2 support in a crate that happened to be in the arrow-rs repo.
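For illustration, feature gating could look roughly like the sketch below. The feature name `arrow2`, the module `arrow2_support`, and the function `supported_backends` are hypothetical; the crate would also need to declare the feature and an optional dependency in its manifest, which is not shown.

```rust
/// Code that depends on arrow2 is only compiled when the
/// (hypothetical) `arrow2` Cargo feature is enabled.
#[cfg(feature = "arrow2")]
pub mod arrow2_support {
    // Conversions between ADBC FFI structs and arrow2 types would live here.
    pub const BACKEND: &str = "arrow2";
}

/// Lists the Arrow implementations this build was compiled to support.
/// arrow-rs is always available; arrow2 is additive.
pub fn supported_backends() -> Vec<&'static str> {
    let mut backends = vec!["arrow-rs"];
    if cfg!(feature = "arrow2") {
        backends.push("arrow2");
    }
    backends
}
```

Keeping the feature additive means enabling `arrow2` never removes the arrow-rs code path, which matches how Cargo unifies features across a dependency graph.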

cc @ritchie46 for his thoughts related to pola.rs integration, if any.

ritchie46 commented 1 year ago

I think that feature gating should definitely be possible. Eventually I hope we can come up with some Arrow-related traits that are independent of implementation, similar to the DataFrame protocol in Python.

wjones127 commented 1 year ago

> FWIW I don't think there is any technical reason you couldn't feature gate arrow2 support in a crate that happened to be in the arrow-rs repo.

That's a good point!

So far, I've found the tricky parts tend to be understanding the semantics of the C ABI, especially around thread-safety and what our definition of "uninitialized" is. So I'm now leaning towards the arrow-adbc repo, since the primary reviewers there will be most familiar with that. Of course, we'll need some folks who understand Rust well to be involved there as well.

I've drafted a PR over there with the driver manager: https://github.com/apache/arrow-adbc/pull/416