elixir-explorer / adbc

Apache Arrow ADBC bindings for Elixir
https://arrow.apache.org/adbc/
Apache License 2.0
50 stars 16 forks source link

Can we simplify the memory management? #1

Closed Qqwy closed 1 year ago

Qqwy commented 1 year ago

The current internal structure of the resource returned from a call to Adbc.Statement.execute_query(statement) is as following:

adbc_info

(And for the other resources it is similar.)

The content of the NifRes<T> is behind a pointer (so it stores a *T) and is managed by enif_alloc/enif_free. But this extra layer of indirection might be unneccesary.

I believe a NifRes<T> could just store the T directly.

This will simplify memory management considerably, since when the resource is passed to a NIF implemented in another project/language, it can take ownership of the data without requiring a call to enif_free for cleanup.

Cleanup

With this change, we still have full freedom of whether we want to clean up the stream 'eagerly after first use' or 'whenever the GC wants to':

Whenever the Erlang GC wants to

When above steps are followed without extra changes inside either the adbc code or consumer code, this is what happens.

Eagerly after first use

From the consumer side (such as the Rust code inside Explorer), you can simply call the release callback that is part of the ArrowArrayStream struct whenever you've used it. The producer will set the release callback to nullptr to indicate that it was cleaned up and cannot be used again (as per the Arrow spec).

cocoa-xu commented 1 year ago

Yeah, I think we can do that! I'll try it in a few minutes

Qqwy commented 1 year ago

I'm also open to contribute with a PR if you want; let me know what you'd prefer 😊

cocoa-xu commented 1 year ago

Hi @Qqwy, I made the correspondingly changes in #2. Let me know if you have any other suggestions and/or comments.

And all the tests we have at the moment passed locally on my machine. I'll setup some CI workflows later!

josevalim commented 1 year ago

PR #2 has been merged but we most likely can remove the released flag.

cocoa-xu commented 1 year ago

PR #3 removed the released flag and addressed the inconsistent error-handling behaviour in adbc. It should be ready to go. :D

Qqwy commented 1 year ago

I've tested the code from PR #3 by improving the example on Explorer a little.

See the qqwy-adbc4 branch on my fork of explorer. And here are the changes vs. the jv-adbc branch There's two variants of df_experiment: One that borrows (cleanup then happens when the Erlang GC wants to), and one that steals the stream (cleanup then happens when the Rust code is done with it).

It works great! I think we can merge PR #3 and close this issue :blush:

josevalim commented 1 year ago

Awesome, thank you :)

cocoa-xu commented 1 year ago

Thank you @Qqwy! And if I understood correctly, does that mean we can choose how to use (or consume) the stream purely in the Explorer's Rust code and we won't have any memory management issue in NIF either we borrow it or steal it in Rust?

Qqwy commented 1 year ago

Thank you @Qqwy! And if I understood correctly, does that mean we can choose how to use (or consume) the stream purely in the Explorer's Rust code and we won't have any memory management issue in NIF either we borrow it or steal it in Rust?

Yes, exactly! 👍

cocoa-xu commented 1 year ago

That's amazing! Thank you :) I'm going to close this issue now.