elixir-explorer / adbc

Apache Arrow ADBC bindings for Elixir
https://arrow.apache.org/adbc/
Apache License 2.0
50 stars 16 forks source link

Simplified memory management for `NifRes<T>::val` #2

Closed cocoa-xu closed 1 year ago

cocoa-xu commented 1 year ago

This PR simplifies memory management for NifRes<T>::val as described in #1.

Qqwy commented 1 year ago

Most of the changes look really good. However, I don't think the new released field you have added to NifRes is what we want:

If you want to make this library thread-safe (which is probably outside of scope for this particular PR) then I think we need to create our own 'wrapper' implementation of ArrowArrayStream for which all of the callback functions are protected by the (same) mutex before calling into the wrapped ArrowArrayStream.

josevalim commented 1 year ago

I don't think you can consume an arrow stream concurrently, so we will most likely need to attach its creator PID to it and only allow calls from the same PID. Even the connection is not guaranteed to support concurrency, so we will do some wrapping in Elixir land in the future. :)

About val->release, I would prefer to have our own field than assuming something on the ArrowStream one just because one additional boolean field is super cheap and it avoids this particular coupling?

Qqwy commented 1 year ago

I don't think you can consume an arrow stream concurrently, so we will most likely need to attach its creator PID to it and only allow calls from the same PID. Even the connection is not guaranteed to support concurrency, so we will do some wrapping in Elixir land in the future. :)

This is a good approach. I think we can live with the 'only same PID allowed' restriction and it has much less overhead compared to wrapping things with a mutex. 👍

About val->release, I would prefer to have our own field than assuming something on the ArrowStream one just because one additional boolean field is super cheap and it avoids this particular coupling?

What do you want to accomplish with the field? Do you want this library's NifRes type to be able to wrap structs other than those in the Adbc spec?

josevalim commented 1 year ago

@Qqwy I believe none of AdbcStatement, AdbcDatabase, and AdbcConnection have a public release field according to the C spec.

Qqwy commented 1 year ago

@Qqwy I believe none of AdbcStatement, AdbcDatabase, and AdbcConnection have a public release field according to the C spec.

Correct. These structures are allowed to be re-used many times. And they contain a pointer to the AdbcDriver they are managed by, which contains the methods responsible for manipulating them (including releasing them). These methods to manipulate these objects on the 'AdbcDriver' themselves return an error (An AdbcStatusCode other than ADBC_STATUS_OK) when called with a structure that was already released.

I assume the long-term plan for this library is to expose these other structures also to both Elixir and --to be able to do anything useful with them-- to NIFs written in other libraries.

Any pre-existing code dealing with Adbc will talk the Adbc spec without requiring extra work by library builders. If we add an additional 'released' flag on top of this, we require library builders to handle this additional flag properly as well as the handling the release-logic that is part of the Adbc spec.

That is why I prefer to not add the flag.

Are there upsides I am missing?

Qqwy commented 1 year ago

As an example: Many AdbcStatements can be constructed from an AdbcConnection. When the connection is released using AdbcDriver->AdbcConnectionRelease, all the statements that were constructed from this connection are released behind the scenes. But since the producer (e.g. Sqlite) does not know about our extra flag, the flags on the constructed NifRes<AdbcStatement>s will not be set.

So the extra flag can function as a heuristic at best.

josevalim commented 1 year ago

@Qqwy perfect, so I agree we can remove the flag indeed. Thank you for the insights!

--to be able to do anything useful with them--

I don't think we will expose any other resource than the ArrowArrayStream to other NIFs. :)

cocoa-xu commented 1 year ago

Actually this flag is just to prevent any databases/statements/connections to be released twice in Adbc{Database,Connection,Statement}Release as it will crash the program.

josevalim commented 1 year ago

Alright, so let's do this: this PR is definitely a step forward, so let's merge it in. @Qqwy could you please kindly send a PR with your changes on top so we can give it a try afterwards? Thank you all :heart:

josevalim commented 1 year ago

:green_heart: :blue_heart: :purple_heart: :yellow_heart: :heart:

josevalim commented 1 year ago

Oh wait, I realized it is not my project. Sorry @cocoa-xu, force of habit :D

cocoa-xu commented 1 year ago

After checking the source code in adbc, I think we can remove the flag. The crash is caused by some inconsistent error-handling behaviour in adbc. When the databases/statements/connections is in invalid state, these functions, Adbc{Database,Connection,Statement}Release, only return ADBC_STATUS_INVALID_STATE without filling the AdbcError struct...

josevalim commented 1 year ago

Now I feel double-bad for merging. Sorry!

cocoa-xu commented 1 year ago

Definitely no need to feel sorry!

cocoa-xu commented 1 year ago

We can do this in another PR to address the inconsistency in Adbc{Database,Connection,Statement}Release.

Qqwy commented 1 year ago

Thank you both! Collaborating with you two is a lot of fun 😊