aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0
310 stars 85 forks source link

Revamp object identifier ownership #139

Closed mulimoen closed 3 years ago

mulimoen commented 3 years ago

This is a new way of handling object identifiers given by us from the hdf5-c library.

The old approach was to store the handles in a registry, a hashmap that kept track of all previous identfiers. Each object addtionally kept track of whether it is invalid or not through an RwLock. This works very well for our tests, but fails when opening and closing many objects (#76, #137).

This PR instead relies on hdf5-c to keep track of the identifiers and validity. The Handle is now a type which owns the underlying object directly, and ownership is a bit stricter.

In practical terms this does not change much. Opened file objects will not be automatically closed when a file is dropped, the user must set the strong close degree on the file to get this behaviour. Setting the strong close degree is dangerous (in the sense it closes the wrong items), but not unsafe, and a warning has been added to the corresponding function.

The idea is to strengthen the relationship between an object handle on the rust side and the hdf5 side. A handle owns the object, and is cloned by incrementing the reference count and handing out a new object. The old approach was to try and close all related handles, acting as a half FileCloseDegree::Strong. The new approach requires the user to explicitely opt into this behaviour, with all the pitfalls this entails...

Soundness difficulties (but not memory safety concern) leading to double free:

Should we remove or change some items from our API to remove some of these? We could add a counterpart of from_id (from_borrowed_id), make decref hidden, and use unsafe to annotate these functions.

Fixes #76 Fixes #137

mulimoen commented 3 years ago

@aldanor It would be nice if you had a look at this proposal. This is the opposite of keeping the registry, but seems to be easier to reason about regarding ownership of objects.

aldanor commented 3 years ago

This will need to be rebased as #140 is merged. @mulimoen I've read through this, and I guess my main question is: "why does it work?" :) Especially pre-1.10, for 1.8.x. (specifically, in multi-threaded environment)

Also, for the crate user who doesn't want to dabble in source code of this crate, what would be the major usage differences? (in regards to strong close etc)

mulimoen commented 3 years ago

It is simply having a single owner for every ID, taking special care to incref if we share the ID, and never decref more than necessary (special care is taken in tests to avoid this). The previous code was buggy as it always decreffed all open handles on file close. There are no multi-threaded concerns as we always operate on the ID using a mutex (serial execution).

For users this means never use strong close (unless dead certain no objects are open), don't decref (Drop is ok), take care during FFI/hdf5 to never give ownership of an ID without an incref.

mulimoen commented 3 years ago

Seems DynValue leaks memory, which is picked up by the address sanitizer. I'm debugging this ATM.

mulimoen commented 3 years ago

6b66419 changed the inner type of OwnedDynValue to a Box. This was for making it obvious that the type is fixed-size. If https://github.com/rust-lang/rust/issues/63291 is stabilized we might want to use new_uninit_slice.

aldanor commented 3 years ago

Posted the last batch of minor comments ^

I was wondering, do we even really want incref() and decref() methods on Handle? Is there a use case? What if we removed both and replaced them with try_borrow()? (same as in Object - at which point the line between Handle and Object starts to become very thin...)

mulimoen commented 3 years ago

I suppose advanced users of hdf5 want to use underlying functions which may take ownership, requiring incref. decref may be dangerous and could invalidate a Handle, but I guess there are uses?

aldanor commented 3 years ago

I suppose advanced users of hdf5 want to use underlying functions which may take ownership, requiring incref. decref may be dangerous and could invalidate a Handle, but I guess there are uses?

I think the point is, if you use the high-level interface (i.e. hdf5) crate, I can't really imagine a case where you will be increfing or decrefing things (i.e., where try_new, try_borrow, clone and drop will not be sufficient for more advanced cases).

If you really need to dig into some advanced territory (i.e. using hdf5-sys directly), chances are you will probably be using straight hid_t anyway, no?

I'm just trying to think of cases and I can't really find none. You could think of hdf5 crate itself as a fairly "advanced" use of itself.

Well, given that this update is already quite big, let's leave it be, but perhaps remove it in a later update (I'll add a ticket) and maybe remove Handle altogether then which will simplify things. Now that there's no registry and it's a simple wrapper around an int, I'm not sure it's not needed at all.

aldanor commented 3 years ago

One more thing - could you get rid of a manual incref in Handle::clone() and use try_borrow? I think this is the only manual incref in the entire codebase now (aside from tests). And the only manual decref is in impl of Handle::drop.

mulimoen commented 3 years ago

I find Handle is a good abstraction, all ownership is taken care of here and it is the superclass of the ObjectClass.

aldanor commented 3 years ago

I find Handle is a good abstraction, all ownership is taken care of here and it is the superclass of the ObjectClass.

Well, yea, it's basically our hid_t with a few methods and an auto-drop now. Perhaps it should stay but given it's simplicity maybe a few things could be cleaned up. I'll give it a few thoughts.

aldanor commented 3 years ago

Ok well, let's try to merge this (hope it doesn't shoot us in the foot, in some multi-threaded cases or some other weird stuff like that) :)