PyO3 / rust-numpy

PyO3-based Rust bindings of the NumPy C-API
BSD 2-Clause "Simplified" License

RFC: Perform borrow checking through a capsule-based API #361

Closed (adamreichold closed this 1 year ago)

adamreichold commented 1 year ago

This places the global state required for dynamic borrow checking of NumPy arrays into a capsule exposed as the numpy.core.multiarray._BORROW_CHECKING_API attribute.

This has two benefits: First, all extensions built using rust-numpy will share this global state and hence cooperate in borrow checking. Second, in the future, other libraries that want to borrow check their access to NumPy arrays can cooperate using this C-compatible API.

This does not change the borrow checking implementation itself at all; instead of accessing Rust static data, all accesses are funneled through the above-mentioned capsule API, with access to it being cached for performance reasons as we already do for the _ARRAY_API and _UFUNC_API capsules.
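As a rough illustration of that lookup-and-cache path (not the code from this PR), an extension might import the capsule once via PyCapsule_Import and keep a pointer to it around. The struct SharedBorrowApi, the ApiPtr wrapper and the shared_api function below are hypothetical placeholders; only the attribute name numpy.core.multiarray._BORROW_CHECKING_API comes from the PR description.

```rust
use std::ffi::CStr;
use std::sync::OnceLock;

use pyo3::exceptions::PyRuntimeError;
use pyo3::{ffi, PyErr, PyResult, Python};

/// Hypothetical placeholder for the C-compatible state behind the capsule;
/// the real layout is defined by this PR.
#[repr(C)]
struct SharedBorrowApi {
    version: u64,
    // ... function pointers for acquiring/releasing borrows would follow ...
}

/// Cached pointer into the capsule, wrapped so it can live in a static.
struct ApiPtr(*const SharedBorrowApi);

// SAFETY: the capsule contents are immutable and live as long as the
// interpreter, so sharing the pointer between threads is sound.
unsafe impl Send for ApiPtr {}
unsafe impl Sync for ApiPtr {}

static CAPSULE: OnceLock<ApiPtr> = OnceLock::new();

/// Import numpy.core.multiarray._BORROW_CHECKING_API once and cache it,
/// analogous to how the _ARRAY_API capsule is cached.
fn shared_api(py: Python<'_>) -> PyResult<&'static SharedBorrowApi> {
    if let Some(api) = CAPSULE.get() {
        return Ok(unsafe { &*api.0 });
    }
    let name =
        CStr::from_bytes_with_nul(b"numpy.core.multiarray._BORROW_CHECKING_API\0").unwrap();
    // PyCapsule_Import resolves dotted "module.attribute" names and sets a
    // Python exception if the capsule cannot be found.
    let ptr = unsafe { ffi::PyCapsule_Import(name.as_ptr(), 0) };
    if ptr.is_null() {
        return Err(PyErr::take(py)
            .unwrap_or_else(|| PyRuntimeError::new_err("borrow checking capsule not found")));
    }
    let api = ptr as *const SharedBorrowApi;
    let _ = CAPSULE.set(ApiPtr(api));
    Ok(unsafe { &*api })
}
```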

This means that eventually, the implementation of the borrow checking API could be different from the one the current extension would provide, with the adaptors to the C API taking care of any differences in e.g. the definition of BorrowKey. For now, they will of course usually be the same and this change just adds a huge amount of boilerplate to go from Rust to C-compatible and back to Rust.

Closes #359

cc @alex This is not the buffer-oriented protocol change you proposed in "Buffers on the edge", but may be of interest anyway as it is related, if limited to NumPy arrays.

adamreichold commented 1 year ago

Will there need to be a way for the capsule contents to be "upgraded" at runtime if a too old version is present? Not really sure what functionality might be broken if a newer rust-numpy has to use an older capsule version. (Which might be unpredictable due to initialization order being deterministic but hard to predict, I think.)

At the moment, I would not think this reasonably possible as the existing state cannot be safely transformed by an extension which wishes to upgrade the API, if only due to ABI concerns, as both extensions might have been built using different Rust versions. (Or there might eventually be implementations not written in Rust at all, having a completely incompatible internal representation behind the flags field.) So this would mean that the first extension to load and instantiate the capsule determines the API version.

However, since changes to this API should be additive with older parts of it kept working (by just ignoring the additional fields), I suspect that we might be able to degrade gracefully if a newer rust-numpy encounters an older API version, i.e. functionality which worked before continues to work and newer functionality fails with a hopefully clear enough error message (by failing functions needing fields not yet present).
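To make the graceful-degradation idea concrete, here is a purely illustrative sketch of an additive, versioned layout; the struct names, field names, version numbers and query_diagnostics function are made up and not part of the actual capsule API.

```rust
/// Version 1 of a hypothetical shared state: only the version and a
/// flags field are present.
#[repr(C)]
struct SharedV1 {
    version: u64,
    flags: u64,
}

/// Version 2 appends new fields after the existing ones, so a pointer to
/// a `SharedV2` can still be read as a `SharedV1` by older extensions.
#[repr(C)]
struct SharedV2 {
    version: u64,
    flags: u64,
    // New, optional functionality added in version 2.
    extra_diagnostics: Option<extern "C" fn() -> u64>,
}

/// A newer extension degrades gracefully: functionality needing only
/// version 1 keeps working, newer functionality reports a clear error.
fn query_diagnostics(shared: *const SharedV1) -> Result<u64, &'static str> {
    // SAFETY: `shared` points at a live capsule whose layout starts with
    // the version 1 fields, regardless of which version created it.
    let version = unsafe { (*shared).version };
    if version < 2 {
        return Err("diagnostics require version 2 of the borrow checking capsule");
    }
    let shared = shared as *const SharedV2;
    let fetch = unsafe { (*shared).extra_diagnostics }
        .ok_or("diagnostics entry point missing")?;
    Ok(fetch())
}
```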

I think the main problem is that I am not confident whether even the promise of backwards compatibility can be kept since this is relatively new and we have yet to get any external feedback on the borrow checking we released with rust-numpy 0.17.0. IMHO, this is a good argument for holding off on the capsule API for a few rust-numpy releases to gather more insight into whether at least the functionality we have now can be expected to stay in place for a reasonable amount of time.

adamreichold commented 1 year ago

Also, is it a safety/security concern if arbitrary code could spoof the capsule before this loads?

I think this is certainly a safety concern, but I am not sure it can be helped. It is always possible to subvert Rust's type system by malicious or misguided action. Similarly, the borrow checking itself is not a sandbox, it just prevents programmers from making genuine mistakes.

More specifically, loading e.g. a no-op borrow checker API would be similar to preloading libeatmydata to turn fsync into a no-op, as far as I can see. One can shoot oneself in the foot by doing so, but one has certainly asked for it then.

adamreichold commented 1 year ago

I think the main problem is that I am not confident whether even the promise of backwards compatibility can be kept since this is relatively new and we have yet to get any external feedback on the borrow checking we released with rust-numpy 0.17.0. IMHO, this is a good argument for holding off on the capsule API for a few rust-numpy releases to gather more insight into whether at least the functionality we have now can be expected to stay in place for a reasonable amount of time.

I added another commit which makes the shared API much narrower, taking only *mut PyArrayObject instead of the various parts of the borrow key. This means it should be much more likely that we can keep this backwards compatible as it is not bound to the current implementation of borrow checking. Since it also uses the NumPy FFI exclusively, it would be easier to split this out into a separate crate as well which could be used by other crates besides rust-numpy. However, it does have a performance cost (I have not benchmarked it yet) because it means we cannot reuse the borrow key derived during acquire for release but have to rederive it inside the shared implementation.
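For illustration only, a C-compatible vtable narrowed down to *mut PyArrayObject might look roughly like the sketch below; the names (BorrowCheckingApi, acquire, acquire_mut, release, release_mut and the result codes) are placeholders rather than the identifiers used in this PR.

```rust
// Rough sketch of a narrowed, C-compatible borrow checking vtable that
// only traffics in `*mut PyArrayObject`; names and layout are placeholders.
use numpy::npyffi::PyArrayObject;

/// Plain integer result codes crossing the FFI boundary instead of Rust enums.
pub const BORROW_OK: i32 = 0;
pub const BORROW_CONFLICT: i32 = -1;

#[repr(C)]
pub struct BorrowCheckingApi {
    /// Bumped whenever fields are appended; existing fields keep working.
    pub version: u64,
    /// Register a shared (read-only) borrow of the given array.
    pub acquire: unsafe extern "C" fn(array: *mut PyArrayObject) -> i32,
    /// Register an exclusive (mutable) borrow of the given array.
    pub acquire_mut: unsafe extern "C" fn(array: *mut PyArrayObject) -> i32,
    /// Release a previously registered shared borrow.
    pub release: unsafe extern "C" fn(array: *mut PyArrayObject),
    /// Release a previously registered exclusive borrow.
    pub release_mut: unsafe extern "C" fn(array: *mut PyArrayObject),
}
```

A thin adaptor on the Rust side would then translate between these function pointers plus integer result codes and the crate's own error types, which is the kind of boilerplate mentioned in the PR description.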

adamreichold commented 1 year ago

I have not benchmarked it yet

Since the capsule API looks as minimal as it will get, I ran the benchmarks:

 name                       main ns/iter  capsule ns/iter  diff ns/iter  diff %  speedup
 additional_shared_borrow   61            115              54            88.52%  x 0.53
 exclusive_borrow           123           134              11             8.94%  x 0.92
 initial_shared_borrow      137           148              11             8.03%  x 0.93

That all of them are regressions is expected as there is the added cost of crossing the FFI boundary and using dynamic dispatch via the capsule. additional_shared_borrow being the worst of the bunch is also expected as it relies on reusing the borrow key which now needs to be recomputed to avoid making its structure part of the capsule API.

Finally, at less than 150 ns/iter for all three operations, I think the minimal API is worth it for the improved prospects for backwards compatibility and for future improvements of the internal data structures used to perform the borrow checking.

Most applications will likely spend much more time operating on the resulting array views themselves, and if not, there is always the unsafe escape hatch that bypasses the dynamic borrow checking.
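To put the numbers into perspective, here is a hedged example of the two access paths mentioned above, written against the rust-numpy 0.17-era API; method names may differ in other releases.

```rust
// Checked access versus the unsafe escape hatch that bypasses dynamic
// borrow checking (rust-numpy 0.17-era API).
use numpy::PyArray1;
use pyo3::Python;

fn main() {
    Python::with_gil(|py| {
        let array = PyArray1::<f64>::zeros(py, 1024, false);

        // Checked access: registers a shared borrow in the global state
        // behind the capsule, paying roughly the per-call cost benchmarked above.
        let readonly = array.readonly();
        let sum: f64 = readonly.as_array().sum();
        println!("sum via checked view: {sum}");

        // Unsafe escape hatch: no dynamic borrow checking at all; the
        // caller must uphold the aliasing rules themselves.
        let slice = unsafe { array.as_slice() }.expect("contiguous array");
        println!("first element via raw slice: {}", slice[0]);
    });
}
```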

adamreichold commented 1 year ago

The benefit that this allows multiple extensions using rust-numpy to work together is definitely worth the slight additional overhead and compatibility constraints.

I agree; the cross-extension borrow checking alone makes this worth it IMHO.

@alex Any thoughts on the cross language borrow checking discussed above?

@kngwyu Any concerns before I merge this, which implies that it will be part of the 0.18 release, thereby committing us to backwards compatibility?

alex commented 1 year ago

Unfortunately not really; I haven't spent that much time thinking about what it would look like for extensions to provide this functionality themselves. Most of my thoughts have been about what it would mean to have this for buffers in general.

That said, I think the hardest question is: how (if at all) do you handle objects that represent sub-slices (views) of memory owned by some other object?

adamreichold commented 1 year ago

That said, I think the hardest question is: how (if at all) do you handle objects that represent sub-slices (views) of memory owned by some other object?

Our existing dynamic borrow checking tries to handle this by always traversing the chain of NumPy views via the base attribute to its origin, and then explicitly considering the data pointer, dimensions and strides by checking whether a necessary condition for aliasing between two views is violated, i.e. we over-approximate conflicts, which is safe even if we produce some spurious borrow checking errors, c.f. the documentation of the borrow module.

(A complete solution of the Diophantine equation governing aliasing between views is possible, but more involved and slower, and we released the current version to solicit feedback on whether the approximate checking is a problem for anyone.)
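As a simplified illustration of this over-approximation (not the actual code of the borrow module), the sketch below restricts itself to one-dimensional views over the same base allocation, with offsets, lengths and strides measured in elements and strides assumed non-negative; the View type and may_alias function are invented for the example. The necessary condition then reduces to overlapping index ranges plus a divisibility check on the offset difference.

```rust
/// Greatest common divisor via the Euclidean algorithm.
fn gcd(mut a: usize, mut b: usize) -> usize {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// A 1-D view into a shared base buffer: element offset of its first
/// element, number of elements and stride between consecutive elements.
struct View {
    offset: usize,
    len: usize,
    stride: usize,
}

impl View {
    /// Inclusive range of element indices touched by this view.
    fn range(&self) -> Option<(usize, usize)> {
        if self.len == 0 {
            return None;
        }
        Some((self.offset, self.offset + (self.len - 1) * self.stride))
    }
}

/// Over-approximate aliasing check: returns `false` only if the views
/// provably cannot touch a common element, `true` in every doubtful case.
fn may_alias(a: &View, b: &View) -> bool {
    let (Some((a_lo, a_hi)), Some((b_lo, b_hi))) = (a.range(), b.range()) else {
        return false; // An empty view aliases nothing.
    };
    // Disjoint index ranges can never alias.
    if a_hi < b_lo || b_hi < a_lo {
        return false;
    }
    // `a.offset + i * a.stride == b.offset + j * b.stride` has integer
    // solutions only if gcd(a.stride, b.stride) divides the offset
    // difference; if it does not, the access patterns interleave without
    // ever meeting, so aliasing is impossible.
    let g = gcd(a.stride, b.stride);
    if g != 0 && a.offset.abs_diff(b.offset) % g != 0 {
        return false;
    }
    // Otherwise conservatively report a potential conflict, which may be
    // a false positive (the over-approximation mentioned above).
    true
}
```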