PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Other
11.47k stars 694 forks source link

Added `stable_deref_trait::StableDeref` implementation for `PyRef` and `PyRefMut` #4195

Closed JRRudy1 closed 1 month ago

JRRudy1 commented 1 month ago

Introduction

This is a simple change that adds an implementation of the stable_deref_trait::StableDeref trait for the PyRef and PyRefMut RAII guards. This unsafe trait is used by downstream crates as marker for types that deref to a stable address that is guaranteed to stay valid for the lifetime of the type (not just the deref borrow), which should apply to PyRef(Mut). Implementing this trait enables some powerful synergies with the new Bound API, as discussed below.

Motivation

I have been working a lot with the Bound API and the MyClassMethods idiom, and I absolutely love it. The pyclass macro and related features already provided a powerful interface for using Rust from Python, but the new API has opened up a world of opportunities for using Python from Rust. However, there is one pain point that I have run into in my projects when trying to implement ergonomic Rust interfaces to PyClass objects, and this pull request makes it possible to work around it.

Consider the simple case where you want define an extension trait with a method that exposes a reference to a field of your Bound pyclass, as discussed in this section of the PyO3 user guide:

use pyo3::prelude::*;

#[pyclass]
pub struct MyClass {
    data: [i32; 100]
}

pub trait MyClassMethods<'py> {
    fn data(&self) -> &[i32; 100];
}

impl<'py> MyClassMethods<'py> for Bound<'py, MyClass> {
    fn data(&self) -> &[i32; 100] {
        &self.borrow().data
    }
}

This is clearly not possible, since the borrowed PyRef gets dropped when the function returns and we cannot return a reference to the data it guards. Of course, this is not unique to Bound/PyRef; the same problem would arise if we tried to return a reference to data borrowed from a RefCell which needs the Ref guard to stay alive.

In simple cases like this you could find a way work around it, such as by making the data field public and have the Bound<MyClass> user explicitly borrow a PyRef to access it, but this puts a significant limitation on the API's you can provide in more complex use cases.

The owning_ref crate provides a clever solution to this limitation by packaging the the RAII guard alongside the reference (as *const T) in an OwningRef struct that derefs to the desired reference. By returning this struct instead of a bare &T, you are effectively able to return a reference to the field while keeping the PyRef guard alive to safely protect it:

use pyo3::prelude::*;
use owning_ref::OwningRef;

pub type OwningPyRef<'py, C, T> = OwningRef<PyRef<'py, C>, T>;

#[pyclass]
pub struct MyClass {
    data: [i32; 100]
}

pub trait MyClassMethods<'py> {
    fn data(&self) -> OwningPyRef<'py, MyClass, [i32; 100]>;
}

impl<'py> MyClassMethods<'py> for Bound<'py, MyClass> {
    fn data(&self) -> OwningPyRef<'py, MyClass, [i32; 100]> {
        OwningRef::new(self.borrow()).map(|c| &c.data)
    }
}

fn data_sum(obj: Bound<MyClass>) -> i32 {
    obj.data().iter().sum()
}

This now works! And the same process can be applied to expose &mut T by wrapping PyRefMuf in an OwnedRefMut.

The only obstacle is that the OwningRef methods require Bound<MyClass> to implement stable_deref_trait::StableDeref (which it reexports as StableAddress), which can only be done within the pyo3 crate thanks to the orphan rule (so I have had to use a newtype wrapper as a workaround). This PR hopes to fix that and make this ergonomic solution possible .

Safety

Since StableDeref is an unsafe trait, the safety of implementing it for PyRef and PyRefMut must be considered. The contract for implementing this trait requires that the PyRef(Mut) will dereference to a stable address for the duration of its lifetime even when moved, and that the result of calling deref will also stay alive for this duration (not only for the duration of the &self lifetime of the deref call).

I am fairly certain that this condition is met for PyRef and PyRefMut, which exist for the sole purpose of guarding access to the underlying Py pointer until they are dropped, but please double-check my reasoning in case there are any edge cases that I am missing. For reference, this trait is safely implemented for the Ref and RefMut guards borrowed from a RefCell, which are direct analogues to PyRef and PyRefMut.

Considerations

Thank you for your time in considering this PR! If you have any question or concerns I would be happy to iterate on a solution.

mejrs commented 1 month ago

I'm inclined to not accept this PR. I don't want to encourage users to design their programs this way. PyRef(Mut) guards should not be kept alive for long - if they are it increases the chance that users will run into borrow panics. It just does not lead to a good design.

In your example you can make the pyclass frozen, and just use get: https://docs.rs/pyo3/latest/pyo3/prelude/struct.Py.html#method.get There is no PyRef involved with that.

Also, both the owning_ref and stable_deref_trait crates are deeply problematic. They should never be used.

davidhewitt commented 1 month ago

I completely agree about the PyRef(Mut) guards being undesirable for this use case.

It is probably ok for a wrapper around Bound<T> to make sense for this case when T: Sync and is a #[pyclass(frozen)]. But I'm unenthusiastic about adding such types within PyO3 right now.

I will note there is probably some similarity here with the ideas encapsulated in the PyBackedStr type and perhaps will point to a new more generic abstraction in the future.

JRRudy1 commented 1 month ago

Thanks for your time in reviewing; if you see the change as being more risk than it is worth to incorporate into pyo3, then I trust your judgment well above my own and can just keep using a newtype approach to utilize it in my own projects. That said, I am curious to dig further into your reasoning on some of those points.

That is an interesting take about the owning_ref crate being inherently problematic. It seems to be widely used, and I have tried without success to devise a scenario where Miri finds an issue. Can you elaborate?

I gave a very simple example for brevity, which could indeed be solved using pyclass(frozen) and .get, but in more complex use cases it would be very limiting to restrict all mutability. The owning_ref solution also makes it possible to implement something like fn data_mut(&mut self) -> OwningPyRefMut<'py, MyClass, [i32; 100]> to expose impl DerefMut<[i32; 100]>, which would not be possible using your suggestion. Allowing this pattern opens up a lot of flexibility in the API's that can be implemented, and this seems safe to me since the PyRefMut continues to dynamically protect it from being aliased. Are there edge cases I'm not considering where this would be unsound?

And your point about keeping the PyRef around increasing the risk of panics seems more like a user/downstream implementer's concern than pyo3's. If there is a non-negligible chance of borrow conflicts, then they can/should have the data/data_mut methods use .try_borrow instead and return a Result (perhaps renaming them as try_get_data/try_get_data_mut). But in my case this really isn't a concern (since the concurrency is happening in pure-Rust code that the serial pyo3-based UI layer calls into), and it is nice to have the flexibility to make these design decisions based on the problem at hand.

JRRudy1 commented 1 month ago

I found this repo discussing some issues with owning_ref, and wow you really weren't kidding. I hadn't messed with the problematic as_owner/with_owner and similar methods since they aren't relevant to the use case of exposing a field within a PyRef/PyRefMut, which still does appear to be sound.

This PR doesn't actually involve owning_ref, and implementing StableDeref wouldn't really cause issues by itself assuming the safety contract is actually satisfied, but I totally understanding rejecting it when my motivating use case is so questionable.

However, I still think there would be value in facilitating borrowing fields from Bound pyclasses, which could be implemented without any dependencies... something like this:

pub struct PyRefMap<'p, T: PyClass, U: ?Sized> {
    owner: PyRef<'p, T>,
    reference: *const U,
}

impl<'py, T: PyClass> PyRef<'py, T> {
    pub fn into_map<F, U: ?Sized>(self, f: F) -> PyRefMap<'py, T, U>
        where F: FnOnce(&T) -> &U 
    {
        PyRefMap {reference: f(&*self), owner: self}
    }
}

impl<'p, T: PyClass, U: ?Sized> Deref for PyRefMap<'p, T, U> {
    type Target = U;

    fn deref(&self) -> &U {
        // we own the `PyRef` that is guarding our (shared) access to `T`
        unsafe { &*self.reference }
    }
}

This lets you convert your PyRef<MyClass> into a similar type that deref's to the field instead of the pyclass directly:

let bound = Bound::new(py, MyClass{data: [i32; 100]})?;
let data = bound.try_borrow()?.into_map(|c| &c.data);

If you see any potential in adding functionality like this I could flesh it out into a new PR for further discussion, but otherwise I think it's fair to move onto more pressing issues. Thanks!

alex commented 1 month ago

We do a lot of this with self_cell in pyca/cryptography: https://github.com/pyca/cryptography/blob/main/src/rust/src/x509/certificate.rs#L32-L39 for example

davidhewitt commented 1 month ago

The PyRefMap idea is very welcome, see #2300 - I'm sure it'd be useful and just needs exploration.

JRRudy1 commented 1 month ago

Thanks for linking that thread @davidhewitt, I read through it and just submitted a draft PR #4203 implementing a possible solution roughly based on my comment above.