PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0
12.37k stars 763 forks source link

Support all receiver types for all protocol methods #1206

Closed davidhewitt closed 3 years ago

davidhewitt commented 4 years ago

At the moment the protocol methods are in an inconsistent state: some of them can take PyRef or PyRefMut, and some of them take &self or &mut self.

This is confusing to users and also gets in the way in certain cases like needing to access Python inside a protocol method (which can be obtained from PyRef::py() for example) or wanting to return PyRef<Self>.

The simple solution is to just change all protocol methods to use TryFromPyCell trait. This is however a breaking change.

The better solution is to change all #[pyproto] methods to support any of the five of &PyCell, PyRef<Self>, PyRefMut<Self>, &self and &mut self, just like we support for #[pymethods].

I've had some ideas how to approach this second point so would like to take a shot at it soon.

kngwyu commented 4 years ago

The better solution is to change all #[pyproto] methods to support any of the five of &PyCell, PyRef, PyRefMut, &self and &mut self, just like we support for #[pymethods].

So what would the trait definition would be? Or we remove __dunder__ methods?

davidhewitt commented 4 years ago

I think I see a way to make a PyMethodReceiver trait which is implemented by all five types.

davidhewitt commented 4 years ago

Quick update from me: over the weekend I had a little time to play with a trait along the lines of:

pub trait PyMethodReceiver<'a>: Sized {
    fn receive<T, U>(slf: &'a PyAny, f: impl FnOnce(Self) -> T) -> PyResult<U>
    where
        T: IntoPyCallbackOutput<U>,
        U: 'static;
}

It almost works, but there's some lifetime challenges about making sure the receiver lifetime is scoped correctly and safely. For the &self and &mut self receivers, I currently have some issues:

// THIS IMPL ISN'T SAFE BECAUSE &'a C IN THE CLOSURE CAN OUTLIVE THE `PYREF` GUARD
impl<'a, C: PyClass + 'a> PyMethodReceiver<'a> for &'a C {
    fn receive<T, U>(slf: &'a PyAny, f: impl FnOnce(Self) -> T) -> PyResult<U>
    where
        T: IntoPyCallbackOutput<U>,
        U: 'static
    {
        let cell: &PyCell<C> = slf.extract()?;

        // Introduce a PyRef to hold the guard. The lifetime of this is not long enough.
        let _ref = cell.try_borrow()?;

        // XXX: Use of unsafe below is not sound; was a hack to get compilation but the lifetime inferred outlives the guard.
        f(unsafe { cell.try_borrow_unguarded()? }).convert(slf.py())
    }
}

I'm hopeful that if I continue experimenting with designs in this space I'll be able to come up with a definition which is sound and gets us what we want.

kngwyu commented 4 years ago

So you mean you're extending the current fn __dunder__(slf: Self::Receiver...) to accept &Self or &mut Self type? Interesting, but I'm not sure we really need this. Both fn __dunder__(slf: &Self) and fn __dunder__(slf: PyRef<Self>) are not straightforward to write, so I think the usablity gain is not so much.

davidhewitt commented 4 years ago

I think that the most important thing is that we support fn __dunder__(slf: PyRef<Self>), for a couple of reasons:

So for 0.13 we could change all dunder methods to use TryFromPyCell trait, even if we can't support the full set.

n1t0 commented 3 years ago

I'm also very interested in this, for another use case: PyRef<Self> gives access to all inherited classes. I don't think there is any way (even unsafe) to access them with the provided &self.

davidhewitt commented 3 years ago

I took another look at this today, with a hope that after #1328 it should be possible to do further refactoring to support all of the receiver types listed above.

The answer is that if we change the traits to have slf: Self::Receiver as the first argument, e.g. like the existing PyIterProtocol trait:

pub trait PyIterProtocol<'p>: PyClass {
    fn __iter__(slf: Self::Receiver) -> Self::Result
    where
        Self: PyIterIterProtocol<'p>,
    {
        unimplemented!()
    }

    fn __next__(slf: Self::Receiver) -> Self::Result
    where
        Self: PyIterNextProtocol<'p>,
    {
        unimplemented!()
    }
}

... then for these traits it's not possible to call self.__iter__(). It has to be called as e.g. PyIterProtocol::__iter__(self).

As a consequence, supporting &self syntax in #[pyproto] for these traits is confusing in my opinion. If I see this code:

#[pyproto]
impl PyIterProtocol for MyClass {
    fn __iter__(slf: PyRef<Self>) -> PyRef<Self>
    {
        slf
    }

    fn __next__(&mut self) -> Option<i32>
    {
        unimplemented!()
    }
}

Then I would expect I should be able to call self.__next__() on MyClass instances. But actually I can't, because the trait definition above doesn't have an &mut self receiver. I have to call it as MyClass::__next__(self).


This problem might eventually be fixed with the arbitrary_self_types feature in a far-future Rust version. In the interest of providing a solution now, I think we have two options:

  1. Just support TryFromPyCell receivers for all protocols. That's the solution dicussed elsewhere in this thread. It's breaking for all existing #[pyproto] implementations which use &self or &mut self receivers. At least all protocols will be consistent with each other after this.

  2. Merge #[pyproto] and #[pymethods]. This idea is a bit more radical, but I actually think it could be quite nice. Basically we let users write slot methods in #[pymethods] and the proc macro can detect these and handle them specially.

    Migrating for users should not be too hard because all they will have to do is merge their #[pyproto] blocks into their #[pymethods].

    Doing this would remove a lot of the current things we do to make #[pyproto] work (e.g. inserting lifetimes, lots of extra protocol traits). So the pyo3 codebase would probably be smaller and easier to maintain. From a user perspective, pyo3 would have a smaller API.

    There are some nice advantages to using traits though, like documentation and grouping (e.g. force both buffer methods to be implemented together). We can always make sure the guide has good docs.

    ... I'm tempted to hack around in the near future and see what this feels like in practice.

birkenfeld commented 3 years ago

Without knowing too much about the implementation specifics, idea #2 sounds good to me. It will feel natural to Python users, since it mirrors how special methods are defined there.

On the Rust side though, the "trait-ness" of these interfaces is lost. Do people use the traits on the Rust side to handle different objects with common traits - even if the traits only contain Python related functionality? (If desperately needed this could still be mitigated by still having the traits and implementing them automatically from the pymethods macro, with its methods just forwarding to the related pymethod.)

davidhewitt commented 3 years ago

Do people use the traits on the Rust side to handle different objects with common traits - even if the traits only contain Python related functionality?

It's possible, but I haven't seen this in practice. Note that a Rust extension module will only have these protocol traits defined for its own pyclasses, and not for any builtin or thirdparty Python object. So most consumption of the Python protocols will still need to go via Python's dynamic type system / attribute lookup at runtime.

Having drafted the implementation for option 1 in #1561, I also now strongly prefer option 2. I agree about just having #[pymethods] being natural for Python users, and the migration for existing PyO3 projects will be easier. Option 2 will require cut-and-pasting all #[pyproto] methods into #[pymethods]. Option 1 actually forces quite a significant re-write of all #[pyproto] methods.

In either way, I think release 0.14 is due soon as we've got a lot of changes piling up, so I'm regrettably going to shift this to the 0.15 milestone. In my eyes this is one of the top things to sort for 0.15, along with #1056.

kngwyu commented 3 years ago

I think idea 2 could be better for those who already know Python well, but for Rust users who don't know Python very well, the trait is much better. I actually didn't know Python very well when I had to write some Rust extensions for Python. I'm not sure it's a major case, though. How about converting user-defined trait methods to take slf: PyRef<T>? I agree that this is certainly ugly, but in the future, we can use arbitrary self.

Also, even if we remove protocol traits, I think we have to leave some traits (e.g., buffer). So, if we are going to remove protocol traits, we need to clarify what traits should be remained.

davidhewitt commented 3 years ago

How about converting user-defined trait methods to take slf: PyRef? I agree that this is certainly ugly, but in the future, we can use arbitrary self.

This is exactly what I've drafted in #1561. As well as being a bit ugly it's also a big migration for all existing code to have to change.

I think idea 2 could be better for those who already know Python well, but for Rust users who don't know Python very well, the trait is much better. I actually didn't know Python very well when I had to write some Rust extensions for Python. I'm not sure it's a major case, though.

What I think I'm hearing from this is that the traits provide useful grouping and documentation. I think with good documentation in the guide we can have most of the same benefit with #[pymethods].

kngwyu commented 3 years ago

I meant rewriting trait methods by proc macro.

davidhewitt commented 3 years ago

Ahh I see!

Yes, it's true #[pyproto] could rewrite all &self to slf: PyRef<Self>. It'd also have to rewrite all uses of self to slf inside of the function body. The result would mean that there would be less hard-breaking migration for users implementing these traits.

The downsides of this approach:

If people think that option 1 is the better final design than option 2, then I could support migrating to option 1 via this. However I still think that option 2 is probably nicer overall. I need to try and draft an implementation of option 2 and see what it looks like in reality!

davidhewitt commented 3 years ago

Another interesting point for the discussion: a user this week tried to implement __str__ and __repr__ in #[pymethods] and then was confused why they did not work correctly. I had to point them to #[pyproto] docs on Gitter: https://kushaldas.in/posts/adding-dunder-methods-to-a-python-class-written-in-rust.html

This further makes me think just having the one macro would help avoid confusion.

kngwyu commented 3 years ago

OK, now I'm also inclined to use pymethod for everything. Raising compile errors and preparing documentation would be big stuff, though.

davidhewitt commented 3 years ago

Yes, I'm willing to put in the effort to build all of the necessary implementation for 0.15!

mejrs commented 3 years ago

a user this week tried to implement __str__ and __repr__ in #[pymethods] and then was confused why they did not work

I've had this problem too.

preparing documentation

I'm on board to do this. I know my way around the magic methods (in python) pretty well.

davidhewitt commented 3 years ago

Help with the documentation when we're ready would be really amazing. I'll probably start experimenting with this in about a month's time once 0.14 release is done and any relevant bugfixes out the way.

I imagine that what this will end up being like is that the #[pymethods] which have constraints in the types and arguments will be the ones that need documentation.

For example, we might need to document constraints like:

`__setattr__` 
  - Takes a receiver (`&self`, `&mut self`, `PyRef<Self>`, `Py<Self>` etc.)
  - Takes two arguments
  - Returns `()` or `PyResult<()>`

`__str__`
  - Takes a receiver (`&self`, `&mut self`, `PyRef<Self>`, `Py<Self>` etc.)
  - Takes no arguments
  - _Should_ return a string type (`&str`, `String`, `&PyString`, `&PyAny`, `PyObject` etc.), optionally wrapped in `PyResult`. 

I imagine that it might need a couple iterations to make this all easy to read.

davidhewitt commented 3 years ago

The initial implementation for #[pymethods] is merged in #1864 and the remaining work is tracked in #1884. The plan is not to change #[pyproto] any more, and to eventually deprecate. Anyone who has a need for this functionality is encouraged to try the experimental #[pymethods] implementation in 0.15 once that releases (or on main already).