PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0
12.39k stars 765 forks source link

Pybytes specialization slices #4442

Closed davidhewitt closed 3 months ago

davidhewitt commented 3 months ago

This is yet another attempt to optimise #4417. This time, I was able to achieve near-zero performance abstraction over the top of PyBytes::new.

The realisation was that the &[T] and Cow<[T]> implementations were dependent on having a new implementation impl IntoPyObject for &u8. If we remove that implementation, then we can instead have specialized implementations directly for &[u8] and Cow<[u8]>which don't conflict with the blanket implementation for &[T] or Cow<[T]>.

The downside of this approach is that we permanently lose the ability to implement IntoPyObject for &u8. But personally I don't see that as much of a problem; we didn't have that implementation before. This also prevents us from supporting sequences like Vec<&u8>, but again I sort of find that type weird and it doesn't bother me too much if we ask users to do something special with it themselves.

davidhewitt commented 3 months ago

Thinking about this a bit further; we already have impl ToPyObject for u8 which was approximately impl IntoPyObject for &u8, so maybe not having IntoPyObject for &u8 is more breaking than I might like.

davidhewitt commented 3 months ago

It's probably ok to go this route and not have impl IntoPyObject for &u8, but users might have to migrate code by adding a dereference to copy the &u8 into a u8.

That said, the compiler error currently doesn't suggest that. Maybe we can suggest rustc to improve the diagnostic here:

error[E0277]: `&u8` cannot be converted to a Python object
   --> test.rs:10:43
    |
 10 |             let obj = takes_into_pyobject(x);
    |                       ------------------- ^ the trait `conversion::IntoPyObject<'_>` is not implemented for `&u8`
    |                       |
    |                       required by a bound introduced by this call
    |
    = note: `IntoPyObject` is automatically implemented by the `#[pyclass]` macro
    = note: if you do not wish to have a corresponding Python type, implement it manually
    = note: if you do not own `&u8` you can perform a manual conversion to one of the types in `pyo3::types::*`
    = help: the trait `conversion::IntoPyObject<'_>` is implemented for `u8`
    = help: for that trait implementation, expected `u8`, found `&u8` 
Icxolu commented 3 months ago

Very interesting approach! I think not providing impl IntoPyObject for &u8 will have an considerable impact on the generic APIs that we can provide(, or rather how they work). For example I have started working on providing impls for references to collections like &HashSet<K> which will require &'a K: IntoPyObject<'py>, so a &HashSet<u8> will not be covered. This does not feel to uncommon to me (maybe we store a HashSet<u8> in a pyclass and access it via &self), so I'm a bit skeptical if it's a good idea to exclude &u8. I have the feeling that it will create many surprising corner cases...

Maybe we could combine this approach with #4424. That way we have the fast and efficient PyBytes::new implementation for the owning collection via AsRef<[Self]> from here and the borrowing collections will take a slight performance hit using the PyBytes::from_iterator approach from #4424 .

davidhewitt commented 3 months ago

Agreed, we need the &u8 implementation.

Maybe we could combine this approach with #4424. That way we have the fast and efficient PyBytes::new implementation for the owning collection via AsRef<[Self]> from here and the borrowing collections will take a slight performance hit using the PyBytes::from_iterator approach from #4424 .

This turned out to be a fantastic suggestion; and running with it managed to do one step further. I had to split the private method into two, owned_sequence_into_pyobject (used by Vec<u8>) and borrowed_sequence_into_pyobject (used by &[u8]), and needed a helper trait Reference to describe the slice in borrowed_sequence_into_pyobject as AsRef<[<Self as Reference>::BaseType]>.

That lets me restore impl IntoPyObject for &u8 while also managing to specialize slices on &[u8].

Now I think we have all the implementations we want, and everything safely uses PyBytes::new! 🎉🤯

This has consumed far too much of my spare brain cycles in the last week, but it's been fun 😂

If you're happy with how this looks, I think this might now be good to merge?