PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Other
11.44k stars 693 forks source link

#[pyo3(get)] much slower than python attribute access #4247

Open gerrymanoim opened 2 weeks ago

gerrymanoim commented 2 weeks ago

I have the following setup:

// lib.rs
use pyo3::prelude::*;

#[pyclass]
struct RustClass {
    #[pyo3(get)]
    value: i32
}

#[pymethods]
impl RustClass {
    #[new]
    fn new() -> Self {
        RustClass{value: 0}
    }
}

/// A Python module implemented in Rust.
#[pymodule]
fn _lowlevel(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<RustClass>()?;
    Ok(())
}

and

import timeit

from pyo3_bench._lowlevel import RustClass

class PyClass:
    def __init__(self):
        self.value = 0

def bench():
    py = PyClass()
    rust = RustClass()
    print("Python:")
    print(timeit.timeit(lambda: py.value, number=1000000))
    print("Rust:")
    print(timeit.timeit(lambda: rust.value, number=1000000))

When I run bench I find that the rust attribute access is much slower than the python direct access.

>>> bench()
Python:
0.06078508700011298
Rust:
0.6096129899960943
>>> bench()
Python:
0.05267535999882966
Rust:
0.6028135990200099

Its unclear whether this is intended or not. Is there a good way for me to replicate the performance of python here?

If its useful:

alex commented 2 weeks ago

When using pyo3(get) on a primitive, it will need to be turned into a Python object at each attribute access -- that is to say, there's a heap allocation for a new PyLong object each time.

A strictly apples-to-apples comparison would be a Py<SomeType> attribute.

gerrymanoim commented 2 weeks ago

Hmm I appreciate the suggestion but can't seem to get that to behave any differently.

I now have

#[pyclass]
struct RustClass {
    #[pyo3(get)]
    value: i32,

    #[pyo3(get)]
    pyvalue: PyObject,
}

and get pretty much the same result:

>>> bench()
Python:
0.057306197995785624
Rust value:
0.6657147670048289
Rust pyvalue:
0.6800711160176434

Perhaps I'm not understanding your suggestion correctly?

davidhewitt commented 2 weeks ago

I attempted to reproduce this locally, and I find that the difference is much smaller than you're finding. Did you compile with --release?

$ python test.py
Python:
0.02138412299996162
Rust:
0.04930254800001421

PyO3 0.22 is expected to be faster still, I get numbers more like this:

$ python test.py
Python:
0.021781480999948144
Rust:
0.040761033000080715

Hmm I appreciate the suggestion but can't seem to get that to behave any differently.

I find the same as you; I think this is because constructing a Python zero from a Rust zero is also extremely cheap. But I still find the slowdown to be only 2x rather than 12x.

I'm going to do a quick bit of profiling and see if I can understand and close the gap...

davidhewitt commented 2 weeks ago

Ok, so I did some experimentation and if I teach Python to read directly from a Py<PyAny> value using an ffi::PyMemberDef definition, then I get (unsurprisingly) performance which matches the Python lookup:

$ python test.py
Python:
0.02066981300004045
Rust:
0.03565813099976367
Rust (Py<PyAny>):
0.02015118600002097

I did the same trick for the Rust i32, but it's still slower than Python because it needs to create the int object. That's not a huge surprise and a natural cost of crossing the boundary.

I also think that this provides us a nice solution to the lack of Py::clone without the py-clone feature, so I'm going to see if I can turn this into a real implementation immediately. Between perf optimization and a usability improvement, seems like a clear win to get into 0.22 quickly.

I think we cannot apply the same optimization for #[pyo3(set)] because we have the borrow-checking to guard against write aliasing.

gerrymanoim commented 2 weeks ago

Ah - you're totally right - I wasn't passing --release.

>>> bench()
Python:
0.06525319800130092
Python property:
0.08756525401258841
Python fn:
0.08051640499616042
Rust value:
0.1070495129970368
Rust pyvalue:
0.11770369400619529

I'm much closer now.

Can you show me how you're using ffi::PyMemberDef - that's exactly what I'm looking for I think.

Appreciate all the help!

davidhewitt commented 2 weeks ago

So PyMemberDef only works for a very specific set of types and only for read-only fields due to Rust's guarantees around mutable aliasing.

We don't really have a point in the #[pyclass] machinery where you can hook one in. Given the safety risks, I'd rather not make one either. Instead I'm working on a PR which would automatically choose it as the implementation when possible. I'm going to push this today.