gerrymanoim opened this issue 2 weeks ago (status: Open)
When using `#[pyo3(get)]` on a primitive, the value needs to be turned into a Python object on every attribute access; that is to say, there's a heap allocation for a new `PyLong` object each time (apart from the small ints CPython caches).
A strictly apples-to-apples comparison would be a `Py<SomeType>` attribute.
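A rough pure-Python analogy of that difference (an illustration only, not how PyO3 is implemented): the hypothetical `StoredAttr` keeps a ready-made `int`, like a `Py<SomeType>` field, while the hypothetical `ConvertedAttr` keeps four raw bytes standing in for a Rust `i32` and builds a fresh `int` on every read. The identity checks describe CPython behaviour.

```python
import struct


class StoredAttr:
    """Holds a ready-made Python int, like a Py<SomeType> field."""

    def __init__(self, value: int) -> None:
        self.value = value


class ConvertedAttr:
    """Holds raw i32 bytes and converts them on every access (hypothetical)."""

    def __init__(self, value: int) -> None:
        # Four little-endian bytes standing in for a Rust `i32` field.
        self._raw = struct.pack("<i", value)

    @property
    def value(self) -> int:
        # A new int object is built from the raw bytes on each read.
        return int.from_bytes(self._raw, "little", signed=True)


stored = StoredAttr(1_000_000)
converted = ConvertedAttr(1_000_000)

print(stored.value is stored.value)        # the same object every time
print(converted.value is converted.value)  # a fresh object per access in CPython
print(stored.value == converted.value)     # the values are equal either way
```

Small values such as 0 come from CPython's small-int cache, so for them the conversion costs no allocation.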
Hmm, I appreciate the suggestion, but I can't seem to get that to behave any differently.
I now have:
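```rust
use pyo3::prelude::*;

#[pyclass]
struct RustClass {
    // A primitive: converted into a new Python int on every read.
    #[pyo3(get)]
    value: i32,
    // Already a Python object: the getter hands out a new reference to it.
    #[pyo3(get)]
    pyvalue: PyObject,
}
```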
and get pretty much the same result:
```
>>> bench()
Python:
0.057306197995785624
Rust value:
0.6657147670048289
Rust pyvalue:
0.6800711160176434
```
Perhaps I'm not understanding your suggestion correctly?
I attempted to reproduce this locally, and I find that the difference is much smaller than you're finding. Did you compile with `--release`?
```
$ python test.py
Python:
0.02138412299996162
Rust:
0.04930254800001421
```
PyO3 0.22 is expected to be faster still; with it I get numbers more like this:
```
$ python test.py
Python:
0.021781480999948144
Rust:
0.040761033000080715
```
> Hmm I appreciate the suggestion but can't seem to get that to behave any differently.
I find the same as you; I think this is because constructing a Python zero from a Rust zero is also extremely cheap. But I still find the slowdown to be only 2x rather than 12x.
I'm going to do a quick bit of profiling and see if I can understand and close the gap...
Ok, so I did some experimentation, and if I teach Python to read directly from a `Py<PyAny>` value using an `ffi::PyMemberDef` definition, then I get (unsurprisingly) performance that matches the Python lookup:
```
$ python test.py
Python:
0.02066981300004045
Rust:
0.03565813099976367
Rust (Py<PyAny>):
0.02015118600002097
```
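For comparison, CPython uses the same `PyMemberDef` mechanism for `__slots__`: each slot becomes a `member_descriptor` that reads an object pointer stored at a fixed offset in the instance and returns it without any conversion, which is presumably why reading a `Py<PyAny>` field this way matches plain Python. A small CPython-specific check, with `Point` as a hypothetical class:

```python
import types


class Point:
    # Each name in __slots__ becomes a member descriptor backed by a
    # PyMemberDef entry that points at a fixed offset in the instance.
    __slots__ = ("x",)

    def __init__(self, x: object) -> None:
        self.x = x


slot_descr = Point.__dict__["x"]
print(type(slot_descr).__name__)  # member_descriptor

p = Point(object())
# The stored object is handed back as-is: no conversion, no new allocation.
print(p.x is p.x)
```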
I did the same trick for the Rust `i32`, but it's still slower than Python because it needs to create the int object. That's not a huge surprise, and a natural cost of crossing the boundary.
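Built-in types pay the same cost for primitive members. In CPython, `complex.real` is declared as a read-only `PyMemberDef` over a C `double`, so every read builds a new `float` from the raw field, much as a `PyMemberDef` over a Rust `i32` has to build a new `int`. A CPython-specific check:

```python
import types

z = complex(1.5, 2.5)

# `real` is a member descriptor over a raw C double stored inside the complex.
real_descr = complex.__dict__["real"]
print(isinstance(real_descr, types.MemberDescriptorType))

# Each access converts the double into a brand-new float object.
first, second = z.real, z.real
print(first == second, first is second)
```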
I also think this gives us a nice solution to the lack of `Py::clone` without the `py-clone` feature, so I'm going to see if I can turn this into a real implementation immediately. Between the performance optimization and the usability improvement, it seems like a clear win to get into 0.22 quickly.
I think we cannot apply the same optimization for `#[pyo3(set)]`, because we have the borrow checking to guard against write aliasing.
Ah, you're totally right: I wasn't passing `--release`.
```
>>> bench()
Python:
0.06525319800130092
Python property:
0.08756525401258841
Python fn:
0.08051640499616042
Rust value:
0.1070495129970368
Rust pyvalue:
0.11770369400619529
```
I'm much closer now.
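The `bench()` source itself isn't included in the thread. A hypothetical reconstruction of its pure-Python half, timing a plain attribute, a property, and a method call with `timeit`, might look like this (`PyClass`, `value_prop`, and `value_fn` are made-up names):

```python
import timeit


class PyClass:
    """Hypothetical pure-Python counterpart of RustClass."""

    def __init__(self) -> None:
        self.value = 0  # plain instance attribute

    @property
    def value_prop(self) -> int:
        # Same data, read through a Python-level getter function.
        return self.value

    def value_fn(self) -> int:
        # Same data, read through an explicit method call.
        return self.value


def bench(number: int = 1_000_000) -> dict[str, float]:
    """Time `number` reads of each access style and print them like the thread."""
    env = {"obj": PyClass()}
    timings = {
        "Python": timeit.timeit("obj.value", globals=env, number=number),
        "Python property": timeit.timeit("obj.value_prop", globals=env, number=number),
        "Python fn": timeit.timeit("obj.value_fn()", globals=env, number=number),
    }
    for label, seconds in timings.items():
        print(f"{label}:\n{seconds}")
    return timings
```

Timing the Rust side works the same way: bind `obj` to an instance of the compiled `RustClass` (however the original setup creates it), time `"obj.value"` and `"obj.pyvalue"`, and make sure the extension was built with `--release`.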
Can you show me how you're using `ffi::PyMemberDef`? That's exactly what I'm looking for, I think.
Appreciate all the help!
So `PyMemberDef` only works for a very specific set of types, and only for read-only fields, due to Rust's guarantees around mutable aliasing.
We don't really have a point in the `#[pyclass]` machinery where you can hook one in. Given the safety risks, I'd rather not make one either. Instead, I'm working on a PR which would automatically choose it as the implementation when possible. I'm going to push this today.
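The read-only restriction is visible from Python as well: a member declared with the `READONLY` flag rejects assignment at the descriptor level, without the type needing any setter logic. CPython's function objects expose `__globals__` this way; `example` and `try_overwrite_globals` below are hypothetical helpers for the check:

```python
import types


def example() -> int:
    return 42


# __globals__ is a READONLY object member on function objects.
globals_descr = types.FunctionType.__dict__["__globals__"]
print(isinstance(globals_descr, types.MemberDescriptorType))


def try_overwrite_globals(func: types.FunctionType) -> bool:
    """Return True if the member descriptor refuses the write."""
    try:
        func.__globals__ = {}
    except AttributeError:
        return True
    return False


print(try_overwrite_globals(example))
```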
I have the following setup:
and
When I run `bench`, I find that the Rust attribute access is much slower than the Python direct access.
It's unclear whether this is intended or not. Is there a good way for me to replicate the performance of Python here?
If it's useful: