PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0
12.39k stars 765 forks source link

"lazy" state of `PyErr` ... has challenges #4584

Open davidhewitt opened 1 month ago

davidhewitt commented 1 month ago

PyErr contains some arcane trickery internally to avoid creating a Python exception object, by making some "lazy" state which boxes a PyErrArguments object.

While the original design predates me, I believe that it was done for performance to avoid creating a Python exception object inside of pure-Rust code.

I think there might be a few reasons to revisit this design.

  1. I think the performance motivation is not clear-cut; it makes PyO3's error handling relatively expensive because every error pathway has to box up the state into a dyn Trait before then creating a Python exception.
  2. I think we've learned that a better design is to allow APIs to be generic on the error type where relevant, i.e. in IntoPyObject (and I guess we could do the same in FromPyObject). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.
  3. I realize as I write this issue that PyErrState::normalize is not sound under the freethreaded build: it uses the GIL for synchronization.
  4. The boxed PyErrArguments state is not traversible by the Python GC. In theory this means that the state can contain arbitrary Python objects, could therefore participate cycles and cause memory leaks. We would need to make breaking changes to the trait to fix this.

I propose that instead we remove the ability for PyErr to be lazy and make it a thin wrapper around Py<PyBaseException>. This is essentially the direction that Python went in 3.12 for the global SetRaisedException ffi calls.

It would have the effect of creating a PyErr to change performance profile; instead of creating a boxed lazy exception state it would immediately create the Python object. This might be faster, it might be slower. But I think it would simplify things which might overall improve things.

Due to point 3, i.e. that it's not threadsafe, I'm tempted to just try to land this ASAP on 0.23. cc @ngoldbaum

I could at least make a branch and see what the benchmarks say.

alex commented 1 month ago

Hmm, can you say more about (3)?

On Fri, Sep 27, 2024 at 11:50 AM David Hewitt @.***> wrote:

PyErr contains some arcane trickery internally to avoid creating a Python exception object, by making some "lazy" state which boxes a PyErrArguments object.

While the original design predates me, I believe that it was done for performance to avoid creating a Python exception object inside of pure-Rust code.

I think there might be a few reasons to revisit this design.

  1. I think the performance motivation is not clear-cut; it makes PyO3's error handling relatively expensive because every error pathway has to box up the state into a dyn Trait before then creating a Python exception.
  2. I think we've learned that a better design is to allow APIs to be generic on the error type where relevant, i.e. in IntoPyObject (and I guess we could do the same in FromPyObject). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.
  3. I realize as I write this issue that PyErrState::normalize is not sound under the freethreaded build: it uses the GIL for synchronization.
  4. The boxed PyErrArguments state is not traversible by the Python GC. In theory this means that the state can contain arbitrary Python objects, could therefore participate cycles and cause memory leaks. We would need to make breaking changes to the trait to fix this.

I propose that instead we remove the ability for PyErr to be lazy and make it a thin wrapper around Py. This is essentially the direction that Python went in 3.12 for the global SetRaisedException ffi calls.

It would have the effect of creating a PyErr to change performance profile; instead of creating a boxed lazy exception state it would immediately create the Python object. This might be faster, it might be slower. But I think it would simplify things which might overall improve things.

Due to point 3, i.e. that it's not threadsafe, I'm tempted to just try to land this ASAP on 0.23. cc @ngoldbaum https://github.com/ngoldbaum

I could at least make a branch and see what the benchmarks say.

— Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/4584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBHKOTZPQ2TEABBWC3LZYV5FFAVCNFSM6AAAAABO7N27JKVHI2DSMVQWIX3LMV43ASLTON2WKOZSGU2TGMRVGMYDINI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- All that is necessary for evil to succeed is for good people to do nothing.

davidhewitt commented 1 month ago

The state is contained in an UnsafeCell, and we don't try to synchronize writes and reads because we require py: Python to e.g. PyErr::value

alex commented 1 month ago

Ah, yes, sorry the unsoundness is really in make_normalized, not normalize.

alex commented 1 month ago

This whole thing is really just a GILOnceCell, right?

davidhewitt commented 1 month ago

Not exactly, it consumes it's original state and rebuilds it in place when normalizing. You're right it's write once, though. I think the in place rebuild implies a lock if we wanted to keep the current design.

davidhewitt commented 1 month ago

I realize that one option is to leave the design as is and just replace the UnsafeCell with a mutex on the freethreaded build. But I think that might have unsatisfying perf consequences and maybe (slim) possibility of deadlocks.

That might be good enough for 0.23, however I still feel removing this lazy behaviour might be the correct long term option.

mejrs commented 1 month ago

I think we've learned that a better design is to allow APIs to be generic on the error type where relevant, i.e. in IntoPyObject (and I guess we could do the same in FromPyObject). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.

however I still feel removing this lazy behaviour might be the correct long term option.

I'm considering whether we should just have two error types, one being the cheap one and the other holding the python exception object (Maybe we could represent this with a generic parameter). We can probably keep the breaking changes to a minimum by having the default parameter be the uninstantiated variant and having most of the conversions happen inside a proc macro.

davidhewitt commented 1 month ago

I'm open to this, can you maybe elaborate a bit further? I think you're right that ripping the lazy state out is maybe more breaking than I originally considered, so we should think more carefully.

I think based on @alex's observation that this is basically a GILOnceCell, I have a sketch of an idea to use a Once to block on the freethreaded build to keep things sound. I'll try to play with that soon, and maybe that's better to land to unblock 0.23 without changing existing semantics on the default builds.