Open davidhewitt opened 1 month ago
Hmm, can you say more about (3)?
On Fri, Sep 27, 2024 at 11:50 AM David Hewitt @.***> wrote:
PyErr contains some arcane trickery internally to avoid creating a Python exception object, by making some "lazy" state which boxes a PyErrArguments object.
While the original design predates me, I believe that it was done for performance to avoid creating a Python exception object inside of pure-Rust code.
I think there might be a few reasons to revisit this design.
- I think the performance motivation is not clear-cut; it makes PyO3's error handling relatively expensive because every error pathway has to box up the state into a dyn Trait before then creating a Python exception.
- I think we've learned that a better design is to allow APIs to be generic on the error type where relevant, i.e. in IntoPyObject (and I guess we could do the same in FromPyObject). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.
- I realize as I write this issue that PyErrState::normalize is not sound under the freethreaded build: it uses the GIL for synchronization.
- The boxed PyErrArguments state is not traversible by the Python GC. In theory this means that the state can contain arbitrary Python objects, could therefore participate cycles and cause memory leaks. We would need to make breaking changes to the trait to fix this.
I propose that instead we remove the ability for PyErr to be lazy and make it a thin wrapper around Py
. This is essentially the direction that Python went in 3.12 for the global SetRaisedException ffi calls. It would have the effect of creating a PyErr to change performance profile; instead of creating a boxed lazy exception state it would immediately create the Python object. This might be faster, it might be slower. But I think it would simplify things which might overall improve things.
Due to point 3, i.e. that it's not threadsafe, I'm tempted to just try to land this ASAP on 0.23. cc @ngoldbaum https://github.com/ngoldbaum
I could at least make a branch and see what the benchmarks say.
— Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/4584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBHKOTZPQ2TEABBWC3LZYV5FFAVCNFSM6AAAAABO7N27JKVHI2DSMVQWIX3LMV43ASLTON2WKOZSGU2TGMRVGMYDINI . You are receiving this because you are subscribed to this thread.Message ID: @.***>
-- All that is necessary for evil to succeed is for good people to do nothing.
The state
is contained in an UnsafeCell
, and we don't try to synchronize writes and reads because we require py: Python
to e.g. PyErr::value
Ah, yes, sorry the unsoundness is really in make_normalized
, not normalize
.
This whole thing is really just a GILOnceCell, right?
Not exactly, it consumes it's original state and rebuilds it in place when normalizing. You're right it's write once, though. I think the in place rebuild implies a lock if we wanted to keep the current design.
I realize that one option is to leave the design as is and just replace the UnsafeCell
with a mutex on the freethreaded build. But I think that might have unsatisfying perf consequences and maybe (slim) possibility of deadlocks.
That might be good enough for 0.23, however I still feel removing this lazy behaviour might be the correct long term option.
I think we've learned that a better design is to allow APIs to be generic on the error type where relevant, i.e. in IntoPyObject (and I guess we could do the same in FromPyObject). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.
however I still feel removing this lazy behaviour might be the correct long term option.
I'm considering whether we should just have two error types, one being the cheap one and the other holding the python exception object (Maybe we could represent this with a generic parameter). We can probably keep the breaking changes to a minimum by having the default parameter be the uninstantiated variant and having most of the conversions happen inside a proc macro.
I'm open to this, can you maybe elaborate a bit further? I think you're right that ripping the lazy state out is maybe more breaking than I originally considered, so we should think more carefully.
I think based on @alex's observation that this is basically a GILOnceCell
, I have a sketch of an idea to use a Once
to block on the freethreaded build to keep things sound. I'll try to play with that soon, and maybe that's better to land to unblock 0.23 without changing existing semantics on the default builds.
PyErr
contains some arcane trickery internally to avoid creating a Python exception object, by making some "lazy" state which boxes aPyErrArguments
object.While the original design predates me, I believe that it was done for performance to avoid creating a Python exception object inside of pure-Rust code.
I think there might be a few reasons to revisit this design.
dyn Trait
before then creating a Python exception.IntoPyObject
(and I guess we could do the same inFromPyObject
). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.PyErrState::normalize
is not sound under the freethreaded build: it uses the GIL for synchronization.PyErrArguments
state is not traversible by the Python GC. In theory this means that the state can contain arbitrary Python objects, could therefore participate cycles and cause memory leaks. We would need to make breaking changes to the trait to fix this.I propose that instead we remove the ability for
PyErr
to be lazy and make it a thin wrapper aroundPy<PyBaseException>
. This is essentially the direction that Python went in 3.12 for the globalSetRaisedException
ffi calls.It would have the effect of creating a
PyErr
to change performance profile; instead of creating a boxed lazy exception state it would immediately create the Python object. This might be faster, it might be slower. But I think it would simplify things which might overall improve things.Due to point 3, i.e. that it's not threadsafe, I'm tempted to just try to land this ASAP on 0.23. cc @ngoldbaum
I could at least make a branch and see what the benchmarks say.