faster-cpython / ideas


Make CPython robust with respect to builtin function errors, and maybe a little faster. #552

Open markshannon opened 1 year ago

markshannon commented 1 year ago

The problem

Every time we call a builtin function, or make any call using the vectorcall protocol, we need to sanity check the exception state. Not only that, but the checking is inconsistent: builtin functions are checked, but operations defined by type slots are not.

The problem is that we cannot trust third-party code not to mess up. It is also good to sanity check our own code.

Checking that an exception has not been set on every return is needlessly expensive.

There are four possible values/states that a builtin function can return:

| Exception | Return non-NULL | Return NULL              |
|-----------|-----------------|--------------------------|
| Set       | INVALID         | Valid                    |
| Not set   | Valid           | INVALID (cheap to check) |

If a function returns NULL we need to handle the error, so checking to see if the error is set just adds a bit to an already expensive operation. For builtin functions that return NULL, but fail to set an exception, we just create the same error as we do now.
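As a rough illustration of the four states in the table, here is a minimal self-contained sketch. It does not use the real C API; `call_state` and `classify_call` are hypothetical names invented for this example, and the exception state is modelled as a plain boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the four combinations in the table. */
typedef enum {
    CALL_OK,               /* non-NULL result, no exception set: valid */
    CALL_ERROR,            /* NULL result, exception set: valid */
    CALL_NULL_NO_ERROR,    /* NULL result, no exception: invalid, cheap to check */
    CALL_RESULT_AND_ERROR  /* non-NULL result, exception set: invalid */
} call_state;

/* Classify the outcome of a call from its result pointer and the
 * (modelled) exception state. */
static call_state classify_call(const void *result, bool exception_set)
{
    if (result == NULL) {
        /* Already on the slow error path, so consulting the exception
         * state here adds little cost. */
        return exception_set ? CALL_ERROR : CALL_NULL_NO_ERROR;
    }
    /* Telling these two apart is the check that would otherwise have to
     * run on every successful return. */
    return exception_set ? CALL_RESULT_AND_ERROR : CALL_OK;
}
```

Only the non-NULL branch needs the exception state on the fast path, which is why that case is the expensive one to police.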

The problem is when a function returns non-NULL and sets the exception.

Solutions

We should make the VM and C-API able to handle having the current exception set.

We can do this by discarding the idea of a "current exception", and viewing it as a side-channel for communicating the exception, iff a function returns ERROR (aka -1).

The additional cost of doing this is that functions that may fail (as opposed to error) need to clear the exception every time they are called. For most API functions, nothing changes.
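One way to picture the side-channel idea is the toy model below. It is only a sketch under stated assumptions: `toy_thread_state` and `toy_find` are hypothetical, the exception is modelled as a string pointer, and nothing here reflects CPython's actual thread state layout. The function returns NULL both for failure (item not found) and for error, so it clears any stale exception on the failure path.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the per-thread exception slot (not CPython's layout). */
typedef struct {
    const char *pending;  /* NULL means "no exception set" */
} toy_thread_state;

/* Hypothetical lookup with three outcomes:
 *   success: returns the matching item
 *   failure: returns NULL, exception slot cleared
 *   error:   returns NULL, exception slot set */
static const char *toy_find(toy_thread_state *ts,
                            const char *const *items, size_t n,
                            const char *key)
{
    if (items == NULL || key == NULL) {
        ts->pending = "toy_find: bad argument";
        return NULL;  /* error */
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(items[i], key) == 0) {
            /* Success: the slot is left alone, so it may still hold a
             * stale exception. The caller must not read it. */
            return items[i];
        }
    }
    /* Failure: clear any stale exception so that "NULL and nothing set"
     * unambiguously means "not found". */
    ts->pending = NULL;
    return NULL;
}
```

In this model the caller consults `pending` only after a NULL return, so a stale value left over on the success path is harmless.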

Changing the C API.

We expect that C extension code should not make spurious calls to PyErr_Occurred(), and should call it only when it needs to.

Functions that return a value that can be a success or an error (or a failure), e.g. PyLong_AsLong(), are a problem as they force the caller to check PyErr_Occurred(), which means all such functions will need to clear the current exception when failing. This may be a significant cost for some functions, but we should probably replace these functions anyway, as having to call PyErr_Occurred() to check for an error is not good API design.
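To illustrate the API design point, the sketch below shows one possible shape for a replacement: the status is the return value and the result goes through an out-parameter, so -1 as a value is never confused with an error. `toy_parse_long` is a hypothetical helper built on the standard `strtol`; it is not a proposed or existing CPython function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical converter: returns 0 and stores the value in *out on
 * success, returns -1 on error. Because status and value travel
 * separately, the caller never needs a second call to disambiguate.
 * `text` and `out` must be non-NULL. */
static int toy_parse_long(const char *text, long *out)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return -1;  /* no digits, trailing junk, or out of range */
    }
    *out = value;
    return 0;
}
```

With this shape, parsing the text "-1" yields status 0 and value -1, whereas a function that returns the value directly would need an extra error query to tell that apart from a failure.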

Some functions should not be called with the current exception set, because it will be discarded. Nothing needs to change there; we just need to make sure that those functions ignore the current exception.

Some functions change their behavior depending on whether an exception is set. These may need to change, but they are mostly (all?) internal.

Debugging

We will continue to check for consistent return values in debug builds, and should add extra checking for slots. We can also add more C API tests that incorrectly set the exception.