capi-workgroup / api-evolution

Issues related to evolution of the C API
14 stars 1 forks source link

For output parameters passed by pointer, set them on error: avoid uninitialized values #32

Open vstinner opened 11 months ago

vstinner commented 11 months ago

Python has a bunch of C API functions which take parameters are pointers, such as PyUnicode_AsUTF8AndSize(obj, &size). Some of these functions don't set output parameters on error. Problem: usually, these functions are called with variables allocated on the stack without being initialized, and so the variable value is undefined :-(

I propose to set a new guideline: always initialize all output parameters to known values.

For example, https://github.com/python/cpython/pull/111106 sets the size of -1 on error. Previously, the size argument was left unchanged. Extract of the unit test fix:

    Py_ssize_t size = UNINITIALIZED_SIZE;
    const char *s = PyUnicode_AsUTF8AndSize(unicode, &size);
    if (s == NULL) {
-        assert(size == UNINITIALIZED_SIZE);
+        assert(size == -1);
        return NULL;
    }

Now size value is determistic and there is no undefined behavior anymore.

Previously, using size in the error code could lead to undefined behavior. Example:

Py_ssize_t size;
const char *str = PyUnicode_AsUTF8AndSize(unicode, &size);
return PyUnicode_FromString(str, size);

What happens if PyUnicode_AsUTF8AndSize() fails and size is undefined?

cc @erlend-aasland @serhiy-storchaka

serhiy-storchaka commented 11 months ago

What happens if PyUnicode_AsUTF8AndSize() fails and size is undefined?

SystemError, unless size is 0. Empty string if size is 0.

See also python/cpython#110865. *end is set in case of some errors and is not set in case of other errors.

I am not sure that such guideline should be applied in all cases. In some cases it may be handy to left the value unchanged. For example, PyArg_ParseTuple() leaves values for optional arguments unchanged, it allows to specify default values which can be different in every case and even depend on the state of the object.

zooba commented 11 months ago

Previously, using size in the error code could lead to undefined behavior. Example:

Py_ssize_t size;
const char *str = PyUnicode_AsUTF8AndSize(unicode, &size);
return PyUnicode_FromString(str, size);

This construct should produce a compiler warning already.

In some cases it may be handy to left the value unchanged.

I agree. We should definitely have a rule to document the value left in a reference parameter in all situations (e.g. " on success; unchanged on error" or "set to 0 on error") and be correct within that function. Being more strict about it shouldn't be necessary.

We should be able to assume that callers do their error handling properly. Otherwise, we're going to end up with more checks in every function than are worth it (e.g. people want to skip type checks, which we can only do if we trust the caller).

vstinner commented 11 months ago

This construct should produce a compiler warning already.

Not all compilers are able to inspect PyUnicode_AsUTF8AndSize() implementation to see if size is always initialized or not. While it should be the case in CPython with LTO, it's way more complicated for C extensions which are not even linked to libpython anymore.

zooba commented 11 months ago

Yeah, looks like the major ones all seem to skip their uninitialised local warnings when you've passed its address somewhere. Guess it's a common enough initialisation pattern that it's worth the heuristic.

I still think we can assume that callers do their own error handling.

vstinner commented 1 month ago

Functions added to the C API now initialize parameters passed as pointers on error to avoid uninitialized state. I consider that the issue has been fixed.

On the other side, I don't think that it's worth it to go through existing functions.

encukou commented 1 month ago

This is not yet in the guideline PEP draft.