encukou / abi3

Improvements of Python's stable ABI
BSD 3-Clause "New" or "Revised" License
8 stars 1 forks source link

Do not use void* for function pointers #14

Open encukou opened 3 years ago

encukou commented 3 years ago

In strict standard C, we need to distinguish data pointers from function pointers. On some architectures, data and instruction pointers have different sizes, meaning data/function pointers are actually incompatible.

One idea how to do it for PyType_Slot w/o breaking backwards compatibility is to specify non-function entries through PyMemberDef, as we did for __dictoffset__, __weaklistoffset__ and __vectorcalloffset__ (though all those are just numbers, pointers aren't possible yet). This style could be added as an alternative way to specify the existing non-function slots, and then the corresponding slots can be deprecated.

There are probably also other places where this happens in the C API; those can be listed here.

encukou commented 3 years ago

@vstinner, btw this is one more thing to keep in mind for C API improvements

vstinner commented 3 years ago

HPy is working on a fix: https://github.com/hpyproject/hpy/pull/23

encukou commented 3 years ago

This is included in the devguide. Existing API will need replacements & deprecations.

encukou commented 3 years ago

@hodgestar, Are you, by chance, aware of a compiler/compiler flag that will issue warnings/errors for this problem (and ideally, can compile CPython)? So far I only know about Emscripten.

vstinner commented 3 years ago

So far I only know about Emscripten.

https://emscripten.org/docs/porting/guidelines/function_pointer_issues.html issue seems to related to cast between different function pointer types, not about cast between void* and a function pointer type.

But this Emscripten issue reminds me clang CFI https://clang.llvm.org/docs/ControlFlowIntegrity.html which also checks a function pointer type at runtime for security before calling a function. I understood that you call only call func1 function if it has func1 function type. There is a mapping between a function pointer and its type, and then the type is checked at runtime. At least, it was what I understood in https://lwn.net/Articles/856514/ :-)

In Python, we have such issue with function pointers used to define module methods. A module method has a type, but many C extensions ignore the type and uses a different type. Example with atexitmodule.c:

static PyObject *
atexit_register(PyObject *module, PyObject *args, PyObject *kwargs)
{
    ...
}

static PyMethodDef atexit_methods[] = {
    {"register", (PyCFunction)(void(*)(void)) atexit_register, METH_VARARGS|METH_KEYWORDS,
        atexit_register__doc__},
    ...
};

whereas the proper type is:

typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);

The problem is that METH_KEYWORDS adds a 3rd parameter which doesn't exist in PyCFunction. Now enjoy (*(PyCFunctionWithKeywords)(void(*)(void))meth) cast in methodobject.c :-)

static PyObject *
cfunction_call(PyObject *func, PyObject *args, PyObject *kwargs)
{
    ...
    PyCFunction meth = PyCFunction_GET_FUNCTION(func);
    PyObject *self = PyCFunction_GET_SELF(func);

    PyObject *result;
    if (flags & METH_KEYWORDS) {
        result = (*(PyCFunctionWithKeywords)(void(*)(void))meth)(self, args, kwargs);
    }
    else {
        ...
    }
    ...
}

I don't know if these casts are compatible or not with clang CFI.

cc @serge-sans-paille: Do you know if CFI allows to call a function if we take a function pointer of a first function pointer type (PyCFunction), cast it to void(*)(void) function type (cast 1), then cast it again to a 3rd function pointer type (PyCFunctionWithKeywords) (cast 2)?

serge-sans-paille commented 3 years ago

The cast between different pointer types is allowed in C, but the actual usage must respect the original type signature, otherwise you're UB. and CFI (as well as UB sanitizer) catches that issue.

Storing a function pointer with an incorrect type is fine. Using it is not.