Discussion: Why pyo3_ffi vs pyo3

Dr-Emann commented 4 months ago

Just wondering what the reasoning is for using the lower level (and much more unsafe) pyo3_ffi bindings directly rather than the nicer pyo3 wrapper on top?

ariebovenberg commented 4 months ago

PyO3 is indeed a great tool! The main reason for choosing FFI was to have performance parity with datetime on even the "smallest" methods. At the time, PyO3 function overhead was significant. I recall small datetime methods (like __eq__) taking roughly 10ns, while the PyO3 method overhead alone was already 40-50ns. I've heard PyO3's performance has improved somewhat, so I'll probably do an updated benchmark.

I also gravitated to pyo3_ffi to further my understanding of Python's underlying C API. This helps me understand the how and why of PyO3 as well, should I adopt it in the future (or even contribute).

Of course you're right that it also comes with downsides, mainly safety. However, the methods are so small that things like refcounting are reasonably simple to track. Larger logic such as parsing occurs in safe Rust code.

Other reasons I'm not yet in a rush to adopt PyO3:

PyO3's API is still stabilizing
The PyO3 wrapper isn't (yet) compatible with subinterpreters and has issues with free-threaded Python.

ariebovenberg commented 4 months ago

An updated benchmark on __eq__:

$ python -m timeit -s "from datetime import datetime, UTC; d1 = datetime.now(); d2 = datetime.now()" "d1 == d2"
20000000 loops, best of 5: 11.1 nsec per loop
$ python -m timeit -s "from whenever import Instant; d1 = Instant.now(); d2 = Instant.now()" "d1 == d2"
20000000 loops, best of 5: 11.5 nsec per loop

For the PyO3 comparison, I added this definition to the maturin_starter example:

    pub fn __richcmp__(&self, value: &Bound<'_, PyAny>, _x: pyo3::basic::CompareOp) -> bool {
        true
    }

(note: I did remember to build in --release mode)

$ python -m timeit -s "from maturin_starter.submodule import SubmoduleClass; f = SubmoduleClass(); f2 = SubmoduleClass()" "f == f2"
20000000 loops, best of 5: 17.7 nsec per loop

This looks a lot better, but note that this is the performance without the actual implementation. We'd still need to add the type check and the comparison of the actual values. It's better than the 40ns observed in earlier PyO3 versions though. Could be worth doing a benchmark on the full logic 🤔
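For anyone who wants to repeat this kind of measurement from a script rather than the command line, here is a minimal sketch using the standard library's timeit module. It is only an illustration: PyInstant is a hypothetical pure-Python stand-in (not whenever's Instant, and not a PyO3 class), and ns_per_call and run_benchmark are names made up for this example. The __eq__ shows the "full logic" shape, a type check followed by the value comparison. Absolute numbers will differ per machine and Python version.

```python
import timeit
from datetime import datetime, timezone


class PyInstant:
    """Hypothetical pure-Python stand-in for an extension type (assumption)."""

    __slots__ = ("_nanos",)

    def __init__(self, nanos):
        self._nanos = nanos

    def __eq__(self, other):
        # "Full logic": type check first, then compare the actual value.
        if type(other) is not PyInstant:
            return NotImplemented
        return self._nanos == other._nanos

    def __hash__(self):
        return hash(self._nanos)


def ns_per_call(stmt, namespace, number=200_000, repeat=5):
    """Best-of-`repeat` time per execution of `stmt`, in nanoseconds."""
    timer = timeit.Timer(stmt, globals=namespace)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def run_benchmark():
    """Time `a == b` for datetime and for the stand-in class."""
    now = datetime.now(timezone.utc)
    return {
        "datetime": ns_per_call("a == b", {"a": now, "b": now}),
        "PyInstant": ns_per_call("a == b", {"a": PyInstant(1), "b": PyInstant(1)}),
    }


if __name__ == "__main__":
    for name, ns in run_benchmark().items():
        print(f"{name:>10}: {ns:.1f} ns per ==")
```

To compare real candidates, swap the objects passed in the namespace dict for instances of the extension types under test; taking the best of several repeats mirrors what `python -m timeit` reports.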

ariebovenberg commented 4 months ago

@Dr-Emann I've added this to the FAQ. The PyO3 vs. FFI discussion can always be reopened later as PyO3 evolves further.
