capi-workgroup / problems

Discussions about problems with the current C API

Different users need different stability trade-offs #9

Open encukou opened 1 year ago

encukou commented 1 year ago

This is currently solved by tiers, but different users need different stability expectations.

Of course, it's a spectrum.

markshannon commented 1 year ago

Not just different users, sometimes the same user in different circumstances. Any of the above can apply to the same person at different times.

encukou commented 1 year ago

s/users/personas/

gvanrossum commented 1 year ago

Although #4 was controversial, the data there seems to say that not that many personas/packages care about low maintenance? Or perhaps the distinction is between "code doesn't break but requires recompilation" being fine, vs. code breaking when just recompiling for a new Python version. I think most people who manually maintain code strongly prefer the former -- recompiling is pretty mechanical, assuming it works. When using code generators like Cython, presumably upgrading to a new version of Cython and then recompiling also isn't much of a burden.

The real problems occur when recompilation doesn't work, because then you have to thaw out a maintainer who has to read and understand the code and think about a fix, test it, etcetera.
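
(As an illustration of that difference, here is a hypothetical snippet, not taken from any real package: CPython 3.11 stopped allowing `Py_TYPE()` to be used as an l-value, so code that assigned through it keeps compiling against older versions but breaks when recompiled for 3.11+, and someone has to edit and re-test it, switching to `Py_SET_TYPE()`, which was added in 3.9.)

```c
#include <Python.h>

/* Hypothetical helper for illustration only; not from any real package.
 * "Py_TYPE(obj) = tp" compiled up to CPython 3.10, but Py_TYPE() can no
 * longer be used as an l-value in 3.11+, so an extension doing this
 * cannot be "just recompiled" -- the source itself needs a fix. */
static void
set_type(PyObject *obj, PyTypeObject *tp)
{
#if PY_VERSION_HEX >= 0x03090000
    Py_SET_TYPE(obj, tp);       /* replacement API, available since 3.9 */
#else
    Py_TYPE(obj) = tp;          /* old spelling, only valid on older CPython */
#endif
}
```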

As long as recompiling (after updating dependencies) works, we can meet the needs of packages that currently use the Stable ABI in a different way, e.g. using public build farms.

encukou commented 1 year ago

I agree that API stability is more important than ABI stability. But for a different reason: with a properly designed API, a stable ABI isn't that hard to provide.

(As discussed in #4, there's one known major issue with Python's current stable ABI -- one bad promise we made in Python 3.2. That's not a bad track record for a feature.)
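
(A rough, self-contained sketch with made-up names -- not CPython code -- of why API design drives ABI stability: anything exposed as a macro or struct field bakes memory layout into every compiled extension, while anything exposed as a function keeps the layout behind the ABI boundary.)

```c
#include <stddef.h>

/* Made-up type standing in for an interpreter-internal structure. */
typedef struct {
    size_t size;
} ThingObject;

/* Exposed as a macro, the field offset is compiled into every extension,
 * so the struct layout itself becomes part of the ABI forever: */
#define Thing_SIZE(t)  ((t)->size)

/* Exposed as a function, only the symbol is part of the ABI; the struct
 * can be reorganized in a later release without breaking old binaries: */
static size_t Thing_GetSize(const ThingObject *t) { return t->size; }

int main(void)
{
    ThingObject t = { .size = 3 };
    return (Thing_SIZE(&t) == Thing_GetSize(&t)) ? 0 : 1;
}
```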

steve-s commented 1 year ago

Or perhaps the distinction is between "code doesn't break but requires recompilation" being fine, vs. code breaking when just recompiling for a new Python version. I think most people who manually maintain code strongly prefer the former -- recompiling is pretty mechanical, assuming it works.

"Assuming it works" can be a strong assumption, I think.

Compiling an extension may not be that trivial if it depends on system libraries and some compiler/OS/whatnot-specific setup, for example if it uses a different language (Rust, a shiny new C++ standard, Nim, Fortran, ...). Take as an example the effort to compile the scientific Python extensions with the LLVM toolchain so that they can be compiled to WebAssembly: that was not "just recompiling" with a different toolchain. Another example is TensorFlow; it is not trivial to compile it from source.

Additionally, the challenges extend beyond Python packages available on PyPI. Similar issues are likely to arise within internal codebases. For instance, imagine a scenario where a company relies on a package developed, compiled, and distributed by an employee who then leaves the company. I think there were similar issues with the Python 2->3 change: some found out that they had been running a Python 2 codebase but had lost the knowledge needed to port it to 3. While this is primarily an issue for the company to address, it's worth considering in the broader context of the Python ecosystem.

Furthermore, alternative Python implementations should be taken into account. Should we design an API that requires recompilation even when targeting a different Python implementation? That would add another dimension to the matrix of artifacts that need to be built, and you are probably going to need said alternative implementation in order to build the package for it -> more dependencies, more things to set up.

So I still think it depends a lot on the package and its uses, and also, from a user perspective, not all packages are equally important. Imagine an important (i.e., popular, used by lots of projects) package that provides bindings to some system library, GPU library, etc. The performance of the native part of that package that uses the C API is probably negligible, because the total runtime is going to be absolutely dominated by the actual computation in native code that is oblivious to Python. The required API surface for such packages tends to be relatively small. For both maintainers and users of such packages, it would be ideal if there was a single artifact per OS/architecture that could seamlessly run on any Python version, offering simplicity and ease of distribution and use.
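
(For reference, CPython's limited API already allows something close to this for CPython itself: below is a minimal sketch with a made-up module name, where a single compiled artifact, tagged abi3, is loadable on every CPython release at or above the version named in Py_LIMITED_API.)

```c
/* spam.c -- hypothetical example module; a sketch, not from the thread.
 * Defining Py_LIMITED_API restricts the code to the limited API, so the
 * compiled extension uses only the stable ABI and a single binary
 * (tagged "abi3") can be loaded by CPython 3.8 and any later version. */
#define Py_LIMITED_API 0x03080000   /* lowest CPython version we promise to support */
#include <Python.h>

static PyObject *
spam_ping(PyObject *self, PyObject *unused)
{
    /* Trivial function standing in for a thin binding to a system/GPU library. */
    return PyUnicode_FromString("pong");
}

static PyMethodDef spam_methods[] = {
    {"ping", spam_ping, METH_NOARGS, "Return 'pong'."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef spam_module = {
    PyModuleDef_HEAD_INIT,
    "spam",          /* module name */
    NULL,            /* docstring */
    -1,              /* no per-module state in this sketch */
    spam_methods
};

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spam_module);
}
```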

encukou commented 1 year ago

Should we design an API that requires recompilation even when targeting a different Python implementation?

I know you know, but this already exists: it's HPy. There's enough need for this that people spun up a project for it. CPython doesn't necessarily need to provide such an API/ABI itself, but it at least needs to allow it to be built.

hodgestar commented 1 year ago

My own use case for ABI compatibility is very straightforward. I am one of the maintainers of QuTiP. The large majority of our users install pre-built binary packages, either wheels from PyPI or packages from conda-forge. These users have no ability to compile packages on their own.

So when a new Python version is released, there are no packages that most QuTiP users can install. Then I wait for NumPy to build packages, and then for SciPy to build packages, and then I do a new QuTiP release that uses all of those. Currently that process takes a month or two, and during that time QuTiP users can't install QuTiP on the latest Python.

Everyone has lived with this state of affairs for years, so it's certainly not the end of the world, but it would be nice to not have this gap, and an ABI that is compatible between versions would provide that.

The stable ABI in theory addressed this problem, but it's not widely used and quite hard to keep stable. For example, Python can never support two versions of the stable ABI at once, so transitions are hard.