bytecodealliance / wasmtime-py

Python WebAssembly runtime powered by Wasmtime
https://bytecodealliance.github.io/wasmtime-py/
Apache License 2.0

Backward compatibility contract for bindings generated by `wasmtime.bindgen`? #219

Closed by whitequark 3 months ago

whitequark commented 3 months ago

What is the compatibility contract for bindings generated by wasmtime.bindgen? Put differently, can I put the generated files into a wheel file or not?

If bindings generated with wasmtime-py X.0.0 are going to work on wasmtime-py (X+1).0.0: I can, and the deployment process of downstream code will consist of pre-generating bindings on CI and then distributing the wheel with a wasmtime>=X version constraint.

If bindings generated with wasmtime-py X.0.0 will only work on wasmtime-py X.0.0: I cannot, and I have several options, all of which are very unpleasant:

  1. I can distribute the wheel with a wasmtime==X version constraint. This is obviously correct, but if this wheel needs to be co-installable with anything else that uses wasmtime, it is not possible to use the track-pypi-dependency-version script the README currently recommends: with exact pins, upgrading a completely unrelated package that bumps its lower bound on wasmtime may force an upgrade of every bindgen-using package. This rapidly becomes impractical once you have, potentially, as few as two bindgen-using packages from different publishers.
  2. I can distribute the wheel with a wasmtime>=Y,wasmtime<X+1 version constraint as the README currently recommends, include only the Wasm component in it, run wasmtime.bindgen on first use, and cache the generated files. This is very onerous on the consumer: every consumer would have to come up with a bespoke caching scheme; it's tempting to try to write to the site-packages directory where your own package is installed, but it may not be writable; and keeping the caching code in sync is burdensome in itself even if you factor it out into a separate package.
  3. I can distribute the wheel with a wasmtime>=Y,wasmtime<X+1 version constraint, and generate bindings for each wasmtime version from Y to X inclusive. This will be fairly painful to do on CI (in practice it requires one virtual environment per wasmtime version, potentially a lockfile per wasmtime version, and some custom tooling to collect all the generated files) and will result in rapid binary bloat unless carefully managed (at a 14 MB core module size, it won't be long until this wrapper for a 300 kB JS library becomes the biggest PyPI package installed on the system).
  4. I can distribute the wheel with a wasmtime>0 version constraint. I mean, I personally would, but I think most consumers of the wasmtime.bindgen interface have neither time nor inclination to build complicated caching and versioning schemes, and they will simply accept a status quo of stochastic runtime breakage after wasmtime releases. I think this is doing a disservice to the ecosystem.
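For what it's worth, the bespoke caching scheme that option 2 calls for boils down to deriving a cache location from both the installed wasmtime version and the component itself, so that bindings are never reused across a wasmtime upgrade or a component rebuild. A minimal sketch (the package and directory names here are illustrative, not part of any real API):

```python
import hashlib
import os
from pathlib import Path

def bindings_cache_dir(component: bytes, wasmtime_version: str) -> Path:
    """Per-wasmtime-version, per-component cache location for generated bindings."""
    digest = hashlib.sha256(component).hexdigest()[:16]
    base = Path(os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache")))
    # Keying on both the runtime version and the component hash guarantees that
    # a wasmtime upgrade (or a component rebuild) invalidates stale bindings.
    return base / "my-bindgen-package" / f"wasmtime-{wasmtime_version}" / digest
```

Even this small amount of policy (where to write, when to invalidate, what to do when the directory is read-only) is exactly the burden every consumer would have to shoulder independently.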
alexcrichton commented 3 months ago

Oh dear these are hard questions, but good ones!

I think though that this can probably boil down to the answer of whether the wasmtime module is itself stable. The component bindings are all built on top of the "core" support, so they should work indefinitely so long as the "core" support itself is stable.

Given that though I fear we're not really in a position to guarantee stability here "to the end of time" yet. I only sort of maintain this package passively myself and I'm by no means a Python expert, so I wouldn't want to consider everything "done" by any means.

As a possible alternative, though, could the bindgen step be run at load-time? That way you could run bindings generation with whatever version of Wasmtime is linked against, and that'd be dynamically loaded then by Python as well. In theory the bindings generation step should be pretty fast, but I also realize that it's probably non-kosher doing so much work at load-time.

whitequark commented 3 months ago

> As a possible alternative, though, could the bindgen step be run at load-time?

That's the alternative (2) above, and it does solve the problem of matching bindings to the wasmtime version, but it involves building a lot of infrastructure that is fairly laborious to design and maintain, since there aren't many good ways to do it, and the ones that exist are finicky.

Actually, is there maybe a way to get the results of wasmtime.bindgen in-memory? If so, there are importlib hooks that would allow me to load Python code from an in-memory blob without ever hitting the disk. That's something I think is fairly practical to deploy, and I might even be able to contribute it to wasmtime-py itself.

whitequark commented 3 months ago

It looks like there is, so I think I'll try to use that with an importlib hook so that there's nothing actually written to disk at any point.
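Assuming the bindgen results can be obtained as an in-memory mapping of module names to Python source (a hypothetical shape; the actual interface is whatever the eventual PR settles on), the importlib hook itself is a MetaPathFinder/Loader pair that serves modules from that mapping without ever touching the disk. A sketch:

```python
import importlib.abc
import importlib.util
import sys

class InMemoryBindingsFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import modules from a {module_name: source} mapping, never touching disk."""

    def __init__(self, sources: dict[str, str]):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer to the next finder on sys.meta_path

    def create_module(self, spec):
        return None  # use Python's default module creation

    def exec_module(self, module):
        source = self.sources[module.__name__]
        exec(compile(source, f"<in-memory:{module.__name__}>", "exec"), module.__dict__)

# Hypothetical: pretend this mapping came from an in-memory bindgen run.
generated = {"my_bindings": "def add(a, b):\n    return a + b\n"}
sys.meta_path.insert(0, InMemoryBindingsFinder(generated))

import my_bindings  # resolved from memory; nothing is written to disk
```

The finder is consulted before the regular path-based machinery, so the generated modules shadow nothing on disk and disappear with the process.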

alexcrichton commented 3 months ago

I like that yeah, providing the results of bindgen in a first-class fashion that's not tied to files and the disk I think is totally reasonable 👍

whitequark commented 3 months ago

Great! I think my implementation in #224 goes a really long way toward addressing this. Most importantly, it handles the most finicky, obscure, and annoying bits that deal with Python importlib; the rest, I think, most contributors will find much easier to work with.