bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0

Revert "wasi-adapter: Implement provider crate that embeds the adapter binaries (#8792)" #8856

Closed alexcrichton closed 1 week ago

alexcrichton commented 1 week ago

This reverts https://github.com/bytecodealliance/wasmtime/pull/8792 after I've discovered a few extra consequences of having this in-repository:

* Checked-in binaries block updates to `wasm-tools` crates such as [upgrade to wasm-tools 0.211.1 #8838](https://github.com/bytecodealliance/wasmtime/pull/8838)
* Checked in binaries aren't reproducible across platforms at the moment, meaning that macOS binaries are different than Linux binaries for example (symbols and function ordering)
* Checked in binaries change when the wasmtime version bumps which I tested with `./scripts/publish.rs bump` (unsure why, but I'm assuming some hashed thing gets in there somehow)

Overall this is much more brittle than I originally thought so in the end I don't think it's going to be viable to check in binaries here. I think we should instead explore strategies such as generating the crate-to-publish on-the-fly as part of the publication step or similar to that.

cc @juntyr since this'll affect your work, would you be up for helping to explore an alternative solution that publishes the binaries we generate rather than checking in the generated binaries?

juntyr commented 1 week ago

> This reverts #8792 after I've discovered a few extra consequences of having this in-repository:

> * Checked-in binaries block updates to `wasm-tools` crates such as [upgrade to wasm-tools 0.211.1 #8838](https://github.com/bytecodealliance/wasmtime/pull/8838)

Does the adapter have an indirect dependency on some of the wasm-tools crates ... oh yes, it depends on wasm-encoder at least. So it makes sense, and is correct, that bumping wasm-tools can change the compiled adapter output, but that's not something we want showing up as a CI failure for everybody.

> * Checked in binaries aren't reproducible across platforms at the moment, meaning that macOS binaries are different than Linux binaries for example (symbols and function ordering)

I see that you're already whacking that mole https://github.com/rust-lang/cargo/pull/14107 :) but this will take a while to hit stable.

> * Checked in binaries change when the wasmtime version bumps which I tested with `./scripts/publish.rs bump` (unsure why, but I'm assuming some hashed thing gets in there somehow)

This goes back to the adapter using the repo crate version, which is included in the crate metadata hash, so a version bump changes the compiled output even when no source changes.

> Overall this is much more brittle than I originally thought so in the end I don't think it's going to be viable to check in binaries here. I think we should instead explore strategies such as generating the crate-to-publish on-the-fly as part of the publication step or similar to that.

Do we need to fully remove the WASM binaries from the repo? If so, what do we do with the adapter provider crate? To avoid magically creating it during publication, it could point at zero-sized dummy files in the repo, and the publish-to-crates.io script could recompile the adapter and replace those files just in time. However, this would make the crate very magical, and it could not be used as a git dependency.
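
For concreteness, a minimal sketch of what such a provider crate's `lib.rs` could look like; the constant names and file layout are made up, and in the repo the referenced files would be the zero-sized placeholders that the publish script swaps out:

```rust
// Hypothetical src/lib.rs of the adapter provider crate. In-repo, the files
// under ../artifacts/ would be zero-sized placeholders; the publish script
// would overwrite them with freshly built adapters right before publishing.

/// Adapter for "command" (main-style) components.
pub const WASI_SNAPSHOT_PREVIEW1_COMMAND_ADAPTER: &[u8] =
    include_bytes!("../artifacts/wasi_snapshot_preview1.command.wasm");

/// Adapter for "reactor" (library-style) components.
pub const WASI_SNAPSHOT_PREVIEW1_REACTOR_ADAPTER: &[u8] =
    include_bytes!("../artifacts/wasi_snapshot_preview1.reactor.wasm");

/// Adapter for proxy components.
pub const WASI_SNAPSHOT_PREVIEW1_PROXY_ADAPTER: &[u8] =
    include_bytes!("../artifacts/wasi_snapshot_preview1.proxy.wasm");
```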

If we can keep the binaries, my idea would be to not check for matching builds in CI, to manually rebuild the adapter in the artifact action as before, and to rebuild as part of the publish script that creates the branch for the next release (though what should happen if changes that would update the binary land on that branch after it splits off?). This would also allow people to use the adapter provider crate as a git dependency.

In any case, this means we can add the git commit hash to the version info again :D

> cc @juntyr since this'll affect your work, would you be up for helping to explore an alternative solution that publishes the binaries we generate rather than checking in the generated binaries?

Yes of course, I'm sorry for not fully anticipating the breakage caused by the adapter provider

rvolosatovs commented 1 week ago

At wasmCloud we maintain the wasmcloud-component-adapters crate (https://docs.rs/wasmcloud-component-adapters/latest/wasmcloud_component_adapters/), which we generally update after each Wasmtime release; it has a simple build.rs which fetches the adapter binaries from the Wasmtime release page. Would it be viable for Wasmtime to adopt a similar approach, where essentially the adapter URLs and hashes would simply be updated after each Wasmtime release?
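
For readers who haven't seen that pattern, here's a rough sketch of what such a build.rs could look like. This is not the actual wasmcloud-component-adapters implementation: the URL and hash are placeholders that would be bumped each release, it assumes `sha2`/`hex` as build-dependencies, and it shells out to `curl` instead of compiling an HTTP client:

```rust
// build.rs sketch: the URL and pinned hash below are the only things that
// would need updating after each Wasmtime release. Illustrative only.
use sha2::{Digest, Sha256};
use std::{env, fs, path::PathBuf, process::Command};

const ADAPTER_URL: &str = "https://github.com/bytecodealliance/wasmtime/releases/download/v22.0.0/wasi_snapshot_preview1.command.wasm";
const ADAPTER_SHA256: &str = "0000000000000000000000000000000000000000000000000000000000000000"; // placeholder

fn main() {
    let out = PathBuf::from(env::var("OUT_DIR").unwrap()).join("command.wasm");
    // Shell out to curl rather than compiling an HTTP client as a build dependency.
    let status = Command::new("curl")
        .args(["--location", "--fail", "-o"])
        .arg(&out)
        .arg(ADAPTER_URL)
        .status()
        .expect("failed to spawn curl");
    assert!(status.success(), "downloading the adapter failed");
    // Verify the pinned hash so the build fails loudly if the artifact changes.
    let bytes = fs::read(&out).unwrap();
    assert_eq!(hex::encode(Sha256::digest(&bytes)), ADAPTER_SHA256, "adapter hash mismatch");
    // Expose the downloaded path to the rest of the build.
    println!("cargo:rustc-env=WASI_ADAPTER_COMMAND={}", out.display());
}
```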

bjorn3 commented 1 week ago

> Would it be viable for Wasmtime to adopt a similar approach, where essentially the adapter URLs and hashes would simply be updated after each Wasmtime release?

That is incompatible with offline/network-isolated builds.

rvolosatovs commented 1 week ago

> > Would it be viable for Wasmtime to adopt a similar approach, where essentially the adapter URLs and hashes would simply be updated after each Wasmtime release?
>
> That is incompatible with offline/network-isolated builds.

We do actually build in an isolated (Nix) sandbox. The crate I mentioned manages the adapters using Nix, and its build.rs parses the resulting lockfile for the hashes and URLs. For building within the Nix sandbox, the paths to the adapters are passed as env vars.

This way, in regular day-to-day dev environments the adapters are fetched once and cached in the cargo cache. For isolated/offline builds they can be passed as env vars. The hashes of the binaries can then be verified at build time, regardless of whether they were fetched at build time or passed via an env var.
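
A minimal sketch of how that override could look on the build.rs side, again assuming `sha2`/`hex` as build-dependencies and a made-up `WASI_ADAPTER_COMMAND_PATH` env var:

```rust
use sha2::{Digest, Sha256};
use std::{env, fs, path::PathBuf};

/// Return the command adapter bytes, preferring an explicitly provided path
/// (offline / Nix-sandbox builds) over the copy fetched into OUT_DIR, and
/// verifying the pinned hash in either case. The env var name is made up.
fn command_adapter(expected_sha256: &str) -> Vec<u8> {
    let path = env::var_os("WASI_ADAPTER_COMMAND_PATH")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(env::var("OUT_DIR").unwrap()).join("command.wasm"));
    let bytes = fs::read(&path).expect("adapter binary not found");
    assert_eq!(
        hex::encode(Sha256::digest(&bytes)),
        expected_sha256,
        "adapter hash mismatch for {}",
        path.display()
    );
    bytes
}
```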

I'm quite sure the crate release process could involve bundling the actual binary and uploading it to crates.io as part of the crate, which would enable out-of-the-box offline builds at the cost of crate size. We did not have a need for it, so we did not do it.

juntyr commented 1 week ago

I initially also just fetched the adapters from the release page using reqwest. However, having to compile reqwest as a build dependency, especially its HTTPS support, made this approach unworkable in some of the build environments I need to support, to the point where I ended up using `Command::new("wget")` in my build script.

The adapters are not large binary files and since they are cross-platform, I think baking them into a crate as const byte strings can be argued for (ideally, at some future point, the crate could just depend on the adapter directly with an artefact dependency).
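
As a usage sketch of that idea, a downstream tool could feed the embedded constant straight into `wit-component`, with no network access or heavy build machinery; the provider crate name and constant below are hypothetical:

```rust
use anyhow::Result;
use wit_component::ComponentEncoder;

// Turn a core wasm module into a component, using the adapter embedded in the
// (hypothetical) provider crate as a plain `&[u8]` constant.
fn componentize(core_module: &[u8]) -> Result<Vec<u8>> {
    ComponentEncoder::default()
        .module(core_module)?
        .adapter(
            "wasi_snapshot_preview1",
            wasi_preview1_component_adapter_provider::WASI_SNAPSHOT_PREVIEW1_COMMAND_ADAPTER,
        )?
        .encode()
}
```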

In any case, I hope that we can find an approach that keeps the provider crate as Rusty as possible and without heavy build dependencies.

bjorn3 commented 1 week ago

> ideally, at some future point, the crate could just depend on the adapter directly with an artefact dependency

As I understand it, the adapter needs to be built with very specific compiler options, as it may not contain a data segment. There is no way to guarantee this across rustc versions, so the only way to guarantee a correct build of the adapter is to use the exact same rustc version and compiler flags as wasmtime's CI does. Artifact dependencies don't allow controlling the rustc version (it always matches the host compiler), and I'm not sure artifact dependencies allow controlling all the flags we need (disabling incremental compilation, disabling all assertions, and using opt-level=s).
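
To illustrate the kind of pinning this implies, here is roughly what a dedicated build helper could look like; the toolchain version, flags, and package name are illustrative rather than the exact recipe Wasmtime's CI uses:

```rust
use std::process::Command;

// Build the adapter with a pinned toolchain and the profile settings mentioned
// above; a separate check would still have to verify that the output contains
// no data segment. All specifics here are illustrative.
fn build_adapter() {
    let status = Command::new("rustup")
        // Pin the exact toolchain so the output does not drift across rustc versions.
        .args(["run", "1.79.0", "cargo", "build", "--release"])
        .args(["-p", "wasi-preview1-component-adapter"])
        .args(["--target", "wasm32-unknown-unknown"])
        // Disable incremental compilation and debug assertions, optimize for size.
        .env("CARGO_INCREMENTAL", "0")
        .env("CARGO_PROFILE_RELEASE_DEBUG_ASSERTIONS", "false")
        .env("CARGO_PROFILE_RELEASE_OPT_LEVEL", "s")
        .status()
        .expect("failed to spawn rustup/cargo");
    assert!(status.success(), "adapter build failed");
}
```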

juntyr commented 1 week ago

> > ideally, at some future point, the crate could just depend on the adapter directly with an artefact dependency
>
> As I understand it, the adapter needs to be built with very specific compiler options, as it may not contain a data segment. There is no way to guarantee this across rustc versions, so the only way to guarantee a correct build of the adapter is to use the exact same rustc version and compiler flags as wasmtime's CI does. Artifact dependencies don't allow controlling the rustc version (it always matches the host compiler), and I'm not sure artifact dependencies allow controlling all the flags we need (disabling incremental compilation, disabling all assertions, and using opt-level=s).

Those are unfortunately the current limitations of artifact dependencies. What I wanted to express is that in an ideal future, the customised adapter compilation could be expressed entirely within cargo. At the moment, I agree that publishing a prebuilt adapter binary, compiled and checked with these specific flags, is the better approach.

alexcrichton commented 1 week ago

> Do we need to fully remove the WASM binaries from the repo? If so, what do we do with the adapter provider crate?

I haven't thought this through too much, but my thinking is that we should remove the binaries, yeah. If we can't check on CI that they're what you'd get building locally, then we don't want checked-in artifacts in the repo, as it's too easy for things to get out of sync and cause headaches. I also agree that no matter what we do here, it's going to render git dependencies inoperable, and this would only really be suitable for crates.io-based dependencies.

My rough idea is that we'd have something like a Cargo.toml.in somewhere in the repo, with a CI check that it can produce a workable crate at any time, and the publication step would do that and then run cargo publish, probably outside of the publish.rs script. The tricky part here is going to be orchestrating this in a way that's unlikely to fail, since no one's going to touch it for a long time and it won't be trivial to reproduce locally. That being said, I think it's possible because it's a crate that has zero dependencies and doesn't need to be part of the workspace, so for example having an extra step in CI that downloads the adapter artifacts and runs cargo package without cargo publish should be good enough (and then the publish-on-tag job would run the actual cargo publish).
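
A rough sketch of what that packaging step could look like as a small helper; the file names, `@VERSION@` placeholder, and layout are all hypothetical:

```rust
use std::{fs, path::Path, process::Command};

// Stamp out the real crate from a template, drop in the adapters built by CI,
// then run `cargo package` as a smoke test (CI) or `cargo publish` (on tag).
fn package_provider_crate(staging: &Path, artifacts: &Path, version: &str, publish: bool) {
    // Substitute the version into the (hypothetical) Cargo.toml.in template.
    let template = fs::read_to_string("crates/adapter-provider/Cargo.toml.in").unwrap();
    fs::create_dir_all(staging.join("src")).unwrap();
    fs::create_dir_all(staging.join("artifacts")).unwrap();
    fs::write(staging.join("Cargo.toml"), template.replace("@VERSION@", version)).unwrap();
    fs::copy("crates/adapter-provider/src/lib.rs", staging.join("src/lib.rs")).unwrap();
    // Drop in the adapters that CI built and uploaded as artifacts.
    for flavor in ["command", "reactor", "proxy"] {
        let name = format!("wasi_snapshot_preview1.{flavor}.wasm");
        fs::copy(artifacts.join(&name), staging.join("artifacts").join(&name)).unwrap();
    }
    // `cargo package` is the dry run; `cargo publish` only runs on the tag.
    let status = Command::new("cargo")
        .arg(if publish { "publish" } else { "package" })
        .args(["--allow-dirty", "--manifest-path"])
        .arg(staging.join("Cargo.toml"))
        .status()
        .expect("failed to run cargo");
    assert!(status.success());
}
```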

> Yes of course, I'm sorry for not fully anticipating the breakage caused by the adapter provider

Not your fault at all! I tried to predict issues like this too and I wasn't able to see this coming, so this is just a natural development cycle as we figure out the true consequences of our actions heh.