WebAssembly / tool-conventions

Conventions supporting interoperability between tools working with WebAssembly.
Artistic License 2.0

Emscripten and the Producers Section #93

Open kripken opened 5 years ago

kripken commented 5 years ago

After a lot of consideration, I don't think we want to change Emscripten to emit the producers section by default in release builds. Posting this issue to note that and explain why.

These are reasons for specifically not emitting the producers section in emcc by default. We could add an option for users that do want to do so, and of course other tools may have different factors to consider (in particular, Emscripten is used by ordinary developers, while tools like LLVM, wabt, or binaryen are used by toolchain developers, so the considerations might be different).

lukewagner commented 5 years ago

I can see the logic behind these reasons, and I wish they had been brought up in earlier discussions, where I was specifically hoping to hear from tooling people whether the producers section would actually show up in release builds.

With Emscripten setting the precedent of stripping the section for these reasons, I can imagine the convention not being widely adopted, which makes me question whether it's worth adding browser telemetry. At the very least, this seems to warrant an update to ProducersSection.md to reset expectations, or perhaps removing ProducersSection.md entirely if there are no other consumers.

jgravelle-google commented 5 years ago

My thoughts on this have essentially flipped from my original position: I think the producers section is a good thing to have in general, but it probably doesn't make sense for Emscripten to emit it.

The general principle I'm following here is one of incentives: what pressures exist on the various parties? Toolchain authors compete on quality of implementation. Imagine an emcc-but-smaller fork of Emscripten that provides the same experience to a developer but ships binaries that are 100 bytes smaller across the board. Wasm-targeting developers don't inherently care about the ecosystem; they want to deliver the best possible experience to their users, which means cutting code size. Assume they build with toolchain X, which may or may not include a producers section. At some point they may find it profitable to run a wasm minifier over the resulting module, and such a tool could reasonably be expected to strip the producers section rather than add to it.
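To make the minifier scenario concrete, here is a minimal Python sketch of such a stripping pass. It is not taken from any real minifier: `strip_custom_section` and `read_u32_leb` are hypothetical helpers that walk the top-level sections of a wasm binary (each section is a one-byte id, an unsigned LEB128 size, then the payload) and drop any custom section (id 0) whose name is `producers`. It assumes a well-formed version-1 module and does no other validation.

```python
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number + binary format version 1


def read_u32_leb(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def strip_custom_section(module: bytes, name: str = "producers") -> bytes:
    """Return a copy of `module` without any custom section called `name`."""
    if module[:8] != WASM_HEADER:
        raise ValueError("not a version-1 WebAssembly binary")
    out = bytearray(WASM_HEADER)
    pos = 8
    while pos < len(module):
        start = pos
        section_id = module[pos]
        size, payload_start = read_u32_leb(module, pos + 1)
        end = payload_start + size
        if section_id == 0:  # custom section: payload begins with its name
            name_len, name_start = read_u32_leb(module, payload_start)
            if module[name_start:name_start + name_len] == name.encode("utf-8"):
                pos = end  # skip the whole section, header included
                continue
        out += module[start:end]  # keep every other section byte-for-byte
        pos = end
    return bytes(out)
```

Even an empty producers section costs 13 bytes here (id, size, the 10-byte name, and a zero field count), all of which such a pass removes.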

From the other side of the table, though, what benefits can people get from the data provided by the producers section? Wasm developers (i.e., us) get good metrics on which tools are used where, and can make better decisions about how to steer the platform, which is good. We aren't in full control of what users ship, however (though we do have a great deal of influence in Emscripten, mind). The way I see it, the developers and users most incentivized to annotate this data are those behind smaller toolchains who want to see how their work spreads.

So it makes sense for toolchains like the Rust compiler to annotate their wasm modules with producers sections, because being able to say definitively "Rust is used in 10% of the wasm modules on the web today" would be a huge victory. For established toolchains (Emscripten being practically unique in this regard), on the other hand, tracking proliferation of use is less important. I believe the background assumption we're all implicitly making is that Emscripten is used for 99+% of the wasm built today, which makes quantifying that less appealing than saving the ~100 bytes.

All that is to say, I don't think precedent matters as much as incentives here, because people will ultimately do what benefits themselves the most anyway. For Emscripten I believe this means to strip it, but for other toolchains I think the visibility incentive is sufficient for this to see use.

(Alternatively, my wild counter-proposal would be to not have any producers section at all, and run analysis based on "this wasm looks like it was built with tool ___" heuristics after the fact, which is imprecise and compute-heavy but requires no opt-in. We could always just do both, or use heuristic analysis only on the non-annotated modules, depending on how things actually shake out down the line.)

kripken commented 5 years ago

> (Alternatively, my wild counter-proposal would be to not have any producers section at all, and run analysis based on "this wasm looks like it was built with tool ___" heuristics after-the-fact, which is imprecise and compute-heavy but requires no opt-in.

I think this is really the way to go, actually. The main benefit is that it would not produce a statistically biased sample: with the producers section, size-conscious websites (most of them?) will strip it out, and the remaining sites may well differ in which tools they use. The analysis approach would actually sample the real population of production wasm modules.

Analysis is not easy, obviously. But it would only need to run on a tiny random sample of the wasm modules on the Web (so it's not necessarily compute-heavy, and validating it on that set would be enough). If we all collaborated on this, it might not be that hard. I'd volunteer to help if that's relevant.
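One way the after-the-fact analysis could start is sketched below in Python. Everything here is an assumption for illustration: the rules in `HEURISTICS` are guesses keyed on import names (for example, Go's JavaScript glue imports from a module named `go`, or `gojs` in newer releases, and wasm-bindgen output uses imports prefixed `__wbindgen_`), they are not validated classifiers, and the `(module, name)` import pairs are assumed to have already been extracted by some wasm parser. The hypothetical `sample_modules` helper draws the tiny uniform random sample described above.

```python
import random
from typing import Iterable, Optional

# Hypothetical, illustrative rules only: each pairs a guessed toolchain label
# with a predicate over the module's (import_module, import_name) pairs.
HEURISTICS = [
    ("go", lambda imps: any(mod in ("go", "gojs") for mod, _ in imps)),
    ("rust/wasm-bindgen", lambda imps: any(n.startswith("__wbindgen_") for _, n in imps)),
    ("emscripten", lambda imps: any(n.startswith("emscripten_") for _, n in imps)),
]


def guess_toolchain(imports: Iterable[tuple[str, str]]) -> Optional[str]:
    """Return the first matching toolchain label, or None if no rule fires."""
    imps = list(imports)
    for label, matches in HEURISTICS:
        if matches(imps):
            return label
    return None  # unclassified; would need manual review or better rules


def sample_modules(urls: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Pick a small uniform random sample of module URLs, without replacement."""
    k = max(1, round(len(urls) * fraction)) if urls else 0
    return random.Random(seed).sample(urls, k)
```

A name-based rule like the `emscripten_` one would miss builds whose import names have been minified, which is the kind of imprecision these heuristics would have to be validated against.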

aardappel commented 5 years ago

How about.. making the producers section really small (1 LEB per producer) so there is no incentive to strip it for size? :)

binji commented 5 years ago

It's not too hard to eyeball a module and spot asm2wasm, the wasm backend, Rust, Go, etc. But this won't provide nearly as much info as the producers section will.

binji commented 5 years ago

> How about.. making the producers section really small (1 LEB per producer) so there is no incentive to strip it for size? :)

This still has the section overhead...
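To put numbers on that overhead, the sketch below counts the bytes a custom section adds under the binary format's layout: a one-byte section id, an LEB128 section size, and a payload that starts with the section name (LEB128 length plus UTF-8 bytes). The one-LEB-per-producer payload is the hypothetical encoding proposed above, not an existing format, and `uleb128_len` / `custom_section_size` are illustrative helpers.

```python
def uleb128_len(value: int) -> int:
    """Number of bytes needed to encode `value` as unsigned LEB128."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


def custom_section_size(name: str, payload_len: int) -> int:
    """Total bytes a custom section adds to a module.

    Layout: 1-byte section id (0), LEB128 section size, then the section
    name (LEB128 length + UTF-8 bytes) followed by `payload_len` content bytes.
    """
    name_bytes = name.encode("utf-8")
    body = uleb128_len(len(name_bytes)) + len(name_bytes) + payload_len
    return 1 + uleb128_len(body) + body


# Fixed cost before any content, then a hypothetical 1-byte count + 3 one-byte IDs.
empty = custom_section_size("producers", 0)
three_producers = custom_section_size("producers", 1 + 3)
```

Under these assumptions the framing alone is 12 bytes, and three one-byte producer IDs plus a one-byte count bring the total to 16, so the name and section header dominate.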

jgravelle-google commented 5 years ago

Reducing the incentive doesn't remove it. If I could choose between $100 disappearing out of my bank account vs. $1, I'd still be mad that the bank is losing my money for no reason. I don't mind when my balance goes down when I buy something, however, so the question is how can we get people to feel the cost is worthwhile?

Actually wait, second wild proposal: don't put the bytes inside foo.wasm; publish the section separately as foo.producers. End users don't need to download the bytes every time, we don't lose any information, and there's no incentive to strip it. The only problem is that website developers won't feel any pressure to upload that file to their actual sites as part of their build process.
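The sidecar idea could look roughly like the following build step, which moves the producers section out of `foo.wasm` into a `foo.producers` file next to it. This is a hypothetical Python illustration: no such tool or file convention exists, and `split_producers` and `write_with_sidecar` are names invented here. The sidecar keeps the complete custom-section bytes, so a tool could in principle append them back to the module, since custom sections may appear at any position.

```python
from pathlib import Path

WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number + binary format version 1


def read_u32_leb(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def split_producers(module: bytes) -> tuple[bytes, bytes]:
    """Return (module without producers sections, the removed section bytes)."""
    if module[:8] != WASM_HEADER:
        raise ValueError("not a version-1 WebAssembly binary")
    kept, removed = bytearray(WASM_HEADER), bytearray()
    pos = 8
    while pos < len(module):
        start = pos
        size, payload_start = read_u32_leb(module, pos + 1)
        end = payload_start + size
        is_producers = False
        if module[start] == 0:  # custom section: check its name
            name_len, name_start = read_u32_leb(module, payload_start)
            is_producers = module[name_start:name_start + name_len] == b"producers"
        if is_producers:
            removed += module[start:end]
        else:
            kept += module[start:end]
        pos = end
    return bytes(kept), bytes(removed)


def write_with_sidecar(module: bytes, wasm_path: Path) -> Path:
    """Write the stripped module to `wasm_path` and the producers bytes beside it."""
    stripped, sidecar = split_producers(module)
    wasm_path.write_bytes(stripped)
    sidecar_path = wasm_path.with_suffix(".producers")  # foo.wasm -> foo.producers
    sidecar_path.write_bytes(sidecar)
    return sidecar_path
```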

lukewagner commented 5 years ago

So for ProducersSection.md, I get the impression maybe we shouldn't remove it entirely, but file a PR to:

  1. Rewrite the intro section to just say "If you want to annotate what tools were used to produce this .wasm, here's an interoperable format. Note that tools in the pipeline may strip this section."
  2. Remove the "Known list", since there's no longer much point in maintaining a centralized repository.

Yes?
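For reference, the interoperable format in ProducersSection.md is a custom section named `producers` holding a count of fields, where each field has a name (`language`, `processed-by`, or `sdk`), a count of values, and a list of name/version string pairs. A minimal Python encoder under that reading might look like this; `encode_producers_section` is an illustrative helper, and the example tool name and version are made up.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(s: str) -> bytes:
    """A wasm `name`: LEB128 byte length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return uleb128(len(data)) + data


def encode_producers_section(fields: dict[str, list[tuple[str, str]]]) -> bytes:
    """Build a complete custom section (id 0) named "producers"."""
    payload = bytearray(encode_name("producers"))
    payload += uleb128(len(fields))  # field count
    for field_name, values in fields.items():
        payload += encode_name(field_name)
        payload += uleb128(len(values))  # number of name/version pairs
        for name, version in values:
            payload += encode_name(name) + encode_name(version)
    return b"\x00" + uleb128(len(payload)) + bytes(payload)


section = encode_producers_section({"processed-by": [("clang", "8.0.0")]})
```

With that single `processed-by` entry the section comes to 39 bytes; a toolchain listing several tools with longer version strings would emit more.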

xtuc commented 5 years ago

Should we discuss this during the WebAssembly GC meeting tomorrow?

lukewagner commented 5 years ago

Sorry, I didn't see this comment in time to add it to the agenda, but yes, that would've made sense.

xtuc commented 5 years ago

My impression is that the meeting didn't clarify what we should do here.

I think it's unfortunate, but I agree with @lukewagner's comment at https://github.com/WebAssembly/tool-conventions/issues/93#issuecomment-459879563.