emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.71k stars 3.3k forks source link

Support llvm stable releases #11362

Open sbc100 opened 4 years ago

sbc100 commented 4 years ago

Historically emscripten has been built around always using the very latest upstream version of LLVM. Essentially it depends on the top-of-tree version.

However, for the purposes of packaging, reducing install and download size, and making emscripten more portable, it would be nice to start being compatible with the stable version of LLVM.

We have have had many requests for this feature of the years (e.g. #11313)

This would mean more testing since each emscripten release would need to be tested fully against both LLVM stable and LLVM tot. For this reason it probably makes sense to make this move once we fully remove fastcomp (#11319).

The downside is that we will need to add version checks in the emscripten codebase when it depends on new/tot features.

I suggest that we perhaps give this a try starting with LLVM 11. E.g. we support LLVM 11 and tot until LLVM 12 is releases and which point we support LLVM 12 + tot..

tlively commented 4 years ago

What will we do when a feature changes to be incompatible with the stable version of LLVM? Should we error out? Have a branch to handle both the old and new behaviors? I guess for unstable features that are changing frequently in upstream LLVM we can just error out eagerly if the LLVM version is not the one we test with.

kripken commented 4 years ago

Alternatively, we could refer people that want to use llvm 11 release to a specific emscripten version that is known to work with that? I.e. if they want to use an LLVM from the past, they can do so with the emscripten from the same time in the past.

sbc100 commented 4 years ago

If we take that alternative approach I still think it would be nice to test against the actual release versions, rather than just the nearest emscripten release to the release revision.

sbc100 commented 4 years ago

What will we do when a feature changes to be incompatible with the stable version of LLVM? Should we error out? Have a branch to handle both the old and new behaviors? I guess for unstable features that are changing frequently in upstream LLVM we can just error out eagerly if the LLVM version is not the one we test with.

I'm imagining there will be some places in emscripten where we need to do:

if LLVM_STABLE:
   error_out
else:
   use_new_feature

Nut hopefully they should be fairly few and far between. The frequency of these divergences should tell is fairly quickly if the approach is a sensible one.

Initially I think the testing burden will be the biggest obstacle we face.

sbc100 commented 4 years ago

Looks like its happening now: https://github.com/emscripten-core/emscripten/pull/11637

What do you all think about trying to support 11.0 and 12.0 going forward?

kripken commented 4 years ago

I think if have the CI resources to also test 11 stable, it makes sense to experiment with this. But I think it should be an experiment we are ok with stopping, since it may become too hard to be worthwhile, like if we start to need lots of ifs for what LLVM version is used, as mentioned earlier.

sbc100 commented 4 years ago

One advantage to testing against 11, once it is stable, is that we don't need to build our own versions of LLVM to test it.. we can just use the official binaries.

I suggest we try to find a sensible subset of tests to run against llvm 11.. perhaps monitoring the areas where we requires 12 and ensuring the subset includes those areas.

kripken commented 4 years ago

I'm worried about picking a subset, if we want to claim we support those binaries officially (do we?). Maybe we can check them less frequently or something, though?

sbc100 commented 4 years ago

I would imagine that as long its only run a subset of tests we would not claim full support.

Once we ditch fastcomp and can do more testing I think it might be worth increasing the level of testing and officially supporting stable llvm. But I'm not suggesting that now.. .so maybe we should still have some kind of warning message about the lack of full support?

sbc100 commented 4 years ago

Perhaps we can add this to the FYI bot along with the contrib ports tree?

kripken commented 4 years ago

Last two comments sgtm (not claim full support, use the FYI bot).

kripken commented 4 years ago

To expand a little on something mentioned earlier without detail: As an approximation to this, I think we can find the LLVM commit closest to a particular release, and then find the emscripten release in which that appeared. That emscripten release should be able to work with the official LLVM release. There are some minor worries like LLVM can change things after branching for release, and the emscripten release may contain more LLVM commits after the release, but it would probably work in most cases.

If we want to try this, then setting up way to test the official release binaries is basically all we need. We can then open a PR here, using the emsdk for the proper emscripten release normally (including binaryen from back then) but replacing LLVM with the release binaries. If that passes I think we can say it's good.

sbc100 commented 4 years ago

You mean do a one off test that verion X.Y of emscripten works with upstream official binaries from A.B of llvm?

I guess there is some value in that. Setting up testing to bring in the official binaries is main bit of work needed to do that.

I still think there is value is having support on ToT emscripten for at least one stable version of llvm rather than constantly being on ToT LLVM, but I admit that is a lot more work.

kripken commented 4 years ago

Yeah, a one-off test (here on github CI). Then we can officially "bless" that LLVM version and recommend it in our docs, "if you have LLVM release 9, use it with emscripten x.y.z (and here's how to use those binaries with it)".

(I'm not opposed to the other option, if we think it's worth it.)

tlively commented 4 years ago

This would be a more principled way to determine when Emscripten version is used on the Rust CI for testing the Rust emscripten targets, too.

tlively commented 4 years ago

https://github.com/rust-lang/rust/pull/73526#issuecomment-675795604 is a timely example of how this would be beneficial to Rust.

sbc100 commented 4 years ago

I'd like to give this a try starting with llvm 11 which is due for release soon.

Here is what I think we should try:

  1. Start accepting 11 as will as ToT for llvm versions (that means 11 + 12 and soon 11 + 13)
  2. Use ToT version on github CI and emscripten-release CI
  3. Run all tests against llvm 11 on linux-test-suites builder (We already run the asan test suite and others in this way.)
  4. Perhaps run a small subset of llvm 11 tests on github CI to prevent trivial regressions.

In the long run this could make emscripten install-able without building either llvm or binaryen :)

kripken commented 4 years ago

What's the plan for when a diverging patch lands on LLVM trunk? For example we may get it to emit invokes differently, or you may change the dynamic linking ABI. That will have corresponding mandatory changes on emscripten (and maybe binaryen too). Would we be forced to support two "backends" in effect in emscripten?

Unless I'm missing something I think it's safer to go the other way - to "certify" a specific fixed emscripten hash for LLVM release 11 by running the full test suite on it once and then documenting that. That approach has no ongoing cost of tot emscripten supporting more than tot LLVM.

sbc100 commented 4 years ago

There would be an ongoing cost yes, because we would need to support both stable and tot versions.

But divergences should be fairly rare and I imagine will be often be linked to features that we can simply not support with the stable version. e.g we could say that native exception handling only works with tot llvm.

I think I don't like about pinning emscripten is that if the approach becomes popular we end up getting bug reports for older version of emscripten and people who are stuck on old versions. Maybe that cost is lower?

Did you see that #12218 llvm 10 very nearly "just works" with emscripten tot.. indeed if you can build compiler-rt with llvm tot then llvm 10 does seems to work today.

I agree that your solution is simpler though.. and probably less maintenance for us.

kripken commented 4 years ago

But divergences should be fairly rare and I imagine will be often be linked to features that we can simply not support with the stable version. e.g we could say that native exception handling only works with tot llvm.

If they are indeed very rare maybe this is fine. And I agree we can support new features only on tot LLVM. But I worry about things like us wanting to remove the old exceptions approach at some point, or remove the old dynamic linking, etc. - we'd be forced to keep those around until the next stable release, with your approach. And I also worry about other changes like how reactors work or other stuff (random recent example: https://github.com/emscripten-core/emscripten/pull/12220). As we do less in binaryen and just use the LLVM output as-is, we become more sensitive to any such changes.

Another issue is code size changes - tot LLVM may improve some optimization and as a result code size tests will fail on stable. We may want to disable those tests when running stable. But that's risky, as we may regress size on the emscripten side in a way that just affects stable - that may happen in practice if we add some optimization that interacts with the new code patterns tot LLVM emits.

Overall this seems like a maintenance burden that may be large and unpredictable, to me.

I think I don't like about pinning emscripten is that if the approach becomes popular we end up getting bug reports for older version of emscripten and people who are stuck on old versions. Maybe that cost is lower?

Good question. I don't know how popular it could end up being.

I suspect this approach's overall cost will be lower. We do already get bug reports with old versions, and those are at least quickly diagnosed as such today.

sbc100 commented 4 years ago

Apparently llvm 11 was just released so if we want to try your approach perhaps we should tag an emscripten release now to match the llvm release?

kripken commented 4 years ago

We can tag now, but it won't match, I think? Emscripten is already on newer LLVM code after the branch.

We'd need to go back in history to find where the branch happened, so we find the emscripten tot build that is closest to the release. (And then we should run our tests on it.)

sbc100 commented 4 years ago

Sorry, my bad. LLVM 11 obviously branched a while ago.. ignore me.

tlively commented 4 years ago

The first version after the branch was 1.39.20. I had to figure that out to make the Rust CI use that version after they switched to using LLVM 11.

kripken commented 4 years ago

Thanks @tlively !

Then the next step here could be to create a PR here that downloads 1.39.20, then replaces that LLVM with a downloaded LLVM 11 release. If that passes, we should bless it in a nice document somewhere.

If no one else has time for this, I can try to find time (wouldn't be in the next few days though).

yowl commented 3 years ago

If you want the "stable" version, but the latest LLVM for that version, say 11 which looks like its out now, then the process is:

  1. download emscripten at the "stable" version (emsdk install/activate)
  2. download LLVM v11 at the latest, e.g. 11.0.1 (when that comes out)
  3. point emscripten at that LLVM (.emscripten and LLVM_ROOT)

Is that right?

sbc100 commented 3 years ago

That sounds like it would work. You might not even want to use emsdk.. because by using emsdk you are downloading another copy of llvm.

If you are using emsdk you are kind of opting in to using the tools as bundled.

jonassmedegaard commented 3 years ago

I do distribution packaging for Debian. We cannot custom-pick snapshots of tools best fitting each other, but must pick single (or few) instances of tools that work together.

It would be helpful for distributions like Debian if emscripten project documented latest version of officially released emscripten known to fully work with which officially released versions of other tools. Even if that (likely outdated) version of emscripten is no longer maintained (obviously it then makes sense to explicitly state that, to avoid both users and develoepers wasting time on unusable bugreports). Even better would be if emscripten project tracked which emscripten features is known to work with which officially released versions of other tools. Even if emscripten project does not support running in such "crippled" envrionments.

I,.e, please consider tracking what definitely fails and what seemingly works, even if you cannot invest the additional effort into making any promises about it. Later, as $stuff stabilizes in your own camp, you can consider making more promises. Your downstream users - distributions and direct users alike - may choose to make different tradeoffs and take more risks - and are then helped even by your identifying feature sets even when not accompanied with your support.

sbc100 commented 3 years ago

I'm not sure I totally agree with your statement: "We cannot custom-pick snapshots of tools best fitting each other, but must pick single (or few) instances of tools that work together." I myself was once a debian developer (in fact I'm not sure if I ever stopped being one) so I'm fairly familiar with how this stuff works.

Sure, it would be nice if you could really on the stable/bundle/packaged versions of other tools, but technically there is nothing stopping you from adding both binaryen and llvm sources to the emscripten source package and removing those dependencies. This is basically the model that the arch linux package uses and the model we have recommended so far. Its ugly, and sub-optimal, and it duplicates code and adds more work for your package builders, but its the only method we current support here upstream.

This bug is all about trying to move to the world you would prefer, were you can depend on system versions of binaryen and llvm. So we accept it as a goal, but we don't support it today.

The "blessed release" approach we seem to be converging on here means that users who insist on using stable version of llvm will be required to use slightly older version of emscripten. These are not "unsupported", they are just old (and stable), like a lot of the stuff in Debian, right? The implication is that if debian wants the emscripten package that runs with llvm-11 they will be required to package 1.39.20 (once we bless that version that is). The next packagable version will when be out llvm-12 blessed version which I think will be come soon.

Trying to build a more complicated compatibility matrix, including such things as partial compatibility is a lot more work and likely not feasible given the resources have the project.

jonassmedegaard commented 3 years ago

Quoting Sam Clegg (2020-11-18 17:54:59)

I'm not sure I totally agree with your statement: "We cannot custom-pick snapshots of tools best fitting each other, but must pick single (or few) instances of tools that work together." I myself was once a debian developer (in fact I'm not sure if I ever stopped being one) so I'm fairly familiar with how this stuff works.

Indeed - you joined about same time as me: https://nm.debian.org/person/samo/ - sorry if I bored you with all too familiar details.

Sure, it would be nice if you could really on the stable/bundle/packaged versions of other tools, but technically there is nothing stopping you from adding both binaryen and llvm sources to the emscripten source package and removing those dependencies.

True, nothing stops me from adding things into Debian that is counter-productive to Debian - and nothing stops others in Debian from blocking my added packages later on, because, well, it is counter-productive to Debian: https://wiki.debian.org/UpstreamGuide#No_inclusion_of_third_party_code

This is basically the model that the arch linux package uses and the model we have recommended so far. Its ugly, and sub-optimal, and it duplicates code and adds more work for your package builders, but its the only method we current support here upstream.

I am well aware that the only model supported by emscripten does not fit the model we practice in Debian.

This bug is all about trying to move to the world you would prefer, were you can depend on system versions of binaryen and llvm. So we accept it as a goal, but we don't support it today.

I am well aware that emscripten does not support my World view. I am sorry if I made a different impression.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

willcohen commented 1 year ago

@jonassmedegaard For whatever it's worth, as a 2023 data point, I can confirm that emscripten 3.1.36 builds using LLVM 15 with https://github.com/llvm/llvm-project/commit/1532be98f99384990544bd5289ba339bca61e15b.patch backported to LLVM 15 (see https://github.com/willcohen/nixpkgs/commit/14dacfeabf8ab0792674dfeee4515049fe6c84fe). As of now I still don't have it working with LLVM 16.0.0 but it's possible it works with a later point release.

willcohen commented 1 year ago

I can also confirm that latest emscripten 3.1.39 builds with no LLVM backports with LLVM 16.0.1 (ie nothing in 17 is needed yet).

willcohen commented 1 year ago

As best as I can tell, nixos/nixpkgs is also always going to need to generally pick a stable version of LLVM, since getting LLVM to build in the immutable environment requires a bit of packaging. Due to the immutable store of the various package dependencies, the LLVM maintainers on nix already have to do a lot of massaging to get LLVM to build correctly, and trying to maintain another parallel tip-of-tree LLVM build would unreasonably increase the package size of LLVM.

Any time nix tries to bump the emscripten version, that already basically triggers a check for all downstream dependent packages, so I think that more or less addresses nixpkgs's concerns re compatibility. That's not quite as robust as the full test suite. I'll probably look into trying to make sure the full test suite is checked there going forward during builds, so at least it's not encouraging regressions.

willcohen commented 2 months ago

As a small update here, nix now has a weekly build of the latest LLVM from tip-of-tree, which emscripten now depends upon. While this doesn't precisely line up with emscripten's use of tip-of-tree at time of release, should allow nixpkgs to stay up to date with emscripten releases, with dramatically reduced moments where LLVM and emscripten have the same mismatch as when trying to depend on LLVM stable. https://github.com/NixOS/nixpkgs/pull/326633