WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0

Policy on requirements to compile and/or run Binaryen #6060

Open dschuff opened 8 months ago

dschuff commented 8 months ago

Binaryen has started to use some C++20 features, and it would be nice to use even more. For that we'll want to make sure that Binaryen can still be compiled on as many systems as possible, and that our release binaries (such as they are) run on as many systems as possible. Since compilers/SDKs do not always have completely uniform support of e.g. "all of C++20", and the set of systems someone might want to compile and/or run on can go very far back (e.g. Ubuntu Trusty was released in 2014 and is not fully "EOL" until 2024), there will always be a tradeoff between some number of real or hypothetical users who would have to jump through extra hoops to build/run vs. a set of features enabled.

Here's the relevant info:

MacOS: According to https://developer.apple.com/xcode/cpp/#c++20 several C++20 library features require XCode 15 and a few require 14.3, and a very few also require higher deployment targets (notably the filesystem library requires OSX 10.15, and the synchronization library functions require OSX 11, and some variant methods require 10.13). Our CI bots currently run XCode 14.2, and the emscripten-releases builder currently builds and packages its own libc++ to avoid dependency issues.

Linux: GCC language feature support is listed in https://gcc.gnu.org/projects/cxx-status.html and library support in https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2020. Just from eyeballing the docs, it looks like GCC 10 or maybe 11 would get just about everything on the language and library side. The GCC 10 series dates from May 2020 and the 11 series from April 2021. Clang support is shown in https://clang.llvm.org/cxx_status.html and library support in https://libcxx.llvm.org/Status/Cxx20.html. There seems to be a larger range of versions that support various features (ranging from 8 to 17). Binaryen CI runs Ubuntu 22.04, but emsdk runs Ubuntu 20.04.

Ubuntu: Focal (20.04, with standard support until mid-2025) comes with GCC 10, although there is also a PPA with backports of newer toolchains. It also has clang-12 and libc++12 (and it looks easy to use these instead of the system libstdc++).

Debian: Similar to Focal, with GCC 10 and clang/libc++ 13.

Fedora: Clang 16 and GCC 13 (its release lifetime is short).

Windows: According to https://learn.microsoft.com/en-us/cpp/overview/visual-cpp-language-conformance?view=msvc-170 most of the C++20 features seem to have appeared in the various patch updates for VS2019 (16.x). Binaryen CI uses VS2022.

Relatedly, there's the LLVM policy at https://llvm.org/docs/DeveloperPolicy.html#updating-toolchain-requirements which basically says they generally aim to support LLVM and GCC versions from the last 3 years; currently that's GCC 7.4, Clang 5(!?), Apple Clang 10, MSVC 2019 16.7. This means they only support C++17 for now. But also LLVM has a much broader base of users who want to compile it from source than Binaryen does.

If we pick "2 year old compilers" as our benchmark, that gives us GCC 11, Clang 13, and MSVC 2022 (released November 2021). If we add that we want it to be easier to build on Ubuntu Focal and Debian Bullseye, we could lower the requirement to GCC 10 or Clang 12 (at the cost of a few features). Going further back than that would probably drop several things. I think either of those could be reasonable, and we could probably go back to one of the later versions of MSVC 2019 if we needed to.
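As a rough illustration of what enforcing such a floor could look like, here is a minimal sketch using predefined compiler macros. The names (`kToolchain`, `kVersion`, `kMinVersion`, `meets_floor`) are hypothetical and not part of Binaryen, and the Apple Clang floor is deliberately left open because its version numbers follow Xcode rather than upstream Clang.

```cpp
// Hypothetical helper, not part of Binaryen: names and structure are
// illustrative only.

// Clang also defines __GNUC__, and clang-cl defines _MSC_VER, so Clang is
// checked first. Apple Clang uses its own version numbering, so no floor
// is assumed for it here.
#if defined(__apple_build_version__)
constexpr const char* kToolchain = "Apple Clang";
constexpr int kVersion = __clang_major__;
constexpr int kMinVersion = 0; // assumption: left open
#elif defined(__clang__)
constexpr const char* kToolchain = "Clang";
constexpr int kVersion = __clang_major__;
constexpr int kMinVersion = 13;
#elif defined(_MSC_VER)
constexpr const char* kToolchain = "MSVC";
constexpr int kVersion = _MSC_VER;
constexpr int kMinVersion = 1930; // _MSC_VER of the first VS2022 release
#elif defined(__GNUC__)
constexpr const char* kToolchain = "GCC";
constexpr int kVersion = __GNUC__;
constexpr int kMinVersion = 11;
#else
constexpr const char* kToolchain = "unknown";
constexpr int kVersion = 0;
constexpr int kMinVersion = 0;
#endif

// True when `version` is at least `minimum`.
constexpr bool meets_floor(int version, int minimum) {
  return version >= minimum;
}

// A build could enforce the floor at compile time with:
//   static_assert(meets_floor(kVersion, kMinVersion), "compiler too old");
```

Lowering the floor to GCC 10 or Clang 12 would then only mean changing the two `kMinVersion` values.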

Thoughts? I would also be interested in hearing from folks who build Binaryen or use our binaries, although they might not see this, so the most likely way they'd discover this is if we update the requirement and break them.

tlively commented 8 months ago

If we adopt a written policy like "supports two year old compilers," do we have a better way to test that than checking what compilers are two years old whenever we want to do something new and updating our CI to use those compilers?

If not, that seems ok.

dschuff commented 8 months ago

I don't know of one. And you might note that I did more above than just check the compilers (at least when it comes to Linux systems where toolchains tend to be managed by system packages). I do think that doing the evaluation on-demand when we want something new makes sense; there's not necessarily any need to update just for the sake of updating in a project of this size. And again we can afford to be more flexible, perhaps in response to users who might come out of the woodwork (or to make more explicit tradeoffs about what systems we might leave behind vs what we'd get in return) than a big project like LLVM.

tlively commented 8 months ago

The links in the original post here are valuable; when we need to evaluate ecosystem support for new features again in the future, it will be good to know where we need to look.

As an end state here, it would be good to write a policy down in the README.md along with these links to make evaluating the policy easy in the future.

What about using 2 year old MacOS, Ubuntu, and VSCode defaults as our benchmark? That's slightly more robust than looking at just the age of the compiler and also simpler to evaluate than additionally looking at Debian, Fedora, and individual compilers.

kripken commented 8 months ago

cc @juj @sbc100 for thoughts

sbc100 commented 8 months ago

It seems reasonable to pick a benchmark like "2 year old MacOS, Ubuntu, and VSCode" initially, but the most important thing is to be responsive and flexible as folks show up who have different requirements. It could be that some large set of users need to support an 8 year old version of Linux and we might need to modify our benchmark accordingly.

We probably won't know what the real requirements are until we start breaking folks.

dschuff commented 8 months ago

For windows and mac, XCode and MSVC are pretty decoupled from the OS: the OS doesn't come with a compiler, so the version of XCode or MSVC you install determines the compiler you have. So that's really what I'm referring to when I talk about compiler versions on those platforms, and I think it's safe to ignore the OS version in that case. It's more complicated on Linux, since the compiler is managed by the OS package manager, so it makes more sense to think about what compilers are available or are default for a particular distro of a particular age.

sbc100 commented 8 months ago

But don't macOS and Windows also supply the runtime libraries that the binaries depend on? libcrtd etc. on Windows and libc++.dylib on macOS?

dschuff commented 8 months ago

Ah yeah I would actually recommend that if users are concerned about that, that they statically link libc++ as we do.

sbc100 commented 8 months ago

> Ah yeah I would actually recommend that if users are concerned about that, that they statically link libc++ as we do.

That isn't always easy though, is it? It might be a lot to ask. For a start, where would they get such a static libc++? libc++ normally comes with the OS, and finding a standalone modern version of it often involves building one from source (which is not very fun).

kripken commented 8 months ago

Speaking of C++20 (not the main focus here, but relevant), it looks like Carbon is targeting C++17 for migration:

https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/interoperability/philosophy_and_goals.md#interoperability-with-and-migration-from-existing-c-code

That might be a reason not to jump to C++20 in a way we can't undo, as keeping the option to migrate to Carbon seems useful, even if it is for the far future. Maybe this just means minimizing the number of files we heavily depend on C++20 in.

sbc100 commented 8 months ago

Speaking of C++20 support, this issue came up over the weekend: https://github.com/WebAssembly/binaryen/issues/6084

tlively commented 8 months ago

Looks like Carbon may not support CRTP as easily as other C++ patterns: https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/interoperability/philosophy_and_goals.md#crtp-support

Not good news for Binaryen!

dschuff commented 7 months ago

> > Ah yeah I would actually recommend that if users are concerned about that, that they statically link libc++ as we do.
>
> That isn't always easy though, is it? It might be a lot to ask. For a start, where would they get such a static libc++? libc++ normally comes with the OS, and finding a standalone modern version of it often involves building one from source (which is not very fun).

Statically linking the stdlib is not the same as building it yourself; it's much easier. If the SDK (i.e. MSVC or XCode or the Linux version of the build machine) is new enough to compile the features we use, then it will include a version of the C++ stdlib that can be linked statically. That's just done using a compiler flag (e.g. -static-libstdc++ or /MT).
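To make that a bit more concrete, here is a small sketch that reports which C++ standard library the SDK supplied, which is the library a static link would pull in. The function names are hypothetical; the macros it checks (`_LIBCPP_VERSION`, `__GLIBCXX__`, `_MSVC_STL_VERSION`) are the ones each implementation defines in its own headers.

```cpp
#include <cstddef> // any standard header pulls in the library's config macros

// Hypothetical helpers, not part of Binaryen.

// Name of the standard library implementation this file was compiled against.
constexpr const char* stdlib_name() {
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#elif defined(_MSVC_STL_VERSION)
  return "MSVC STL";
#else
  return "unknown";
#endif
}

// Implementation-specific version number (0 when unknown). The encoding
// differs per library: libstdc++ uses a release date, the others a version.
constexpr long stdlib_version() {
#if defined(_LIBCPP_VERSION)
  return _LIBCPP_VERSION;
#elif defined(__GLIBCXX__)
  return __GLIBCXX__;
#elif defined(_MSVC_STL_VERSION)
  return _MSVC_STL_VERSION;
#else
  return 0;
#endif
}
```

A program that prints these values and is linked with `-static-libstdc++` (GCC) or compiled with `/MT` (MSVC) carries that library inside the binary, so it should not need a matching shared copy on the machine where it runs.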

dschuff commented 7 months ago

WDYT @juj, do you need to support toolchains older than GCC 11, Clang 13, or MSVC 2022 ?

juj commented 7 months ago

Thanks for pinging.

Currently we should be good with adopting this. Our CI builders should be fairly configurable to build Binaryen on whatever is the latest.

However we do need the ability to run Binaryen on potato old Windows 8 PCs, so it would be good to check that the static libstdc++ option is readily tested on binaryen/emsdk/emscripten CI.

I recall that in some build setup about a year ago or so, we had to manually sideband download and build libstdc++, because neither binaryen nor emsdk bundled a suitable one to use in an automated fashion. (I vaguely remember that would have been a macOS thing, not a Windows thing.)

dschuff commented 7 months ago

I believe that should be the case with the current emscripten-releases builds. On windows they are built with the /MT flag which should statically link the MSVCRT runtime, and should not have a runtime dependence (we don't use a custom/manual C++ stdlib on Windows). On Mac, we currently set our deployment target to 10.14, and we do bundle our own custom libc++ (but my understanding based on my research above is that that wouldn't be necessary if we were using XCode 15.)

Although, I actually just realized that there's a hole in my analysis in the OP, which is that it considers "2 year old compilers" but not XCode versions (which IIUC is what includes libc++ for most developers). The 2-year-old XCode version today is 13.2. According to the table we'd be missing some features that require 14.2 or 15. AFAICS none of them are features we've considered critical; so it would be slightly annoying but maybe still acceptable to only support a subset of features.

All but one of the features with that requirement are library features, so requiring a libc++ newer than XCode could be a workaround (i.e. allowing Binaryen to build with older XCode), but that could be an annoyance for users wanting to build their own (maybe a bigger annoyance than just using XCode 15).
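For the "only support a subset of features" route, one option is to guard uses of individual library features with the standard feature-test macros. A minimal sketch follows; the struct and function names are hypothetical, and the three features were picked only because the filesystem and synchronization libraries came up earlier in this issue. It only reports what the headers advertise and does not account for deployment-target restrictions.

```cpp
// <version> is a C++20 header, so include it only when it exists.
#if defined(__has_include)
#if __has_include(<version>)
#include <version>
#endif
#endif
#include <cstddef>

// Hypothetical summary of a few library features, not part of Binaryen.
struct LibraryFeatures {
  bool filesystem; // std::filesystem
  bool semaphore;  // std::counting_semaphore (synchronization library)
  bool span;       // std::span
};

// Records which feature-test macros the standard library headers define.
constexpr LibraryFeatures detect_library_features() {
  LibraryFeatures f{false, false, false};
#if defined(__cpp_lib_filesystem)
  f.filesystem = true;
#endif
#if defined(__cpp_lib_semaphore)
  f.semaphore = true;
#endif
#if defined(__cpp_lib_span)
  f.span = true;
#endif
  return f;
}

// Number of the features above that are available.
constexpr int count_available(LibraryFeatures f) {
  return (f.filesystem ? 1 : 0) + (f.semaphore ? 1 : 0) + (f.span ? 1 : 0);
}
```

Code that needs one of these features could then use `#if defined(__cpp_lib_span)` around the C++20 path and keep a fallback for older SDKs.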

Regarding testing on CI with the oldest supported version, I agree that we should try to do that if possible.

sbc100 commented 7 months ago

What about those who use the build rules in emsdk.py to build binaryen? I'm not sure those are set up to download/build libc++ from source.

dschuff commented 7 months ago

Yeah AFAIK they are not; so fixing that would be part of the additional annoyance required to use all of C++20 and also support building Binaryen with XCode 13.2. I wasn't clear in my last post, but I think it might be better just to either avoid use of those particular features, or require a newer XCode.

dschuff commented 7 months ago

this thread is a related discussion from the LLVM side. I will say that this post in particular is making me a bit less eager to push on this aggressively though...