conda-forge / conda-forge.github.io

Rebuild the 🌍: upcoming ABI break in MSFT Visual Studio #2295

Open h-vetinari opened 2 weeks ago

h-vetinari commented 2 weeks ago

The first complete rebuild in a loooong time

TL;DR: we'll need to rebuild all feedstocks for windows in conda-forge sometime in the medium-term future (~1-2 years?). Given how comparatively few feedstocks are unix-only, this amounts to a full rebuild of conda-forge, which we've done only once[^1] before. In the meantime, conda-forge has grown by orders of magnitude, so this will be a big undertaking. Given the size/rarity of such an event, we might want to use this opportunity for other kinds of clean-ups that aren't worth the effort in and of themselves, but might be able to ride the coattails here.

[^1]: for the GCC 4 -> 7 transition (at least AFAIU, it was before my time).

The reason

Prior to Visual Studio (VS) 2015, MSFT broke the ABI of the compiler itself with every major release. This meant that binary distribution channels like conda-forge were obligated to rebuild everything, or stay on the old compiler indefinitely (which is for example why python 2.7 got built with VS2008 until the end of its life).

However, starting with VS2015, MSFT changed tack and started keeping the ABI very rigorously, such that it became possible to combine artefacts built by VS2015, VS2017, VS2019 etc., with only minor restrictions for static libraries etc. Based on VS's very particular versioning scheme, this is called the vc 14 line, because the compilers themselves were versioned 14.0x (VS2015), 14.1x (VS2017), 14.2x (VS2019), 14.3x & 14.4x (VS2022). Some conda-forge recipes still bear silent witness to the beforetimes in that regard, by having a long-obsolete skip à la:

build:
  skip: true  # [win and vc<14]
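
To make the versioning scheme concrete, here is a minimal Python sketch that maps a compiler toolset version to the VS release it shipped with, using only the numbers listed above. The helper names (`vs_release`, `same_abi_line`) are hypothetical, and the check deliberately ignores the minor restrictions (static libraries etc.) mentioned above:

```python
from typing import Optional

# Toolset version prefixes of the vc 14 line and the VS release they belong to,
# taken from the list in the paragraph above.
VC14_TOOLSETS = {
    "14.0": "VS2015",
    "14.1": "VS2017",
    "14.2": "VS2019",
    "14.3": "VS2022",
    "14.4": "VS2022",
}


def vs_release(toolset_version: str) -> Optional[str]:
    """Return the VS release for a toolset version such as '14.29'.

    Returns None for anything outside the vc 14 line listed above.
    """
    major, _, minor = toolset_version.partition(".")
    if not minor:
        return None
    # Only the first digit of the minor version identifies the release (14.2x -> VS2019).
    return VC14_TOOLSETS.get(f"{major}.{minor[:1]}")


def same_abi_line(version_a: str, version_b: str) -> bool:
    """Rough check: both toolsets belong to the ABI-stable vc 14 line."""
    return vs_release(version_a) is not None and vs_release(version_b) is not None


print(vs_release("14.29"))            # toolset shipped with VS2019
print(same_abi_line("14.0", "14.41"))  # VS2015 and VS2022 artefacts can be combined
print(same_abi_line("9.0", "14.16"))   # VS2008 predates the vc 14 line
```

Whatever version number vNext ends up with, it would fall outside this table, which is exactly the situation the obsolete `vc<14` skip used to guard against.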

Keeping the ABI meant that all sorts of bugs and enhancements were effectively "won't fix", both in the compiler itself and in MSFT's implementation of the C++ standard library (called STL), on top of other pessimizations like making [[no_unique_address]] a no-op. Now however, the long-mythical ABI-breaking release is starting to appear on the horizon. In particular, the successor of VS2022 will be that "vNext" release, which the lead STL maintainer describes as:

v14 => vNext will be a total ABI break (like VS 2008 => 2010 => 2012 => 2013 => 2015), including renaming all of the versioned DLLs. No OBJ/LIB mixing will be possible, and DLL/EXE mixing will work only if the DLL interfaces are ABI-stable (e.g. COM, or completely extern "C" with no trace of STL types, etc.). We're going to change the representations of tons of types, and remove/change a ton of STL DLL exports.

Details so far are pretty sparse, but it really seems to be happening this time. Aside from STL's statement (which uses "will be" and isn't couched in caveats, in stark contrast to previous statements), it also started appearing on cppreference already:

[image: screenshot from cppreference]

What this means for conda-forge

Despite this not having happened in a very long time, we've kept some basic documentation on this: https://github.com/conda-forge/conda-forge.github.io/blob/14ac9baa97f9b9b7e0fae9e4eb7b77e84bce5f2d/docs/maintainer/infrastructure.md?plain=1#L494-L514

Given the growth of conda-forge, this is going to be an absolutely enormous effort though, which paradoxically might make it a good opportunity to do some long-standing clean-ups that would be impossible to pull off otherwise. As a concrete example, I remember the idea to get rid of the distinction between %PREFIX% and %LIBRARY_PREFIX% == %PREFIX%\Library (which won't work without a full rebuild), but there are probably many more. And given how we'll be rebuilding >~95% of conda-forge, we might as well consider some clean-ups for linux/unix along the way.
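
For readers less familiar with the windows layout, the following Python sketch spells out the %PREFIX% vs. %LIBRARY_PREFIX% split. The function name and the example environment path are made up for illustration; the `LIBRARY_BIN`/`LIBRARY_LIB`/`LIBRARY_INC` entries follow the variables conda-build sets on windows:

```python
from pathlib import PureWindowsPath


def windows_install_dirs(prefix: str) -> dict:
    """Illustrate where packages install things on windows today.

    `prefix` stands in for %PREFIX%; everything under "Library" is what
    %LIBRARY_PREFIX% and its sub-variables point to.
    """
    root = PureWindowsPath(prefix)
    library = root / "Library"  # %LIBRARY_PREFIX% == %PREFIX%\Library
    return {
        "PREFIX": str(root),
        "LIBRARY_PREFIX": str(library),
        "LIBRARY_BIN": str(library / "bin"),
        "LIBRARY_LIB": str(library / "lib"),
        "LIBRARY_INC": str(library / "include"),
    }


# Hypothetical environment location, purely for illustration.
for name, path in windows_install_dirs(r"C:\envs\demo").items():
    print(f"{name:15} {path}")
```

Every package that installs into `Library\...` bakes these paths into its artefacts, which is why the distinction cannot be dropped for only part of the ecosystem.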

One other issue I see is that we'll be running into a huge number of unmaintained feedstocks, and we should maybe figure out a policy for how to handle those without arguing on a case-by-case basis (e.g. archive after X weeks if there are no dependent feedstocks).
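
As a strawman for what such a policy could look like in code, here is a small Python sketch of the "archive after X weeks if there are no dependent feedstocks" rule. Everything in it is hypothetical: the `Feedstock` record, its fields and the example names are invented for illustration and do not correspond to existing conda-forge tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Feedstock:
    """Hypothetical record; the fields are assumptions for this sketch."""
    name: str
    last_maintainer_activity: date
    dependents: frozenset = frozenset()  # names of feedstocks that depend on this one


def archive_candidates(feedstocks, today: date, max_idle_weeks: int) -> list:
    """Return names of feedstocks that are idle for too long and have no dependents."""
    cutoff = today - timedelta(weeks=max_idle_weeks)
    return sorted(
        f.name
        for f in feedstocks
        if not f.dependents and f.last_maintainer_activity < cutoff
    )


example = [
    Feedstock("idle-leaf", date(2020, 1, 1)),
    Feedstock("idle-but-needed", date(2020, 1, 1), frozenset({"active-leaf"})),
    Feedstock("active-leaf", date(2024, 5, 1)),
]
print(archive_candidates(example, today=date(2024, 6, 1), max_idle_weeks=12))
```

The value of X (`max_idle_weeks` here) and what counts as maintainer activity are exactly the parts that would need to be agreed on up front.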

We still have a lot of time

Even if VS2025(?) came out tomorrow, we could realistically wait a few years before we switch to it by default. However, I foresee that many projects will want to start depending on VS2025 relatively quickly (primarily for C++23 support). For the vc 14 line it wasn't a problem to offer newer-than-default VS versions as an opt-in for projects that needed newer compiler-features (because ABI-compatible), but in this case we should probably aim for less than "years" after the release of vNext. Also, I wanted to raise this issue so that we can start thinking about a total rebuild (even though it's a scary thought), and what other clean-ups we might want to do at the same time, as those themselves might require some amount of lead time.

jaimergp commented 2 weeks ago

Thanks for the excellent write up! Just adding my 2c, since this might be the kind of task that would benefit from some funding to support the titanic (without icebergs, thanks) efforts.

isuruf commented 1 week ago

Removing LIBRARY_PREFIX is definitely a good idea here.

We might also want to pursue some funding for our CIs as this would double our CI on windows for a bit which is a considerable increase.

h-vetinari commented 1 week ago

I think for the CI we'd actually have a pretty good case to ask for an increase of our azure pool. After all, that's ultimately also owned by Microsoft, who are causing the ABI break.

And conda-forge being one of the very few distributions on windows, they should have an interest in smoothing the path here (completely aside from the fact that the 200 agents haven't changed in years despite massive growth of conda-forge).

Where funding would be useful IMO is in terms of maintainer time, because of all the necessary infrastructure work, and because >23'000 feedstocks are going to run into a ton of bugs and corner cases.