dotnet / source-build

A repository to track efforts to produce a source tarball of the .NET Core SDK and all its components
MIT License

For CI tarball builds, generate the tarball without a full build #831

Closed dagood closed 3 years ago

dagood commented 5 years ago

In our CI, we run a production build (this builds all repos), create the tarball based on the results, then build that tarball. This is the main reason our CI takes so long: we have to build every repo twice, in sequence.

We think we can create a tarball using only checked-in information. The source in the tarball is "clean", and we can download the prebuilts we need using the prebuilt baseline. With a little processing (decompiling ref-only packages, downloading/inflating the currently prebuilt BuildTools toolset), we can in theory produce a tarball fairly quickly. This "buildless" tarball has the potential to cut CI time in half.

Potential problem Q&A:

  1. What happens when the production build stops bringing down the same prebuilts as the baseline?
    • We can detect this during CI builds and fail.
      • We already do this when prebuilts increase: we could make the build fail when prebuilts decrease as well.
      • The dev can then either copy the new baseline from CI, or run their own build locally to regenerate the baselines. Any decrease in prebuilts will likely be intentional, and they'll have regenerated the baseline already.
  2. What if the production build does something catastrophically wrong to the tarball?
    • We can do a comparison in CI between the tarball the production build generates and the "buildless" tarball during the production leg.
  3. How does the buildless tarball get source-built prebuilts?
    • WIP: Stop using these. (?)
      • We want to stop including these in the tarball anyway, since they're prebuilt binaries.
      • It seems possible to work around this in other ways we haven't considered yet, such as manually tweaking prebuilt packages (e.g. runtime.json in the Platforms package).
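
The two CI checks described in items 1 and 2 could look roughly like the sketch below. It is only an illustration under assumptions: prebuilts are modeled as plain "Name/Version" strings rather than the repo's actual baseline format, and the function names (`diff_prebuilts`, `check_prebuilts`, `diff_tarballs`) are hypothetical, not part of source-build.

```python
import hashlib
import io
import tarfile


def diff_prebuilts(baseline, used):
    """Return (added, removed) package ids relative to the baseline."""
    baseline, used = set(baseline), set(used)
    return sorted(used - baseline), sorted(baseline - used)


def check_prebuilts(baseline, used):
    """Fail on any drift, whether prebuilts increased or decreased."""
    added, removed = diff_prebuilts(baseline, used)
    if added or removed:
        raise SystemExit(f"prebuilt drift: added={added} removed={removed}")


def tar_digests(fileobj):
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def diff_tarballs(a, b):
    """Return names that are missing from one side or differ in content."""
    da, db = tar_digests(a), tar_digests(b)
    return sorted(n for n in da.keys() | db.keys() if da.get(n) != db.get(n))
```

In this sketch, `check_prebuilts` would run in the production leg against the checked-in baseline, and `diff_tarballs` would take two open binary file objects, one for the production-generated tarball and one for the "buildless" tarball. An empty result means the two agree on every regular file.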

Basically, our current legs:

become:

This is particularly interesting once we start testing toolset bootstrapping. That calls for this general flow: "production => tarball build => bootstrapped tarball build". Having three builds in sequence to validate CI seems untenable. If we can just have "tarball build => bootstrapped tarball build" based on a buildless tarball, at least our CI time would stay around the same as now.

/cc @dseefeld @crummel @dleeapho

omajid commented 5 years ago

Even outside a CI context, I like this idea. The "source" tarball for most projects is just a git archive or some such. No compiling is involved. This makes it painless to produce source archives and also makes them reproducible. I would like to see .NET Core in such a place as well, where we could create a source tarball without building anything. It would cut down my packaging time for Linux distributions too.
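
To illustrate what "reproducible" means for a source archive, here is a minimal Python sketch that packs a directory into a byte-for-byte deterministic .tar.gz without compiling anything. It is not how source-build or `git archive` does it; `make_source_tarball` and the `prefix` argument are hypothetical names used only for this example.

```python
import gzip
import io
import os
import tarfile


def _normalize(info):
    """Strip metadata that varies between machines and runs."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    return info


def make_source_tarball(src_dir, prefix):
    """Return the bytes of a deterministic .tar.gz of the files in src_dir."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # sorting in place fixes the traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.join(prefix, os.path.relpath(path, src_dir))
                # Only regular entries found by the walk are added, one by one.
                tar.add(path, arcname=arcname, recursive=False, filter=_normalize)
    # mtime=0 keeps the gzip header free of the current time.
    return gzip.compress(raw.getvalue(), mtime=0)
```

Running it twice on the same tree should yield identical bytes, so a checksum of the archive can be compared across machines. File permission bits are left as found, so they still need to match between checkouts.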

omajid commented 4 years ago

I did some quick measurements. Taking the current 3.1.101 tarball and removing all prebuilts from it, the tarball goes from 2.4 GB to 200 MB in size. xz compression (.tar.xz) brings it down to around 100 MB.

dagood commented 4 years ago

So we now have no "full" prebuilts, but @dseefeld noted a few obstacles I'm putting here to track:

  1. The reference-only package ilasm/ildasm process.
  2. Source generation in coreclr.

I don't think there's a reason we can't track reference-only decompiled prebuilts the same way I proposed we could track prebuilts above. Related: https://github.com/dotnet/source-build/issues/866

For source generation I'm not familiar with the problem.

MichaelSimons commented 3 years ago

This has now been implemented as part of ArPow.