dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Collapse jobs in official builds #45396

Closed ViktorHofer closed 2 weeks ago

ViktorHofer commented 3 years ago

We currently build parts of the repository in two different modes: CoreClr vs. Mono.

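CoreClr mode builds these components in parallel:

- CoreClr
- Libraries
- Installer

The Mono mode builds sequentially.

After the product build is done, we do the upload steps and publish with DARC. Each of these parallel steps requires:

- ramping up and tearing down instances
- checking out the repository
- publishing assets to be consumed in legs that depend on them later

Allocating machines takes a considerable amount of time. Every new allocation opens the door to infra issues like disconnects, spin-up failures, etc.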

Example build: https://dnceng.visualstudio.com/internal/_build/results?buildId=903123&view=results

Runtime: Building takes 1h2min.

- CoreCLR: 15-28 mins
- Libraries: 12-15 mins
- Installer Build: 6-17 mins
- Mono (All subsets): 15-25 mins
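As a rough sanity check on the numbers above, the per-component ranges can be summed as if each mode ran back to back in a single job. This is only a sketch: it assumes one collapsed job per mode (CoreClr and Mono) running side by side, and it ignores allocation, checkout, upload, signing, and publishing time. The dictionary and function names are illustrative.

```python
# Per-component build times in minutes (min, max), from the example build above.
BUILD_MINUTES = {
    "CoreCLR": (15, 28),
    "Libraries": (12, 15),
    "Installer": (6, 17),
    "Mono": (15, 25),
}


def back_to_back(components):
    """Sum the (min, max) ranges as if the components ran sequentially in one job."""
    low = sum(BUILD_MINUTES[c][0] for c in components)
    high = sum(BUILD_MINUTES[c][1] for c in components)
    return low, high


# Assumption: one collapsed job per mode, with the two jobs running in parallel.
coreclr_job = back_to_back(["CoreCLR", "Libraries", "Installer"])
mono_job = back_to_back(["Mono"])
wall_clock = (max(coreclr_job[0], mono_job[0]), max(coreclr_job[1], mono_job[1]))

print(coreclr_job)  # (33, 60)
print(mono_job)     # (15, 25)
print(wall_clock)   # (33, 60)
```

Under those assumptions the pure build time of a collapsed CoreClr job lands between 33 and 60 minutes. The figures leave out everything except compilation, so the gap to the observed 1h2min is not a measurement of allocation overhead.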

We should collapse these different subsets together, at least for official builds. That would reduce our dependencies on AzDO and the network and make the build much simpler.

cc @dotnet/runtime-infrastructure

ghost commented 3 years ago

Tagging subscribers to this area: @ViktorHofer See info in area-owners.md if you want to be subscribed.

Issue Details
We currently build parts of the repository in two different modes: CoreClr vs. Mono.

CoreClr mode builds these components in parallel:

- CoreClr
- Libraries
- Installer

The Mono mode builds sequentially.

After the product build is done, we do the upload steps and publish with DARC. Each of these parallel steps requires:

- ramping up and tearing down instances
- checking out the repository
- publishing assets to be consumed in legs that depend on them later

Allocating machines takes a considerable amount of time. Every new allocation opens the door to infra issues like disconnects, spin-up failures, etc.

Example build: https://dnceng.visualstudio.com/internal/_build/results?buildId=903123&view=results

Runtime: Building takes 1h2min.

- CoreCLR: 15-28 mins
- Libraries: 12-15 mins
- Installer Build: 6-17 mins
- Mono (All subsets): 15-25 mins

We should collapse these different subsets together, at least for official builds. That would reduce our dependencies on AzDO and the network and make the build much simpler.

cc @dotnet/runtime-infrastructure

Author: ViktorHofer
Assignees: -
Labels: `area-Infrastructure`
Milestone: 6.0.0
safern commented 3 years ago

The installer legs need to hop in and out of Docker containers to produce the RPM and Deb packages on Linux, so we need to wait for https://github.com/dotnet/runtime/issues/36537 to be done before that requirement goes away. Once that's completed, we will be able to collapse the official build legs.

akoeplinger commented 3 years ago

@safern I might be misunderstanding the exact requirements but maybe we could use this recent Azure Pipelines feature to hop on and off from the docker containers: https://docs.microsoft.com/en-us/azure/devops/release-notes/2020/sprint-174-update#fine-grained-control-over-container-startstop

jkoritzinsky commented 3 years ago

@akoeplinger I did not know about that feature. We should definitely try to use it in this case.

safern commented 3 years ago

I did not know about that feature either. I'll look into it and use it to collapse the builds. Thanks for pointing it out.

safern commented 3 years ago

Removed the blocked label, as I started exploring the AzDo feature to use multiple containers on the same leg and made good progress. I got the RPM and Deb packages building; now I just need to figure out the other pieces of the official build, like the cross DAC, when collapsing the legs: https://dnceng.visualstudio.com/internal/_build/results?buildId=945856&view=results

ViktorHofer commented 3 years ago

Thanks for the update, Santi.

safern commented 3 years ago

Update. It seems like currently we do some other stuff that might complicate this process:

1) We sign some parts on Windows (not all of them; I believe coreclr signs the diagnostic assets).
2) We sign installer pieces on OSX.
3) We need to build the cross DAC pieces in various RIDs. That means that after we build all the collapsed legs, we would need a leg that depends on these "longer" legs in order to build the cross DAC package. This might slow down the build: at the moment, the cross DAC package is built in parallel with the installer legs because it only waits for CoreCLR to build. Another thing we could potentially do is duplicate some steps in different legs and have the cross DAC legs build CoreCLR + these assets and ship those as part of the cross DAC (I don't know if that's possible, cc: @hoyosjs?).
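The scheduling concern in point 3 can be illustrated with a small finish-time calculation. This is a sketch under stated assumptions, not a model of the real pipeline: the CoreCLR, Libraries, and Installer durations are the upper bounds from the example build in the issue, the cross DAC duration is a made-up placeholder, the dependency edges are assumed, and per-leg allocation and checkout overhead (the cost that collapsing is meant to remove) is ignored.

```python
def finish_times(durations, deps):
    """Earliest finish time (minutes) of each leg, given the legs it waits on."""
    done = {}

    def finish(leg):
        if leg not in done:
            # A leg starts once every leg it depends on has finished.
            start = max((finish(d) for d in deps.get(leg, ())), default=0)
            done[leg] = start + durations[leg]
        return done[leg]

    for leg in durations:
        finish(leg)
    return done


CROSS_DAC = 10  # hypothetical duration, not taken from the thread

# Assumed current layout: cross DAC only waits for CoreCLR.
current = finish_times(
    {"CoreCLR": 28, "Libraries": 15, "Installer": 17, "CrossDAC": CROSS_DAC},
    {"Installer": ["CoreCLR", "Libraries"], "CrossDAC": ["CoreCLR"]},
)

# Assumed collapsed layout: cross DAC waits for the whole collapsed leg.
collapsed = finish_times(
    {"Collapsed": 28 + 15 + 17, "CrossDAC": CROSS_DAC},
    {"CrossDAC": ["Collapsed"]},
)

print(max(current.values()))    # 45
print(max(collapsed.values()))  # 70
```

With these placeholder numbers, the cross DAC leg moves from finishing alongside the installer leg to sitting at the end of the critical path. Whether that outweighs the saved allocation time depends on overheads this sketch leaves out.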

I tried to do this as part of the global-build-job template, which seems pretty complicated as we would need to customize a lot of stuff per OS and subset. So I think having a new template for official build jobs would be best and cleanest: global-build-job is being used more and more on PRs and CI, so I want to keep that yml as simple and clean as possible.

I'm going to remove my assignment for now since I don't have bandwidth for this and have other priorities, but if someone else tackles this I would be happy to help drive it and design it together based on my previous investigation.

Here is a branch with my initial progress: https://github.com/safern/runtime/tree/TestMultipleContainers

hoyosjs commented 3 years ago

> 1. We sign some parts on Windows (not all of them; I believe coreclr signs the diagnostic assets).

Yes, otherwise up stack repos won't be able to easily open their dumps in VS.

> 3. We need to build the cross DAC pieces in various RIDs. That means that after we build all the collapsed legs, we would need a leg that depends on these "longer" legs in order to build the cross DAC package. This might slow down the build: at the moment, the cross DAC package is built in parallel with the installer legs because it only waits for CoreCLR to build. Another thing we could potentially do is duplicate some steps in different legs and have the cross DAC legs build CoreCLR + these assets and ship those as part of the cross DAC (I don't know if that's possible, cc: @hoyosjs).

It's finicky... The reason it's a merge leg is that it needs assets from both Windows and Linux. Even beyond that - say we figure something out with Docker or something alike - what I am not sure about is the reproducibility of the native builds, e.g. whether two libmscordaccore binaries built on different legs will have different build IDs, since source paths are taken into account for the debug info.

> I tried to do this as part of the global-build-job template, which seems pretty complicated as we would need to customize a lot of stuff per OS and subset. So I think having a new template for official build jobs would be best and cleanest: global-build-job is being used more and more on PRs and CI, so I want to keep that yml as simple and clean as possible.

We are always going to need some level of specialization. For example, OSX will now have SIP requirements, and we might need to think about that. In general, while I love the idea of a clean 1,2,3 build, not having the possibility to introduce nuances feels like it's going to limit us.

jkoritzinsky commented 2 weeks ago

We did this work in .NET 9 and backported it to .NET 6 and 8.