Design multi-manifest (aka multi-architecture or multi-RID) publishing

baronfel commented 2 years ago

It's possible for containers to be specified in a 'manifest list' - a set of container image manifests that represent the same application on different underlying OS/hardware configurations.

Fundamentally this would be something like a multitargeted build. For some selection of OS/OSVersions and Architectures we'd need to orchestrate:

- building the app for that os/version/architecture
- publishing a container image for that os/version/architecture

Then, once all of those were done, we'd need to:

- create a new 'manifest list',
- add each of the created images to the list,
- ensure each one was 'annotated' with the correct os/platform/etc annotations, and
- push the manifest list to the registry

There are a couple of hurdles we'd need to cover:

- If we would like to perform this by default for projects that specify multiple RIDs, then we may need a concept of a cross-RID/cross-TFM publish. This doesn't exist right now, and in fact is explicitly stopped by the cross-targeting targets in the .NET SDK.
- If we modeled this as a different, standalone target, then we wouldn't need to tie it to 'publish' necessarily, but we might lose the benefit of associating with 'publish' as a concept.
- I'm unsure how much users would use this - is it worth pioneering a potentially-new publishing concept?
  - This would bring us parity with Jib and Ko - maybe parity is enough of a motivator

Other requirements:

- [ ] Need to ensure that tarball export works for multi-arch images as well.
- [ ] This work might also be useful/necessary for users on the 8 LTS, so we should strongly consider keeping the NuGet package and making that package work seamlessly with 8 and 9 SDKs. We should do this work off of the .NET SDK 8.0.4xx branch and merge forward to 9.0 as a result.

Proposal

The gesture we want users to perform for multi-arch manifest publishing is

dotnet publish -t:PublishContainer

i.e. the same gesture they use today. To do this, we should change the implementation of the current PublishContainer Target from its current behavior of 'publish a single image for a single RID' to more of a decision-making target.

PublishContainer should:

- Check if the project is currently in a multi-TFM state. If so, error out. We require specifying a single specific TFM for now.
- Check if the project is in a 'multi RID' state - meaning the project does not have a RuntimeIdentifier specified and does have either ContainerRuntimeIdentifiers or RuntimeIdentifiers specified. If so, invoke a new "_BuildMultiImageManifest" target.
- Otherwise, the project is in a single-RID, single-TFM state. In this state, invoke a new "_BuildSingleContainer" target whose behavior is exactly the same as the single-image version of PublishContainer today.

An example of this per-scenario break-out is here.
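
The decision logic described above could be sketched in MSBuild roughly as follows. This is a minimal sketch, not the actual implementation: the `_IsMultiTFM` and `_IsMultiRID` property names and the error text are hypothetical, and the two called targets are assumed to be defined elsewhere.

```xml
<Project>
  <!-- Hypothetical dispatching target. _BuildMultiImageManifest and _BuildSingleContainer
       are the proposed targets and are assumed to exist elsewhere. -->
  <Target Name="PublishContainer">
    <PropertyGroup>
      <!-- Multi-TFM: no single TargetFramework chosen, but several are declared. -->
      <_IsMultiTFM Condition="'$(TargetFramework)' == '' and '$(TargetFrameworks)' != ''">true</_IsMultiTFM>
      <!-- Multi-RID: no single RuntimeIdentifier chosen, but a RID list is declared. -->
      <_IsMultiRID Condition="'$(RuntimeIdentifier)' == '' and ('$(ContainerRuntimeIdentifiers)' != '' or '$(RuntimeIdentifiers)' != '')">true</_IsMultiRID>
    </PropertyGroup>

    <!-- A failing Error task stops the target, so neither CallTarget below runs. -->
    <Error Condition="'$(_IsMultiTFM)' == 'true'"
           Text="Publishing a container requires a single TargetFramework to be specified." />

    <CallTarget Condition="'$(_IsMultiRID)' == 'true'" Targets="_BuildMultiImageManifest" />
    <CallTarget Condition="'$(_IsMultiRID)' != 'true'" Targets="_BuildSingleContainer" />
  </Target>
</Project>
```

Keeping the checks in one place means the user-facing gesture stays `dotnet publish -t:PublishContainer` regardless of which path is taken.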

Anticipated hurdles

Defaulted RIDs

The SDK does not have a concept of 'multi-RID' publish, and so today there are several places where it has assumed that the publish gesture implies a desire for a single RID. The main way this negatively impacts us is the UseCurrentRuntimeIdentifier property, which is inferred as true here and ends up erroneously pinning us to a single RID. Setting it explicitly to false in the project file works around this.
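
For illustration, the workaround might look like this in a project file. The RID list is an assumption (it reuses the two RIDs from the flowchart further down), and ContainerRuntimeIdentifiers is the property proposed for this feature rather than one that exists today.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Stop the SDK from inferring a single RID for the outer, RID-less publish. -->
    <UseCurrentRuntimeIdentifier>false</UseCurrentRuntimeIdentifier>
    <!-- Assumed RID list for the multi-image publish (proposed property). -->
    <ContainerRuntimeIdentifiers>linux-x64;linux-arm64</ContainerRuntimeIdentifiers>
  </PropertyGroup>
</Project>
```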

PublishSingleFile

If PublishSingleFile is set and UseCurrentRuntimeIdentifier is not (as mentioned above), there is a mismatch in expectations. For now, for scenarios like our initial set, users may have to condition properties to only light up when the RID-specific build(s) are being done (for example, adding a Condition="'$(RuntimeIdentifier)' != ''" to several properties).

This is a symptom of the overall Publishing mechanisms of the .NET SDK not being designed for multi-RID publish today. In general, I think many SDK checks could be deferred to the 'inner RID' builds with no loss of intent, but we may have to push for this functionality in phases.
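
As a sketch of the conditioning described above, a project could group its RID-dependent publish properties so they only apply inside the RID-specific builds. Which properties need this treatment is an assumption here; only PublishSingleFile is named in this issue.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <!-- Only evaluated as true in the inner builds, where RuntimeIdentifier is set. -->
  <PropertyGroup Condition="'$(RuntimeIdentifier)' != ''">
    <PublishSingleFile>true</PublishSingleFile>
  </PropertyGroup>
</Project>
```

The outer, RID-less evaluation then never sees PublishSingleFile, so the SDK checks that expect a RID are not triggered there.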

BuildMultiImageManifest

This target broadly should do three things:

- orchestrate the building of N RID-specific images, collecting output information about them
  - N in this case comes from either the ContainerRuntimeIdentifiers property (invented for this feature), or the RuntimeIdentifiers property (existing)
- combine those outputs into a single Manifest List structure
- orchestrate the export of the RID-specific images and the Manifest List to a registry, local daemon, or tarball in the correct order
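
The first of these steps could be sketched with the MSBuild task, which can fan out one inner build per RID and collect what each returns. This is a rough sketch under several assumptions: the `_ContainerRIDs`, `_RIDItems`, `_InnerContainerBuild` and `_GeneratedImages` names are hypothetical, and `_BuildSingleContainer` is assumed to publish the app and declare `Returns` with an item describing the image it built.

```xml
<Project>
  <Target Name="_BuildMultiImageManifest">
    <PropertyGroup>
      <!-- Prefer the container-specific list, fall back to RuntimeIdentifiers. -->
      <_ContainerRIDs>$(ContainerRuntimeIdentifiers)</_ContainerRIDs>
      <_ContainerRIDs Condition="'$(_ContainerRIDs)' == ''">$(RuntimeIdentifiers)</_ContainerRIDs>
    </PropertyGroup>
    <ItemGroup>
      <_RIDItems Include="$(_ContainerRIDs)" />
      <!-- One inner build of this project per RID, each with RuntimeIdentifier set. -->
      <_InnerContainerBuild Include="$(MSBuildProjectFullPath)"
                            AdditionalProperties="RuntimeIdentifier=%(_RIDItems.Identity)" />
    </ItemGroup>

    <MSBuild Projects="@(_InnerContainerBuild)" Targets="_BuildSingleContainer" BuildInParallel="true">
      <!-- Collect whatever each inner build returns about its image. -->
      <Output TaskParameter="TargetOutputs" ItemName="_GeneratedImages" />
    </MSBuild>

    <!-- A later step (not shown) would combine @(_GeneratedImages) into the
         Manifest List and export everything in the correct order. -->
  </Target>
</Project>
```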

Ideally, it would also unify any shared work that may happen during the multiple single-RID publishes into one shared unit of work. Specific examples of this are:

- determination of the Base Image to use and fetching of the various manifests for that Base Image
- tracking and parallelization of image layers for the base image(s)

Characteristics of the Manifest List

- It should use the OCI Image Index schema, not the Docker Manifest List structure, if at all possible.
- It should contain N manifests, one for each RID created. These should map to the existing `PlatformSpecificManifest` structure we know today.
- It should contain annotations matching all of the conventionally-applied labels that we support for individual images.

Visual Aids

flowchart TD
    A[Start Build] --> B[dotnet publish -t:PublishContainer]
    B --> D[Publish for linux-x64]
    D --> E[Package a linux-x64 container]
    B --> F[Publish for linux-arm64]
    F --> G[Package a linux-arm64 container]
    G --> I[Package both containers into an image index]
    E --> I
    I --> H[Push containers and image index to registry]

Work Stages/Milestones

We should have two phases of the work - initial MVP and then productizing.

Initial MVP

In this stage we implement the multi-RID aware publishing feature with external registries as the primary destination - so no pushing to local daemons or exporting to tarballs. This is the most well-known area of development. Once this is implemented, we can hand a preview nupkg over to the internal partner teams that want to test the feature so they can begin validation.

Productizing

In this stage we would implement tarball export and local-Daemon export of manifest lists, as well as full testing and error handling scenarios.

baronfel commented 1 year ago

I discussed this at MVP Summit this year, and feedback from folks was strong - we should do this so that we have parity with other ecosystems.

baronfel commented 1 year ago

There are two separate requirements here:

- create images for each architecture/platform that a user specifies that has a match with the base image chosen
- bundle those platform- and architecture-specific images into a single manifest list and publish it

The former is relatively straightforward today. We essentially want to 'multi-target' like you would with a TFM, but with RIDs:

A Directory.Build.targets file with a Containerize target that enables multi-rid container generation

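```xml
<Project>
  <!-- The enclosing elements and all Condition attributes were lost from this snippet.
       Those shown here are a reconstruction and should be read as assumptions. -->
  <Target Name="Containerize">
    <PropertyGroup>
      <_RequiredContainerPublishTargets>Publish;PublishContainer</_RequiredContainerPublishTargets>
    </PropertyGroup>
    <!-- Multiple TFMs: one container publish per TFM -->
    <ItemGroup Condition="'$(TargetFrameworks)' != '' and '$(RuntimeIdentifiers)' == ''">
      <_TFMItems Include="$(TargetFrameworks)" />
      <_SingleContainerPublish Include="$(MSBuildProjectFullPath)" AdditionalProperties="TargetFramework=%(_TFMItems.Identity); VersionSuffix=$([MSBuild]::GetTargetFrameworkVersion('%(_TFMItems.Identity)', 2))" />
    </ItemGroup>
    <!-- Single TFM, no RIDs: one container publish -->
    <ItemGroup Condition="'$(TargetFrameworks)' == '' and '$(RuntimeIdentifiers)' == ''">
      <_SingleContainerPublish Include="$(MSBuildProjectFullPath)" />
    </ItemGroup>
    <!-- Multiple RIDs: one container publish per RID -->
    <ItemGroup Condition="'$(TargetFrameworks)' == '' and '$(RuntimeIdentifiers)' != ''">
      <_RIDItems Include="$(RuntimeIdentifiers)" />
      <_SingleContainerPublish Include="$(MSBuildProjectFullPath)" AdditionalProperties="ContainerRuntimeIdentifier=%(_RIDItems.Identity); RuntimeIdentifier=%(_RIDItems.Identity); VersionSuffix=%(_RIDItems.Identity);" />
    </ItemGroup>
    <MSBuild Projects="@(_SingleContainerPublish)" Targets="$(_RequiredContainerPublishTargets)" />
  </Target>
</Project>
```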

Adding this target to a project lets you run dotnet build /t:Containerize and generate architecture- and platform-specific images. We should look at including something like this in the official build targets for the 7.0.400 time frame if at all possible. Adding such a target also enables a related use case: containerizing every project in a solution that can be containerized. This enables workflows like dotnet build /t:Containerize myapp.sln && docker-compose up, where there is a compose.yaml that specifies the relationships between the services in the usual way, just using image: instead of build: stanzas for the project.

A worked example of this can be seen with this diff of the eshoponcontainers project. The docker compose YAML specifically is a useful example.

mu88 commented 11 months ago

I would love seeing this feature, as it's literally the last missing piece from throwing away my Dockerfiles and replacing them with the SDK Container Building Tools.

baronfel commented 5 months ago

Making multi-architecture images is pretty straightforward, as shown above. The next step is creating image manifests using those images. There's an example of this in my sdk-container-demo repository here that builds upon the snippet above by:

- building architecture-specific images with a certain labeling scheme
- pushing those images to the destination registry
- creating a container manifest using the docker CLI with each of those images
- pushing the generated manifest to the destination registry

Our tooling doesn't yet speak these manifests, but it could learn to.

mu88 commented 5 months ago

That looks very promising @baronfel !

This of course sparks hope 😉 what's missing from adding it to the Container Building Tools?

mu88 commented 5 months ago

So I gave @baronfel's prototype a try yesterday and it works nicely. The icing on the cake (despite being integrated into the SDK Container Building Tools) would be if pushing the arch-specific images to the container registry wouldn't be necessary - I prefer my build process not to rely on external things like a foreign container registry. Instead, it would be cool to build the multi-arch image completely within one's local environment.

Varorbc commented 3 months ago

@baronfel any updates?

baronfel commented 3 months ago

No, not as of yet. This probably won't make it for 8.0.400, but it is our highest-rated request so we do want to get to it!

Varorbc commented 3 months ago

Is there a plan for when to start the work?

Varorbc commented 3 months ago

@baronfel any updates?

baronfel commented 2 months ago

We are looking at taking this work on in the near term.

richlander commented 2 months ago

> check if the project is in a 'multi RID' state - meaning the project does not have a RuntimeIdentifier specified and does have either ContainerRuntimeIdentifiers or RuntimeIdentifiers specified.

I'm not a fan of RuntimeIdentifiers since it (at least in theory) affects build. We should be focusing new functionality on publish time properties. This is similar to our old friend SelfContained, but I think we created PublishSelfContained for that.

> I think many SDK checks could be deferred to the 'inner RID' builds with no loss of intent,

I like the idea of an inner-RID build, where one RID is set as a simplifying approach.

I know that MAUI had this same desire at one point, but perhaps it was satisfied via the TFMs that were created for them.

mu88 commented 2 months ago

To not c&p the same MSBuild draft logic (kudos again to @baronfel 🥳) into several of my .NET apps, I added it to my NuGet package mu88.Shared (see here for the sources).
I've added it to several of my .NET apps targeting both x64 and arm64 and it works nicely 🤓

baronfel commented 2 months ago

> I'm not a fan of RuntimeIdentifiers since it (at least in theory) affects build. We should be focusing new functionality on publish time properties. This is similar to our old friend SelfContained, but I think we created PublishSelfContained for that.

Generally agree - that's why for this iteration the first-checked property would be ContainerRuntimeIdentifiers. We could easily drop consideration of RuntimeIdentifiers, but I'd like to encourage people to at least have that property set since that's the property that Restore actually keys off of, and as much as possible I'd like to avoid breaking some of the implicit assumptions that dotnet build && dotnet publish --no-restore --no-build -r <something> promises.

> I like the idea of an inner-RID build, where one RID is set as a simplifying approach.
>
> I know that MAUI had this same desire at one point, but perhaps it was satisfied via the TFMs that were created for them.

cc @jonathanpeppers for comment, but from my digging I think MAUI still broadly uses RIDs in its publishing workflows. Examples here for calculating the RIDs, which then turn into a set of MSBuild Projects that are reused in at least AOT publishing but possibly elsewhere as well.

In addition @jonathanpeppers has requested better SDK-level support for managing RIDs in https://github.com/dotnet/sdk/issues/37830.

richlander commented 2 months ago

I forgot about 'restore'. That said, I think it is still problematic. Let's talk this one through. I think we last discussed this one at length about 5 years ago.

I would also like to think this through to a broader set of scenarios. The key one I have in mind is native AOT, which requires an additional toolset and is easiest with build containers. The buildx behavior enables it quite well.

I don't think we need to build the perfect solution from the get-go, but we should ensure we know where we are heading.

jonathanpeppers commented 2 months ago

Android apps unfortunately have four RIDs (arm, arm64, x86, x64), and Mac apps have two (x64, arm64). iOS debug builds could have two if you build for simulator and device.

What we do currently is use $(RuntimeIdentifiers) with an s and do an "inner build" in a similar fashion as $(TargetFrameworks) with an s and gather the outputs into the app package. This runs the trimmer per architecture, and the AOT compiler per architecture. Right now, we have this logic in each platform's workload, as there wasn't anything built into the .NET SDK for this. I think this could be improved, but what we have has been working ok for customers.

baronfel commented 2 months ago

Thanks @jonathanpeppers - that matches with the plan here. I do agree that we need some better concept built into the SDK (and maybe even NuGet for per-TFM-per-RID targets?!). How did you all deal with some of the hurdles for the publish properties that assume a default RID when publishing at a RID-less level? From the issue description I mean things like:

> Defaulted RIDs
>
> The SDK does not have a concept of 'multi-RID' publish, and so today there are several places where it has assumed that the publish gesture implies a desire for a single RID. The main way this negatively impacts us is the UseCurrentRuntimeIdentifier property, which is inferred as true here and ends up erroneously pinning us to a single RID. Setting it explicitly to false in the project file works around this.
>
> PublishSingleFile
>
> If PublishSingleFile is set and UseCurrentRuntimeIdentifier is not (as mentioned above), there is a mismatch in expectations. For now, for scenarios like our initial set, users may have to condition properties to only light up when the RID-specific build(s) are being done (for example, adding a Condition="'$(RuntimeIdentifier)' != ''" to several properties).
>
> This is a symptom of the overall Publishing mechanisms of the .NET SDK not being designed for multi-RID publish today. In general, I think many SDK checks could be deferred to the 'inner RID' builds with no loss of intent, but we may have to push for this functionality in phases.

when doing dotnet publish ... with no specific RID requested.

jonathanpeppers commented 2 months ago

Some of the behavior mentioned above, we had to turn off. Android opts out of $(UseCurrentRuntimeIdentifier), for example:

https://github.com/dotnet/android/blob/95bf32a84e72aabcd3f074b6a47b0e7ca8930783/src/Xamarin.Android.Build.Tasks/Microsoft.Android.Sdk/targets/Microsoft.Android.Sdk.DefaultProperties.targets#L79

The approach we took for Android was to set $(RuntimeIdentifiers) to all four by default when $(RuntimeIdentifier) is omitted. A customer might not ever set a RID or have to know about them. To make Debug builds reasonable, we detect the RID based on the attached device (or selected device in VS/C# Dev Kit). This way Debug-mode can just build one instead of four.

Since Mac is the only other platform with multiple RIDs (2) and they are not cross-compiling, $(UseCurrentRuntimeIdentifier) works for that case. They also made Release builds default to two architectures.
