Support Incremental Build of Container Images

baronfel commented 1 year ago

Right now there are roughly two steps to container creation:

determine metadata properties for the image/basic validation (ParseContainerProperties target)
image creation (PublishContainer target)

As a result, repeat publishes redo work that has already been done:

image manifests are re-downloaded
image manifests are re-negotiated to find the best single image
image configuration is re-downloaded
build assets are re-tar'd
layers are re-uploaded (though this can be quite quick because the registries do sanity-checking)
manifests/configs are re-uploaded

Each of these steps (and possibly more granular steps!) should be factored out into separate tasks/targets, and each step should declare clear inputs/outputs so that MSBuild incrementality can have its greatest effect. This will give two main benefits:

faster publishes for the single-image case
easier implementation of multi-manifest publishing due to more reusable components

In addition, more of the targets would be able to run natively in Visual Studio - only the layer-creation step would need to be re-implemented. Greater code-sharing in this way should lead to a more unified experience between VS and the CLI.
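
To make the inputs/outputs idea concrete, here is a minimal sketch of a timestamp-based up-to-date check, loosely mirroring the comparison MSBuild performs for targets that declare Inputs and Outputs. It is written in Python purely for illustration; `is_up_to_date`, `run_step`, and the file names are hypothetical and not part of the SDK.

```python
import os
from pathlib import Path
from typing import Callable, Sequence


def is_up_to_date(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """Return True when every output exists and none is older than any input."""
    if not outputs or any(not Path(o).exists() for o in outputs):
        return False  # nothing produced yet, so the step must run
    if any(not Path(i).exists() for i in inputs):
        return False  # assumption: a missing input forces a rerun
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    newest_input = max((os.path.getmtime(i) for i in inputs), default=float("-inf"))
    return newest_input <= oldest_output


def run_step(inputs: Sequence[str], outputs: Sequence[str],
             action: Callable[[], None]) -> str:
    """Run `action` only when its outputs are stale; report what happened."""
    if is_up_to_date(inputs, outputs):
        return "skipped"
    action()
    return "ran"
```

With a step such as "tar the build assets" expressed this way, a repeat publish with unchanged assets would skip the step entirely, while a rebuilt assembly would trigger it again.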

rainersigwald commented 1 year ago

Note that caching image manifests can be dangerous: you can imagine an incremental build that wakes up once a month to do an incremental containerize. Nothing on the local box has changed, but you'd want to fetch the latest image manifest definition and build a new image on top of it (with the same layer tarball that was used last time).

I might actually push for layer determinism before incrementality, since as you say the registry should handle the expensive part of layer deduplication.
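
As a rough illustration of what layer determinism means here: if the tarball is built with a fixed entry order and normalized metadata, the same files always produce the same digest, so the registry can deduplicate the upload. The sketch below is Python for illustration only, not the SDK's implementation; `build_layer` and `layer_digest` are hypothetical names, and the layer is left uncompressed to keep the example small.

```python
import hashlib
import io
import tarfile
from typing import Dict


def build_layer(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):        # fixed entry order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                # no wall-clock timestamps
            info.uid = info.gid = 0       # no local user or group ids
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layer_digest(files: Dict[str, bytes]) -> str:
    """Content digest in the usual 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(build_layer(files)).hexdigest()
```

Two publishes of identical build output would then yield the same digest regardless of when or in what order the files were written.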

baronfel commented 1 year ago

Fair point. This discussion also overlaps somewhat with https://github.com/dotnet/sdk-container-builds/issues/114. Most classic container tooling caches manifests by default, but that's at odds with the 'secure/latest by default' intended use case. The intent I have with this issue is more to share work during a container publish, especially a multi-project/multi-RID publish. So I'd want to reduce the amount of manifest-fetching done across that entire set of operations.

rainersigwald commented 1 year ago

> The intent I have with this issue is more to share work during a container publish, especially a multi-project/multi-RID publish. So I'd want to reduce the amount of manifest-fetching done across that entire set of operations.

An excellent design consideration! We might consider doing a RegisterTaskObject with lifetime Build to cache some of the fetches for the lifetime of a build/publish operation.
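
The shape of that idea, sketched in Python rather than against the real MSBuild API: a cache object that lives for exactly one build, so projects in the same publish share fetches while the next build starts from scratch and still sees the latest manifest. `BuildScopedCache` and `get_or_fetch` are hypothetical names used only for this illustration.

```python
import threading
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


class BuildScopedCache:
    """Holds fetched values for one build; create a new instance per build."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[Hashable, object] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], T]) -> T:
        # Holding the lock during fetch keeps the sketch simple; it also
        # serializes fetches, which a real implementation might avoid.
        with self._lock:
            if key not in self._values:
                self._values[key] = fetch()
            return self._values[key]  # type: ignore[return-value]
```

Because nothing is persisted between builds, this avoids the stale-manifest problem described earlier in the thread while still removing duplicate fetches within a multi-project or multi-RID publish.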

jetersen commented 10 months ago

I wonder how this would work together with GHA caching which is something Docker is currently supporting as an experimental feature: https://docs.docker.com/build/cache/backends/gha/

baronfel commented 10 months ago

We probably wouldn't interop with that feature, at least not initially. We already don't reuse the Docker cache (and neither do our contemporaries like Jib/ko).

baronfel commented 6 months ago

Additional details: https://github.com/dotnet/sdk/pull/39196#discussion_r1512964490

If we were making better use of MSBuild incrementality, you could imagine a situation where we'd do the following:

compute desired container base image (existing target)

fetch + resolve manifest (list) to single base image manifest from the base image (new step)

compute container config data (existing ComputeContainerConfig target)

download base layers (in parallel) (new step)

create the image/push to appropriate storage (existing target, much less work done inside it)

all as separate tasks that could take advantage of MSBuild incrementality and parallelism. In this world, none of the label-generation flags would ever need to be passed to the task. So I'm viewing this as an intermediate stage.
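
A minimal sketch of the two new steps from the list above, in Python for illustration only. It assumes the manifest list follows the OCI image index layout (a `manifests` array whose entries carry a `platform` with `os` and `architecture`); `select_manifest`, `download_layers`, and the `fetch_blob` callback are hypothetical and stand in for whatever registry client the real tasks would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def select_manifest(index: dict, os_name: str, architecture: str) -> dict:
    """Pick the single manifest entry that matches the target platform."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if (platform.get("os") == os_name
                and platform.get("architecture") == architecture):
            return entry
    raise LookupError(f"no manifest for {os_name}/{architecture}")


def download_layers(digests: List[str],
                    fetch_blob: Callable[[str], bytes],
                    max_workers: int = 4) -> Dict[str, bytes]:
    """Fetch all base layers concurrently and return them keyed by digest."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        blobs = list(pool.map(fetch_blob, digests))  # map preserves input order
    return dict(zip(digests, blobs))
```

Keeping resolution and download as separate functions matches the split proposed above: each has a small, well-defined input and output, which is what would let a build engine skip or parallelize them independently.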

dotnet / sdk-container-builds

Support Incremental Build of Container Images #438