dotnet / sdk

Core functionality needed to create .NET Core projects, shared between Visual Studio and the CLI
https://dot.net/core

Produce a smaller SDK #41128

Open richlander opened 1 month ago

richlander commented 1 month ago

The SDK could be a lot smaller. A smaller SDK would significantly benefit CI and similar environments. We typically think of the SDK as being installed on a developer machine, persisting there, and being updated once a month at most. I'm guessing that most SDK installs are actually not in persistent environments but in disposable/temporary ones.

We're always looking for ways to reduce the size of containers; the fewer bytes we transmit over the wire, the better. The size difference between the compressed and uncompressed SDK is telling: the compression is very good.

Related issues:

I propose we do the following:

On the last point, I'm interested in producing two different flavors of the SDK in containers: a core layer and a tools layer. Perhaps there are other splits that would be more compelling or complementary.

For containers, I'd see a lot of value in a -tools layer that contains all the existing dotnet- tools (like dotnet-watch), adds tools like dotnet-trace, and also hosts PowerShell. It could be equally interesting to make dotnet-* tools available in a layer on top of aspnet, but that's a different topic.
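For illustration only, here is a rough sketch of what such a -tools layer could look like as a Dockerfile. The image tag, tool list, and install path are all assumptions, not a design:

```Dockerfile
# Hypothetical -tools layer built on top of the core SDK image.
# The 10.0 tag and /usr/local/dotnet-tools path are illustrative assumptions.
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS sdk-tools

# Install diagnostic tools that would live in this layer rather than the core one.
RUN dotnet tool install --tool-path /usr/local/dotnet-tools dotnet-trace \
 && dotnet tool install --tool-path /usr/local/dotnet-tools dotnet-dump

# Make the tools available to anything built on this layer.
ENV PATH="/usr/local/dotnet-tools:${PATH}"
```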

My hypothesis is that we can cut more than 50 MB of compressed size from our SDK container images (for the core layer), with no loss of functionality for typical needs. That would be huge.

Ideally, we can do this for .NET 10.

am11 commented 1 month ago

Splitting the SDK into smaller packages and restoring them as needed would be neat.

> Establish multiple supported layers of the SDK.

Would it be better to move them into NuGet packages? Granted, there would be more moving parts, but caching of NuGet packages can come to the rescue there, e.g. https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-net#caching-dependencies. Options like dotnet publish --packages <cachable-reusable-path> help in containers and other kinds of CI environments, which otherwise repeatedly hit the network to download the exact same set of packages several times a day.

Improving the NuGet package caching experience (with guidance and/or tooling support) would address the 'slow restores' issue more broadly, in Docker (https://github.com/dotnet/dotnet-docker/issues/2457) and non-Docker CI use cases alike.
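As a sketch of that idea inside a container build (assuming BuildKit; the image tag and paths are illustrative), a cache mount can persist the NuGet package folder across builds so identical packages are only downloaded once:

```Dockerfile
# Minimal sketch, assuming BuildKit is enabled.
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .

# The cache mount keeps /root/.nuget/packages between builds on the same
# builder, so repeated builds skip re-downloading unchanged packages.
RUN --mount=type=cache,id=nuget,target=/root/.nuget/packages \
    dotnet publish -c Release -o /app
```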

richlander commented 1 month ago

Most of the tools are already packages and should work well with that type of scheme. See: https://www.nuget.org/packages/dotnet-dump.

Using containers and using tools directly on the Actions host are two different approaches that are equally valid/valuable, but I don't think they're closely related. If we make the SDK container image smaller (or bigger), it won't affect or benefit from the Actions caching. Similarly, if you go the Actions route, you'll likely write more YAML; if you go the container route, you'll likely enhance a multi-stage Dockerfile.

Here's a sample that uses bind mounts: https://github.com/dotnet/dotnet-docker/blob/main/samples/releasesapi/Dockerfile.ubuntu-chiseled. I've been intending to use them more broadly; the syntax is a bit ugly, so my motivation has been low. The docker init Dockerfiles use this syntax, however. I should develop a good performance test for it to better describe the benefit.
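Condensed, the bind-mount pattern looks roughly like this (a paraphrase, not the linked sample verbatim; the image tags and app name are illustrative):

```Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /source

# Mount the build context instead of COPYing it in; rw lets the build write
# bin/obj output, and those writes are discarded when the step ends.
RUN --mount=type=bind,source=.,target=/source,rw \
    dotnet publish -c Release -o /app

FROM mcr.microsoft.com/dotnet/runtime-deps:8.0-jammy-chiseled
COPY --from=build /app /app
# The app binary name below is an illustrative assumption.
ENTRYPOINT ["/app/myapp"]
```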

DamianEdwards commented 1 month ago

Just to be clear, the commands included in the SDK, like dotnet watch, are not tools; they are SDK commands. dotnet-dump and dotnet-ef are tools that must be installed, aren't part of the SDK at all, and thus don't contribute to its size. I assumed this issue was more about tackling deduplication of files for the included commands and other infrastructure in the SDK, e.g. multiple copies of Roslyn including all locale resources, etc.

richlander commented 1 month ago

That's a good point. In terms of layering, I'd like to see a build of the SDK that has only the components needed for builds, meaning no dotnet watch.

These commands have been the source of all of our false positives. They are the most likely to produce false positives in the future unless we radically change the way they are built (which may well be necessary).

https://github.com/dotnet/dotnet-docker/issues/5325

DamianEdwards commented 1 month ago

While I'm sympathetic to the motivations here, extra layering in our SDK SKUs will of course increase the complexity that end users are required to understand. I have an SDK, but what kind of SDK do I have? Can I "upgrade" or "downgrade" from one SDK type to another?

I think workloads were originally motivated in part by a desire to better factor the SDK and optionally allow the acquisition of parts of it to be delayed. Perhaps some kind of "acquire on demand" capability is better suited here, and/or an expansion of the workloads feature to enable bringing in SDK commands, locales, etc.
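For reference, workloads are already an acquire-on-demand mechanism for optional SDK pieces; a container build can pull one in today with a single step (the workload ID below is just an example):

```Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:8.0

# wasm-tools is an illustrative workload ID; any workload listed by
# `dotnet workload search` could be installed the same way.
RUN dotnet workload install wasm-tools
```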

richlander commented 1 month ago

We can do it just for containers (to start). I don't think users will be confused. We already have layering in place for other scenarios, and people have been able to grasp it; in fact, it has been quite successful. Also, as I said, I expect this is a 90/10 thing: 90% of users won't even notice.

Related: https://github.com/dotnet/dotnet-docker/discussions/4821

I think catering the SDK to the persistent installation model is a mistake. We should make it super easy for people to install fewer bits for disposable environments.

DamianEdwards commented 1 month ago

> I think catering the SDK to the persistent installation model is a mistake. We should make it super easy for people to install fewer bits for disposable environments.

I agree with this, but of course trade-offs still need to be made. Perhaps a good next step would be a straw-man proposal that we can then evolve from, e.g. a new SDK container type, call it "SDK Slim", that contains only enough to build a .NET console app project. The following components would not be included:

Some of these might actually already support acquire-on-demand semantics, e.g. MSBuild SDKs via the NuGet SDK resolver, and ref and runtime packs via core SDK functionality. It would be good to do some exploration here to see how close it already is.

richlander commented 1 month ago

On-demand semantics are a good approach. The real question is where the minimum line is. Today, you can build ASP.NET Core apps (with all versions matching) without downloading parts of the platform (runtime or tooling). I think that's very good, and most developers will want it to continue. It's also where we offer value compared to the Node.js ecosystem.

Container layers cache naturally. Downloadable content can be cached too, but doing that well requires complicated opt-in patterns. When I say cache, I mean that building multiple ASP.NET Core app images within the same CI leg should share as much content as possible.

In the ideal world, we'd offer significant functionality in the base SDK image and then show folks how to opt in to more sharing on top of that. That's what is happening here: https://github.com/dotnet/dotnet-docker/blob/7d4d56941607d8521d500be152d66bb7d9e3dbf0/samples/releasesapi/Dockerfile.ubuntu-chiseled#L10-L12. I plan to expand that pattern to other samples where there is a benefit (anything self-contained).

After writing all that (and thinking a bit more), console + web API could be a good baseline for the min SDK. It's really a question of whether we can come up with good patterns for ensuring users can cache the content they want. This includes users who want an air-gapped experience.

I've often wondered whether some users might benefit from a cached runtime pack in our images. We've never done it because it is too big. If we reduce the size of the SDK, we may be able to develop a pattern that makes caching runtime packs (while still being servicing-friendly) more workable.
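Purely as a speculative sketch of what "caching a runtime pack" could mean (nothing here is an existing pattern; the image tag, RID, and warm-up approach are all assumptions), the pack could be pre-pulled into the image's NuGet cache by restoring a throwaway self-contained project at image-build time:

```Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:8.0

# Restore a throwaway self-contained project once, so the linux-x64 runtime
# pack lands in the image's NuGet cache; later self-contained publishes could
# then reuse it instead of downloading it. Whether this interacts well with
# servicing is exactly the open question in the comment above.
RUN dotnet new console -o /tmp/warmup \
 && dotnet restore /tmp/warmup -r linux-x64 /p:SelfContained=true \
 && rm -rf /tmp/warmup
```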