dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License
2.7k stars 1.06k forks source link

Localized resources in the CLI adds 53 MB on disk and increases time-to-main by ~50% #16196

Closed DamianEdwards closed 3 years ago

DamianEdwards commented 3 years ago

The CLI today includes resources for all languages for not only itself but all of its dependencies, including Roslyn, MSBuild, NuGet, and VSTest. This happens regardless of the locale of the machine the CLI is installed on.

The inclusion of all these resources results in:

It doesn't seem as though there's any benefit to including all language resources in every install (i.e. I don't personally get a benefit from all language resources other than English being installed).

The impact on time-to-main is particularly interesting with regards to the effort to improve the CLI performance for inner-loop scenarios in .NET 6 (see https://github.com/dotnet/msbuild/issues/5876), offering a saving to the floor for the CLI's time-to-main of at least 50% in local testing.

Here's the results of a manual analysis of the resources files in the CLI installation I found today (using version 6.0.100-preview.2.21118.12 installed on Windows via the nightly installer):

Path Size on disk (Bytes) KB MB
root 17,461,248 17,052 16.65
DotnetTools\dotnet-watch\6.0.100-preview.1.21118.15\tools\net6.0\any 483,328 472 0.46
Extensions 1,122,304 1,096 1.07
FSharp 5,742,592 5,608 5.48
Microsoft\Microsoft.NET.BuildExtensions\tools\net6.0 512,000 500 0.49
Roslyn 266,240 260 0.25
Roslyn\bincore 9,486,336 9,264 9.05
Sdks\Microsoft.NET.Sdk\analyzers 3,592,192 3,508 3.43
Sdks\Microsoft.NET.Sdk\codestyle\cs 1,245,184 1,216 1.19
Sdks\Microsoft.NET.Sdk\codestyle\vb 1,134,592 1,108 1.08
Sdks\Microsoft.NET.Sdk\tools\net6.0 512,000 500 0.49
Sdks\Microsoft.NET.Sdk\tools\net472 512,000 500 0.49
Sdks\Microsoft.NET.Sdk.Publish\tools\net6.0 921,600 900 0.88
Sdks\Microsoft.NET.Sdk.Publish\tools\net472 921,600 900 0.88
Sdks\Microsoft.NET.Sdk.Razor\tasks\net6.0 5,103,616 4,984 4.87
Sdks\Microsoft.NET.Sdk.WindowsDesktop\tools\netcoreapp2.1 593,920 580 0.57
Sdks\Microsoft.NET.Sdk.WindowsDesktop\tools\net472 593,920 580 0.57
Sdks\NuGet.Build.Tasks.Pack\CoreCLR 2,011,136 1,964 1.92
Sdks\NuGet.Build.Tasks.Pack\Desktop 2,011,136 1,964 1.92
TestHost 1,716,224 1,676 1.64
       
Resources Total 55,943,168 54,632 53.35
CLI Total 288,780,288 282,012 275.40
       
Resources as a portion of overall CLI size on disk 24%    

Note that we added support for selecting specific languages for satellite resources in 2.x: https://github.com/dotnet/sdk/issues/774

Additionally, in my local testing I noted that completely removing the CLI's deps.json file didn't seem to impact the operation of the CLI itself. The commands I tried all continued to work perfectly well when the deps.json file was deleted.

While putting this table together I noticed a lot of duplication. You can see a bunch of size values appear identically a number of times as different components bring in the same dependencies, which bring in their resources again. It might be useful for that to investigated separately in the interest of simplifying the layout and thus install time and disk cost of the CLI.

brianrob commented 3 years ago

I also wonder how much of this actually needs to be in the deps.json file - the less we put in there, the less actually gets touched at start-up. I like the idea of a size win too, but if we did need to ship the files, we definitely don't need to load them all at start-up in case we use one.

DamianEdwards commented 3 years ago

Yep, I actually tried no deps file at all, and it seemed to work just fine:

Additionally, in my local testing I noted that completely removing the CLI's deps.json file didn't seem to impact the operation of the CLI itself. The commands I tried all continued to work perfectly well when the deps.json file was deleted.

brianrob commented 3 years ago

Now that's a win. I want to chat with @vitek-karas and @agocke about this, as it does seem that there is some opportunity to reduce latency in the host, but we're going to need experts to help keep this on the rails. :)

wli3 commented 3 years ago

So if we remove dotnet.deps.json, the dll loading for satellite assemblies are not a problem? I remember in most of the case deps.json are no longer needed. SDK should not fall into these situations. We might be able to remove it. Ya, we need @vitek-karas

For the loc size. "I don't personally get a benefit from all language resources other than English being installed" is not the right angle, since we (as English users) are not the customer of the entire localization work. I do want PMs to discuss that. We could have 12 different SDK installers for different language users to download. But that would be a crazy download page. We might consider 1 with 12 loc and one without any loc (English only). But frankly, then, we only improved the English user's performance.

brianrob commented 3 years ago

I would divide the options here into two areas:

My personal opinion is that P1 is the start-up win, but there is also a significant acquisition time and size win if we can find a way to defer the download of satellite assemblies until we know that they are needed (or make them optional for the user to download). Many other products have a separate localization pack for this very reason - I wonder if we might consider this as well.

DamianEdwards commented 3 years ago

Also note that due to the architecture of the CLI many components that bring in these resources are there multiple times, so the cost isn't even paid just the once. Some components bring assets for multiple TFMs and the current packaging means they bring the same resources in each TFM dir too. I'm not sure which components actually need assets for multiple TFMs or what it would take to de-dupe the resources in cases where different runtime assets are indeed needed for different TFMs, but given the impact on size here it's likely worth auditing.

Ideally all CLI users would only need to acquire and pay the startup cost for the languages they desire (which was the original intent of my statement about me not personally getting a benefit from resources other than those for my platform language).

wli3 commented 3 years ago

dotnet.deps.json seems like an easy win. @vitek-karas can we just remove that?

For satellite assemblies. @DamianEdwards @KathleenDollard I hope PM can take the first action approaching this issue. We should define an acceptable experience first. On VS, standalone installer, zip/tarball with optional workload * offline scenario. Would we pause and wait for sdk to download 10 mb loc files on user start up? Or we could just ask user to unzip files to a certain location like intellisense loc. Or can we just don't loc zip/tarball since these are for CIs. But only keep standlone installer(I assume only CI is very sensitive to the download size? And if I am using a low bandwidth machine, I would rather wait to download installer once for 3 hours to get it over night than download 1.5 hour and then with for loc for another .5 hour on start up).

vitek-karas commented 3 years ago

It depends what's in it. If the CLI was built as a normal app with SDK I would not worry about it, but the CLI construction is not a standard process, so I don't know for sure that there's not something weird about the .deps.json. In short:

Then it should be safe to remove the .deps.json.

vitek-karas commented 3 years ago

While looking at another issue related to CLI - I noticed that CLI is currently shipped as a portable app. For example, we ship two copies of System.Security.Cryptography.Pkcs.dll - one which is Windows specific, and another one for everything else. This means deleting .deps.json would pick the wrong one on Windows (it would pick the "everything else" version, not the Windows version). I don't know how many other assemblies are like this, but this alone blocks us from removing .deps.json from CLI.

This raises another question though - why are we shipping the CLI as portable app? AFAIK we only ever ship target specific packs (zips or installers). I assume it makes our build times faster (since in theory we don't have to build that many versions), on the other hand it probably makes the output larger since we carry multiple copies of some assets. And how does this work with crossgen? If we crossgen anything in CLI it will automatically make it platform specific - so building as portable makes no sense in that case.

DamianEdwards commented 3 years ago

Great points @vitek-karas. It would seem the optimal goal would be for the CLI to be as targeted for the desired platform and locale as possible, and any step back from that should be justified as a trade-off against other concerns, e.g. build time/complexity, performance, acquisition experience, etc.

wli3 commented 3 years ago

I guess it is because dotnet/sdk produce a portable app.(people who set up it left the team) And then in dotnet/install, it is copied to a layout and crossgened.

At this point, I don't see any performance issue has a quick fix. Any build change could be costly. @marcpopMSFT to allocate resources.

DamianEdwards commented 3 years ago

Could we determine if removing the locale satellite assembly metadata from the deps.json still results in the runtime being able to load those assemblies when on the relevant locale OK? If so, it would seem fairly straightforward to have a build step for the CLI that just removes that metadata in order to realize the time-to-main improvements (assuming it can be reproduced).

marcpopMSFT commented 3 years ago

For the localized assemblies in the installer itself, we can potentially roll that into our planned installer updates in net 7 as we'd need a design that ensured customers could get the assemblies they needed when they needed them.

For the deps file, is it enough to remove every resources node in the dotnet.deps.json file? Would that have impact on customers who wanted localized output with dotnet.exe? If so, it might be possible to modify RemoveAssetFromDepsPackages to allow for a wildcard to remove everything in the section though we won't have time for a while to investigate what impact that would have.

DamianEdwards commented 3 years ago

Given the potential savings in time-to-main on this, I'm not willing to let it go just yet :smile:

Between @marcpopMSFT and @brianrob could we have someone try to replicate what I did locally but get some traces to try and properly verify that the savings are real?

vitek-karas commented 3 years ago

@mateoatr and I are working on a host change which would forego file probes for pretty much all the files in the .deps.json - maybe even the resource files (those are tricky though). While not as good as removing parts of the .deps.json completely, it should help a LOT. So maybe if we combine all of these together it will give us enough of a win for 6.

marcpopMSFT commented 3 years ago

FYI, I threw together a change that will remove the resources section of the dotnet.deps.json file. I don't think we have capacity to look at changing dotnet from a portable app this release or making the resources optionally install until .NET 7. Note that I did limited testing to see that dotnet.exe still worked and I could change languages. Not sure what other testing would need to be done for this but I can confirm the deps file is now much smaller. https://github.com/dotnet/sdk/compare/master...marcpopMSFT-removeResourcesFromDepsM

brianrob commented 3 years ago

@marcpopMSFT, I did a bit of testing on your proposed change. I ran each of these 10 times with the first run thrown out for warm-up:

Baseline Resources Removed Dotnet.deps.json Removed
Avg 172.64 149.16 154.70
Stdev 11.76 23.08 28.26
Min 152.98 124.33 127.42
Max 193.36 179.38 218.70

It looks like resources removed is the best option right now, and gives an average with of about 20ms. I was surprised that removing the deps file entirely was actually worse. @vitek-karas, do you think this is because the host is now probing for files?

vitek-karas commented 3 years ago

Without .deps.json we will probe all files in the application folder. This is probably another CLI specific case, where the root directory with the dotnet.dll contains many more assemblies than just what dotnet.dll needs (MSBuild, VSTest, ...). So I'm guessing that with .deps.json without resources we end up probing noticeably less files.

Actually the fact that the root folder contains other assemblies which do not belong to dotnet.dll is a good reason to use .deps.json. Not only for perf, but potentially for functionality. Another example of CLI being weird - I don't think SDK can produce two apps into the same output folder and "merge" their dependencies.

@brianrob Just curious what did you measure? Time-to-main, some CLI command, something else?

brianrob commented 3 years ago

Thanks @vitek-karas. That makes sense.

Sorry - good point - I forgot to include the command I measured. It is dotnet --version.

brianrob commented 3 years ago

In addition to dotnet.deps.json, there a bunch of other deps.json files as well:

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12

02/18/2021  02:59 PM            48,104 datacollector.deps.json
02/18/2021  07:18 PM           168,861 dotnet.deps.json
02/18/2021  07:18 PM            55,528 dotnet-watch.deps.json
02/18/2021  03:00 PM            16,559 Microsoft.TestPlatform.PlatformAbstractions.deps.json
02/18/2021  07:18 PM           168,863 MSBuild.deps.json
02/18/2021  07:18 PM           168,895 NuGet.CommandLine.XPlat.deps.json
02/18/2021  02:58 PM            68,057 package.deps.json
02/18/2021  02:58 PM            52,724 vstest.console.deps.json
               8 File(s)        747,591 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\DotnetTools\dotnet-dev-certs\6.0.0-preview.2.21117.17\tools\net6.0\any

02/18/2021  06:01 AM             8,885 dotnet-dev-certs.deps.json
               1 File(s)          8,885 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\DotnetTools\dotnet-user-secrets\6.0.0-preview.2.21117.17\tools\net6.0\any

02/18/2021  06:01 AM            18,816 dotnet-user-secrets.deps.json
               1 File(s)         18,816 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\DotnetTools\dotnet-watch\6.0.100-preview.1.21118.15\tools\net6.0\any

02/18/2021  07:18 PM            55,528 dotnet-watch.deps.json
               1 File(s)         55,528 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\FSharp

02/18/2021  07:18 PM            77,056 fsc.deps.json
02/18/2021  07:18 PM            77,056 fsi.deps.json
               2 File(s)        154,112 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\Roslyn\bincore

02/18/2021  05:04 AM            18,450 csc.deps.json
02/18/2021  05:04 AM            18,535 vbc.deps.json
02/18/2021  05:04 AM            20,451 VBCSCompiler.deps.json
               3 File(s)         57,436 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\Sdks\Microsoft.NET.ILLink.Tasks\tools\net5.0

02/18/2021  05:27 PM             7,023 illink.deps.json
02/18/2021  05:27 PM            24,720 ILLink.Tasks.deps.json
               2 File(s)         31,743 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\Sdks\Microsoft.NET.Sdk.BlazorWebAssembly\tools\net6.0

02/18/2021  07:18 PM             6,824 Microsoft.NET.Sdk.BlazorWebAssembly.Tool.deps.json
               1 File(s)          6,824 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\Sdks\Microsoft.NET.Sdk.Razor\source-generators

02/18/2021  07:18 PM            18,396 Microsoft.NET.Sdk.Razor.SourceGenerators.deps.json
               1 File(s)         18,396 bytes

 Directory of c:\Work\sdk\test\sdk\6.0.100-preview.2.21118.12\Sdks\Microsoft.NET.Sdk.Razor\tools

02/18/2021  07:18 PM            14,561 rzc.deps.json
               1 File(s)         14,561 bytes

We probably don't need to trim them all, but I wonder if we might get benefit from trimming some, such as:

Basically, any time we're going to start a new process that consumes one of these, we should consider trimming resources from the deps file.

marcpopMSFT commented 3 years ago

I merged the PR to remove the resources node from the dotnet.deps.json file only per Brian's analysis that is would save 20ms. We don't have a lot of meaningful test coverage in this area so this went into main after the preview 3 cutoff (so it has some bake time). I'll leave this issue open as there is further discussion about removing from other deps files as well.

brianrob commented 3 years ago

Thanks @marcpopMSFT. Yes, let's keep this one open for now. I think we likely want to modify some of the other deps.json files here - especially those for inner-loop, such as rzc.deps.json and dotnet-watch.deps.json at a minimum.

marcpopMSFT commented 3 years ago

Tracking as part of the optimize SDK size .net 7 planning document.