dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Help requested: How to remove ReadyToRun native code from DLL? (for app size trimming) #56699

Closed: rickbrew closed this 2 years ago

rickbrew commented 3 years ago

I've ported Paint.NET to .NET 5 and am in the process of finalizing things for a public/stable release. One of the non-critical issues I'm always chipping away at is minimizing the download size.

For framework-dependent builds, this partly involves being selective about which of my app's DLLs are precompiled using crossgen (I'll be migrating to crossgen2 later). Saving 100KB here and there can be very fruitful! For the most part, DLLs not used on the startup path are not crossgen'd.

For the self-contained builds, size is not as important because these are portable ZIPs that are not downloaded as much. All DLLs are crossgen'd.

However, I recently discovered that framework-dependent deployment on ARM64 is essentially broken because 1) you can install both the ARM64 and x64 runtimes, 2) they install to the same locations (file system, registry), and 3) things just don't work when this happens. See also https://github.com/dotnet/installer/issues/10192 ... Ergo, my ARM64 installer must use SCD (self-contained deployment), which greatly increases its download size (~12MB --> ~80MB).

So I'd like to be able to trim the size of my ARM64 download by removing the ReadyToRun data from certain framework DLLs that are not perf-critical for my app's startup. PresentationFramework.dll is at the top of the list; PDN mostly uses WindowsBase.dll, and I think it references a few types from PF.dll, but not enough to warrant the additional size of the R2R native code. Other DLLs in my crosshairs are System.Private.Xml.dll, System.Linq.Expressions.dll, and System.Windows.Forms.Design.dll.

Is there a way to strip out the R2R data from a DLL? Even if it involves PEBuilder or something, I'm fine with something that's low-level and not just a simple command-line or powershell command. It's obviously possible in principle -- if it can be added, it can be removed, eh?

ghost commented 3 years ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov. See info in area-owners.md if you want to be subscribed.

Author: rickbrew
Assignees: -
Labels: `area-AssemblyLoader-coreclr`, `untriaged`
Milestone: -

MichalStrehovsky commented 3 years ago

Roundtripping the assembly through ILDASM/ILASM should achieve that. Make sure to include the resources that ILDASM dumps next to the IL file.

rickbrew commented 3 years ago

@MichalStrehovsky okay that seems to work, although the /resource parameter was removed from ILASM. How would I go about including the resource file?

rickbrew commented 3 years ago

To be more specific, the .NET Framework 4.8 version of ilasm fails with an error on some NaN values; it just doesn't work. The .NET 5 version of ilasm has no trouble with those, but for some reason it no longer supports /resource.

agocke commented 3 years ago

@JulieLeeMSFT for ilasm/ildasm questions

JulieLeeMSFT commented 3 years ago

@briansull please take a look.

rickbrew commented 3 years ago

Also, it's okay if I have to use some C# code to embed the resource (Mono.Cecil? PEBuilder?). I'm fine with a solution that's a mix of command-line and code, I'm very well set up for that.

rickbrew commented 3 years ago

Related are https://github.com/dotnet/runtime/issues/48046 and https://github.com/dotnet/runtime/issues/11412

rickbrew commented 3 years ago

Also, PresentationFramework.dll on ARM64 dropped from 18,531 KB to 5,924 KB, so that's a huge win -- just need to be able to embed the resources!

rickbrew commented 3 years ago

@sylveon came up with a solution: Resource Hacker (freeware http://www.angusj.com/resourcehacker/) can do this at the command-line, e.g.

resourcehacker.exe -open old.exe -action addoverwrite -resource resources.res -save new.exe

So I can ildasm to get the .il and .res, then ilasm to reconstitute the DLL w/o R2R and w/o native resources, then ResourceHacker.exe to stamp the resources in.

It would still be nice if ilasm could properly roundtrip from ildasm so that external tools weren't needed. It's unfortunate that /resource was removed.
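A minimal Python sketch of that three-step pipeline follows. It assumes ildasm, ilasm, and resourcehacker.exe are on PATH. The Resource Hacker arguments come from the command above. Several details are assumptions: the ildasm `/out=` switch, the ilasm `/dll` and `/output=` switches, and the idea that ildasm writes the Win32 resources to `<name>.res` (plus any managed `.resources` files) next to the `.il` output. The helper names `strip_r2r_commands` and `run` are hypothetical. The commands are meant to run in a scratch directory that differs from the one holding the source DLL, because the last step saves under the original file name. Later in this thread, the output of this exact pipeline failed to load with an "invalid format" error, so treat this as a starting point, not a working recipe.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def strip_r2r_commands(dll: str) -> list[list[str]]:
    """Build the ildasm -> ilasm -> Resource Hacker command lines for one DLL.

    The commands use bare file names and are meant to run inside a scratch
    directory (see run()); only the input DLL is given as an absolute path.
    """
    src = Path(dll).resolve()
    stem = src.stem
    return [
        # Disassemble. Assumption: ildasm also writes <stem>.res (Win32 resources)
        # and any managed .resources files next to the .il output.
        ["ildasm", str(src), f"/out={stem}.il"],
        # Reassemble as a DLL; the R2R code and Win32 resources are not carried over.
        ["ilasm", f"{stem}.il", "/dll", f"/output={stem}.tmp.dll"],
        # Stamp the Win32 resources back in (arguments as in the comment above).
        ["resourcehacker.exe", "-open", f"{stem}.tmp.dll", "-action", "addoverwrite",
         "-resource", f"{stem}.res", "-save", src.name],
    ]


def run(commands: list[list[str]], work_dir: str, dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, execute it inside work_dir."""
    if not dry_run:
        Path(work_dir).mkdir(parents=True, exist_ok=True)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True, cwd=work_dir)


if __name__ == "__main__":
    # Dry run: only prints the commands.
    run(strip_r2r_commands("PresentationFramework.dll"), work_dir="der2r")
```

Keeping the dry run as the default lets you check the exact arguments before anything touches the build output.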

MichalStrehovsky commented 3 years ago

Looks like there was a regression around NaN handling in the ILASM that ships with .NET Framework 4.8: #37210. I don't know what came of that.

Desktop ILASM won't be able to assemble some of the new things we added in .NET Core, such as default interface methods. So using .NET 5 ILASM should be the way forward.

I was just about to write about ResHacker. The other option is to use the BeginUpdateResource Win32 API. One just needs to iterate over the resources in the RES file (the file format is simple and documented), call UpdateResource on each, and finish with EndUpdateResource.

sylveon commented 3 years ago

I am not 100% sure of this, but I think you can also use cvtres to convert the .res to an .obj file, then use LoadLibraryEx and LOAD_LIBRARY_AS_DATAFILE to load the .obj file, then use normal Win32 APIs to enumerate all the resources in the .obj and then use UpdateResource to apply them to the .exe without needing to write some code directly interacting with the binary resource format.

But then, Resource Hacker most likely uses UpdateResource already, and is a tool that was battle-tested over 20 years, so I don't think it's worth the effort.

briansull commented 3 years ago

The reason that /resource (a Windows-only feature) isn't available in the CoreCLR version of ilasm is explained here:

https://github.com/dotnet/runtime/issues/11412#issue-557922660

It is tracked by this open issue: Enable embedding .res resources via ilasm #11412

@jkoritzinsky may want to comment on this

rickbrew commented 3 years ago

Well, unfortunately the DLL does not work after ildasm -> ilasm + Resource Hacker. It fails with an "invalid format" error, so something is clearly still missing from the DLL. This may just be beyond me at this point.

MichalStrehovsky commented 3 years ago

Can you check that the headers are the same (maybe with a tool like CFF Explorer)? For example, make sure the assembly isn't marked as x86 when it should be x64.
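You can also do this comparison without a GUI tool. A small script can read a few header fields from two files and print the ones that differ: the COFF machine type, the optional-header magic (PE32 vs PE32+), the subsystem, and the DLL characteristics. The Python below is a hypothetical sketch (`pe_summary` and `diff_headers` are made-up names). It only covers these fixed-offset header fields, not the CLI header or metadata.

```python
from __future__ import annotations

import struct
import sys

MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "arm64"}
OPT_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}


def pe_summary(image: bytes) -> dict[str, object]:
    """Read a few fixed-offset PE/COFF header fields that commonly differ between builds."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    machine, _nsec, _time, _symptr, _nsym, _optsize, characteristics = struct.unpack_from(
        "<HHIIIHH", image, coff)
    opt = coff + 20  # the optional header follows the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    # Subsystem and DllCharacteristics sit at offset 68 in both PE32 and PE32+.
    subsystem, dll_chars = struct.unpack_from("<HH", image, opt + 68)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "characteristics": hex(characteristics),
        "optional_header": OPT_MAGIC.get(magic, hex(magic)),
        "subsystem": subsystem,
        "dll_characteristics": hex(dll_chars),
    }


def diff_headers(a: bytes, b: bytes) -> dict[str, tuple[object, object]]:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    sa, sb = pe_summary(a), pe_summary(b)
    return {k: (sa[k], sb[k]) for k in sa if sa[k] != sb[k]}


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        for field, (va, vb) in diff_headers(fa.read(), fb.read()).items():
            print(f"{field}: {va} vs {vb}")
```

Usage would be something like `python pe_diff.py original\PresentationFramework.dll rebuilt\PresentationFramework.dll`; the script name is arbitrary.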

AraHaan commented 3 years ago

Another note: Resource Hacker is not the only option, and it does not know about managed resource streams. Many applications use these, so some code may start failing if that managed resource data is not included as well.

You can dump this information with ILSpy, though.

rickbrew commented 3 years ago

@MichalStrehovsky Well, I was unable to make heads or tails of what CFF Explorer said, and this whole "remove R2R data" idea was just spinning around in circles, so I decided to stop digging. There were some things missing in the DLL, but I don't understand what they are or what to do about them, and it might have taken forever to figure it all out.

I did find an alternate path toward my goal: building WPF myself. It took a bit of head scratching and cursing, but I was able to build it for all 3 architectures (x64, x86, arm64), and with a real version stamp instead of the placeholder (42.42.42.42424), since otherwise it wouldn't load.

As it turns out, this is the "min cut" I needed to greatly reduce the size of the ARM64 SCD installer from ~80MB down to only 45MB! PresentationFramework.dll, which is the only DLL I'm using from my custom build, is only 11MB smaller. However, makensis.exe (a 32-bit x86 EXE 🙄) is limited to a 128MB dictionary for its LZMA compression, and this seems to push things across a threshold where it can do its job much more effectively (presumably by recognizing duplicate blocks at opposite ends of the archive or something).

I still need to do some perf testing to ensure this doesn't greatly regress app startup time, but it's very promising. I may later try my hand at building WinForms and dotnet itself to see what other opportunities there are for trimming my download size. For now, this is a big win for my Paint.NET v4.3 distribution plans.

cc @EgorBo who expressed some interest on Discord in writing an uncrossgen tool. This would still be very useful to greatly simplify my packaging process so that I don't need to build WPF myself every month (for each servicing release).

(For anyone else who needs a "real" version stamp instead of 42.42.42.42424 when building WPF: open ./eng/Versions.props and add <OfficialBuild>true</OfficialBuild> to one of the <PropertyGroup>s. This was tricky to figure out. I just guessed at it based on an unrelated discussion I found elsewhere, and it worked. Unless your build is stamped with the real version (e.g. 5.0.9), it will refuse to load your DLL, because everything else is trying to bind to 5.0.9 rather than 42.42.42.42424.)

vitek-karas commented 3 years ago

/cc @trylek for potential ideas from the crossgen territory

jkotas commented 3 years ago

> uncrossgen tool

IL rewriting tools will strip the R2R payload as a side effect of IL rewriting. You can try running the .dll through the IL linker and telling it to keep everything, or run the .dll through a no-op Fody rewriter.

trylek commented 3 years ago

I recall the idea of "uncrossgenning" being floated in the past. In fact, I think Crossgen2 can be made to do most of it, removing all R2R code via some weird option combo; the R2R header, however, will likely remain. It should be trivial to add a new command-line option, something like --strip, that would just take the input and copy the filtered MSIL to the output. However, I'm not sure the approaching RC1 fork date is a good time to invent new functionality, so unless it's sorely blocking some scenarios I would be inclined to postpone this to .NET 7.

rickbrew commented 3 years ago

I'm fine using a non-mainline version of crossgen2, even if it's prerelease .NET 7 or something I have to patch 'n build myself from sources. I'm only publishing my app for the latest public stable .NET, but I'm not as picky about my build-time toolchain.

0xC0000054 commented 3 years ago

I would also find an uncrossgen tool useful.

While looking for information on the ReadyToRun "Composite" mode that @EgorBo mentioned on Discord I stumbled across the ReadyToRun File Format documentation. Based on that page I suspect that you may need to change some of the CLI Header data in the ILDASM output before recompiling it. For example, updating the Header Flags to replace COMIMAGE_FLAGS_IL_LIBRARY with COMIMAGE_FLAGS_IL_ONLY.

I would test this, but I am still trying to figure out how to get the .NET 5 version of IL(D)ASM. It does not look like those items are included in the SDK.
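A small CLI-header reader can show which of these flags a given DLL actually carries. The Python sketch below follows data directory 14 to the CLI (COR20) header. It reports the COMIMAGE_FLAGS_ILONLY (0x1) and COMIMAGE_FLAGS_IL_LIBRARY (0x4) bits, and it checks whether ManagedNativeHeader points at a READYTORUN_HEADER with the 'RTR' signature (0x00525452). The function names are hypothetical. The script does only the validation needed to avoid obvious misreads, so treat its output as a hint rather than a verdict.

```python
from __future__ import annotations

import struct
import sys

COMIMAGE_FLAGS_ILONLY = 0x00000001
COMIMAGE_FLAGS_IL_LIBRARY = 0x00000004
READYTORUN_SIGNATURE = 0x00525452  # 'RTR' as a little-endian DWORD
CLI_HEADER_DIRECTORY = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR


def rva_to_offset(image: bytes, rva: int) -> int:
    """Map an RVA to a file offset using the section table."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    coff = e_lfanew + 4
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    table = coff + 20 + opt_size
    for i in range(num_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        vsize, va, raw_size, raw_ptr = struct.unpack_from("<IIII", image, table + 40 * i + 8)
        if va <= rva < va + max(vsize, raw_size):
            return rva - va + raw_ptr
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def cli_info(image: bytes) -> dict[str, object]:
    """Report the CLI header flags and whether a READYTORUN_HEADER is present."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    opt = e_lfanew + 24  # PE signature (4) + COFF header (20)
    (magic,) = struct.unpack_from("<H", image, opt)
    dirs = opt + (96 if magic == 0x10B else 112)  # PE32 vs PE32+ data directory start
    (num_dirs,) = struct.unpack_from("<I", image, dirs - 4)  # NumberOfRvaAndSizes
    if num_dirs <= CLI_HEADER_DIRECTORY:
        raise ValueError("no CLI header directory: not a managed image")
    cli_rva, _cli_size = struct.unpack_from("<II", image, dirs + 8 * CLI_HEADER_DIRECTORY)
    if cli_rva == 0:
        raise ValueError("empty CLI header directory: not a managed image")
    cli = rva_to_offset(image, cli_rva)
    (flags,) = struct.unpack_from("<I", image, cli + 16)  # IMAGE_COR20_HEADER.Flags
    native_rva, _native_size = struct.unpack_from("<II", image, cli + 64)  # ManagedNativeHeader
    readytorun = False
    if native_rva:
        (sig,) = struct.unpack_from("<I", image, rva_to_offset(image, native_rva))
        readytorun = sig == READYTORUN_SIGNATURE
    return {
        "flags": hex(flags),
        "il_only": bool(flags & COMIMAGE_FLAGS_ILONLY),
        "il_library": bool(flags & COMIMAGE_FLAGS_IL_LIBRARY),
        "readytorun": readytorun,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        print(cli_info(f.read()))
```

Running it on both the original and the rewritten DLL should show whether a rewriting tool actually cleared the IL_LIBRARY bit and the ManagedNativeHeader entry.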

rickbrew commented 3 years ago

@0xC0000054 the nuget package for crossgen2 is https://www.nuget.org/packages/Microsoft.NETCore.App.Crossgen2.win-x64/

Many of these nuget packages can't be added to a regular .csproj, so I found a workaround -- unfortunately it errors out, so I have to pipe the output of nuget to nul, which means I can't catch errors.

This is the batch file I use for snagging packages like this in my build process: https://gist.github.com/rickbrew/7e54fe106a2a5201eb603a7e6d961c4c . It's imperfect, but it's been working (knock on wood).

I tried using Mono.Cecil to read and then write out the assembly, and even after setting the flags to ILOnly, it still fails to load. It's the right size though so it's definitely doing something right (6MB instead of 17MB for PresentationFramework.dll).

AraHaan commented 3 years ago

@rickbrew can't you compare the one you compiled with the one you tried to strip the R2R from, to figure out what it takes to make the stripped one work?

rickbrew commented 2 years ago

Okay, now that I've migrated to .NET 6 and crossgen2, I found the option @trylek was thinking of: --compile-no-methods

With a simple command line such as,

crossgen2 --targetos:windows --targetarch:x64 --compile-no-methods -r:System.Private.CoreLib.dll -r:System.Runtime.dll --out:der2r\PresentationFramework.dll PresentationFramework.dll

... it creates a DLL that has the R2R bits removed from it, and the app loads up and works fine! I'll need to do further performance testing to figure out exactly which DLLs I can get away with, but there are a bunch that seem obvious for my purposes (PresentationFramework, System.Linq.Expressions, System.Data.Common, etc.).
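With more than one DLL, it is easy to drive this command from a script. The Python sketch below assembles the same crossgen2 arguments shown above for each file, and runs them unless it is in dry-run mode. `der2r_command` and `run_all` are hypothetical helpers. Passing only System.Private.CoreLib.dll and System.Runtime.dll as references mirrors the example command; it is not a verified minimum for every DLL. The script assumes crossgen2 is on PATH and that it runs from the directory containing the framework DLLs.

```python
from __future__ import annotations

import subprocess
from pathlib import Path, PureWindowsPath

# DLLs named in the comment above as likely candidates; whether each is
# safe to de-R2R is still a performance-testing question.
CANDIDATES = [
    "PresentationFramework.dll",
    "System.Linq.Expressions.dll",
    "System.Data.Common.dll",
]
# Mirrors the -r: arguments in the example command, not a verified minimum.
DEFAULT_REFERENCES = ("System.Private.CoreLib.dll", "System.Runtime.dll")


def der2r_command(dll: str, out_dir: str = "der2r", target_arch: str = "x64",
                  references: tuple[str, ...] = DEFAULT_REFERENCES) -> list[str]:
    """Build a crossgen2 --compile-no-methods command for one input DLL."""
    argv = ["crossgen2", "--targetos:windows", f"--targetarch:{target_arch}",
            "--compile-no-methods"]
    argv += [f"-r:{ref}" for ref in references]
    out_path = PureWindowsPath(out_dir) / PureWindowsPath(dll).name
    argv += [f"--out:{out_path}", dll]
    return argv


def run_all(dlls: list[str], out_dir: str = "der2r", target_arch: str = "x64",
            dry_run: bool = True) -> list[list[str]]:
    """Print (and, unless dry_run, execute) one crossgen2 command per DLL."""
    commands = [der2r_command(d, out_dir, target_arch) for d in dlls]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_all(CANDIDATES)  # dry run: prints the commands only
```

For the ARM64 installer, the same list would be run with `target_arch="arm64"`. The dry-run default makes it easy to compare the generated command against the one above before any output is written.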