Closed Sergio0694 closed 1 year ago
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.
Author: | Sergio0694 |
---|---|
Assignees: | - |
Labels: | `untriaged`, `area-NativeAOT-coreclr` |
Milestone: | - |
DebuggerSupport
is enabled by default that in turn brings in a lot of reflectable cruft, for example: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.Linq/src/ILLink/ILLink.Descriptors.xml#L2-L7 You can shave 350kB by adding <DebuggerSupport>false</DebuggerSupport>
to your sample. Should we set DebuggerSupport
to false by default for NativeAOT publish? Blazor has prior art
https://github.com/dotnet/sdk/blob/86117ad328025825f25f88389077d9eee2dc9031/src/BlazorWasmSdk/Targets/Microsoft.NET.Sdk.BlazorWebAssembly.6_0.targets#L53.() Reflectable method: [CommunityToolkit.Diagnostics]CommunityToolkit.Diagnostics.TypeExtensions+<>c.<ToTypeString>b__2_2(Type)
(Target of a delegate) __DelegateCtor_S_P_CoreLib_System_Delegate__InitializeClosedInstance__CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_TypeExtensions___c___ToTypeString_b__2_2
(ldftn) CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_TypeExtensions___ToTypeString_g__FormatDisplayString_2_0
(call) CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_TypeExtensions___c___ToTypeString_b__2_1
(reloc) __DelegateCtor_S_P_CoreLib_System_Delegate__InitializeClosedInstance__CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_TypeExtensions___c___ToTypeString_b__2_1
(ldftn) CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_TypeExtensions__ToTypeString
(call) CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_Guard_ThrowHelper__ThrowArgumentNullExceptionForIsNotNull<System___Canon>
(call) CommunityToolkit_Diagnostics_CommunityToolkit_Diagnostics_Guard__IsNotNull<System___Canon>
(call) ComputeSharp_ComputeSharp_GraphicsDeviceExtensions__ForEach_0<computesharp_cli_ComputeSharp_SwapChain_Shaders_Octagrams__ComputeSharp_Core_ComputeSharp_Float4>
(call) computesharp_cli_ComputeSharp_SwapChain_Backend_SwapChainApplication_1<computesharp_cli_ComputeSharp_SwapChain_Shaders_Octagrams>__OnUpdate
(Virtual method) (??_7computesharp_cli_ComputeSharp_SwapChain_Backend_SwapChainApplication_1<computesharp_cli_ComputeSharp_SwapChain_Shaders_Octagrams>@@6B@ constructed, VirtualMethodUse [computesharp.cli]ComputeSharp.SwapChain.Backend.Win32Application.OnUpdate(TimeSpan))
(Primary) ??_7computesharp_cli_ComputeSharp_SwapChain_Backend_SwapChainApplication_1<computesharp_cli_ComputeSharp_SwapChain_Shaders_Octagrams>@@6B@ constructed
(newobj) computesharp_cli_Program___Main__
Ooh that's very interesting, thank you! π
"You can shave 350kB by adding
<DebuggerSupport>false</DebuggerSupport>
to your sample."
TIL, I can certainly do that! FWIW it does make sense to me for it to be on by default in published builds.
"Delegate targets are reflectable by default. It means that delegates are more expensive than in reflection-free mode. Do we want to have a way to disable delegate reflectability?"
Of course I can't comment on how complex this would be to enable, but having an option for that (assuming the linker can't just automatically strip that support if it seens that you're never reflecting on delegates) certainly sounds nice. Do you also have an idea of the amount of size (like, ballpark) such an option could save in an example like this? π€
"The sample uses Linq that amplifies impact of the previous two points. You may want to get rid of Linq from the parts of Community Toolking that are meant to be lean. Here is example of the dependency chain from whydgml tool:"
Oh that completely slipped my mind - those are in a slow path (that code is only used to format a nice exception message if a throw helper fails), but of course that'd still cause more stuff to be included in all cases, right. I can definitely remove that and see how much it helps, and then I guess I could also just drop that package reference entirely here and just use some internal throw helpers - since this is a library and not an app I guess having one less dependency isn't a bad idea anyway.
Will share updated numbers here when I have time to make the changes and re-run the tests π
Do you also have an idea of the amount of size (like, ballpark) such an option could save in an example like this?
I have seen it on number of paths. It is hard to guess the absolute number by looking at the dependency logs. The easiest way to find out the absolute number would be to build the compiler with the delegate reflectability commented out and check the difference.
I also noticed some size regression on simple Hello World app when using following settings:
<PropertyGroup>
<PublishAot>true</PublishAot>
<DebuggerSupport>false</DebuggerSupport>
<InvariantGlobalization>true</InvariantGlobalization>
<UseSystemResourceKeys>true</UseSystemResourceKeys>
<IlcOptimizationPreference>Size</IlcOptimizationPreference>
<IlcFoldIdenticalMethodBodies>true</IlcFoldIdenticalMethodBodies>
<IlcDisableReflection>true</IlcDisableReflection>
<IlcGenerateCompleteTypeMetadata>false</IlcGenerateCompleteTypeMetadata>
<IlcGenerateStackTraceData>false</IlcGenerateStackTraceData>
</PropertyGroup>
.NET 7 size: 968 KB .NET 8 size: 1095 KB
On the "disable delegate reflectability" topic - would such an option also require adding some annotations to make it safe? Eg. I assume Delegate.Method
would have to be marked as nullable, as that metadata info would could then be removed? π€
I assume Delegate.Method would have to be marked as nullable
That would be unacceptable breaking change. Delegate.Method
would throw exception with this option. To make it "safe", the compiler can produce warnings/errors for places that call it.
Or we can build it as an optimization that kicks in automatically: disable the delegate target reflectability when the program does not contain any Delegate.Method
calls.
Alright I made a PR to drop the CommunityToolkit.Diagnostics dependency entirely and re-run the NAOT publishing π
I got the following (with reflection enabled):
And if I also set <DebuggerSupport>false</DebuggerSupport>
:
And for reference, this is the result if I leave reflection free mode set in that branch:
I'm very confused as to why disabling debugger support only saves another 52 KB on this branch π€
Anyway with the changes from that branch it seems the delta between reflection free is down to 591 KB. That's still seems considerable given in this case it's about a 32% size increase.
@jkotas let me know if you find anything else that you find interesting from the new map file from that branch, curious if that might show some other offenders that might've been less noticeable before. Also more than happy to help test some new compiler flag for that delegate reflection mode you mentioned, in case anyone gets to implement that π
And thank you for spotting that issue with those Toolkit APIs, I had no idea they were having a noticeable impact here.
I'm very confused as to why disabling debugger support only saves another 52 KB on this branch
DebuggerSupport impact was amplified by CommunityToolkit.Diagnostics
dependency and Linq. Now that these are gone, the impact is less.
Right, that makes sense. I guess my surprise was due to how that one use case and dependency was contributing so much compared to all the rest of the code involved, which is still a very considerable amount (both in ComputeSharp and the TerraFX.Interop.* dependencies in particular). I mean I found it very interesting give that was literally just a single method ending up rooting all that extra fluff, is all π
"
Delegate.Method
would throw exception with this option."
We could throw a MissingMetadataException
for old times' sake π
/s
Run some more tests with the options mentioned in https://github.com/dotnet/runtime/issues/82607#issuecomment-1444625047 as well (thanks @KeterSCP!):
<IlcFoldIdenticalMethodBodies>
set to false
: 1.780 KB (saved 70 KB)
<IlcGenerateCompleteTypeMetadata>
set to false
: same, I think it's included by reflection free already?
<IlcGenerateCompleteTypeMetadata>
, with reflection enabled: 2.441 KB, so same as previous test
I'll take the 70 KB of extra improvement with <IlcFoldIdenticalMethodBodies>
π
Current deltas with updated settings for both scenarios:
And looking forwards to seeing that decrease even more in the future π
<IlcFoldIdenticalMethodBodies>
Just curious, why is this not enabled by default, does it have any downsides?
It breaks Delegate.Method
and it makes stacktraces - both under debugger and at runtime (e.g. in Exception logs) - look non-sensical.
<IlcGenerateCompleteTypeMetadata>
set to false: same,
IlcGenerateCompleteTypeMetadata
is false by default everywhere. It is a troubleshooting aid documented at: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/docs/optimizing.md#options-related-to-trimming
I see, thank you! π
Some more additional info: I double checked with .NET 7 as well following up on what @KeterSCP said, and noticed I also had a slight size regression in .NET 8, though only in reflection free mode. Here's a full table of results in case it's useful:
TFM/SDK | Reflection | Size |
---|---|---|
.NET 7 | enabled | 2.775 KB |
.NET 8 | enabled | 2.327 KB |
.NET 7 | disabled | 1.714 KB |
.NET 8 | disabled | 1.780 KB |
So in each category (reflection enabled/disabled):
Surprising that .NET 8 both improved (quite a bit, that's nice!) with reflection enabled, yet it does worse with it disabled.
For context, these are the settings I've used to produce all numbers in this table:
<DebuggerSupport>false</DebuggerSupport>
<UseSystemResourceKeys>true</UseSystemResourceKeys>
<InvariantGlobalization>true</InvariantGlobalization>
<IlcFoldIdenticalMethodBodies>true</IlcFoldIdenticalMethodBodies>
<IlcGenerateStackTraceData>false</IlcGenerateStackTraceData>
<IlcOptimizationPreference>Speed</IlcOptimizationPreference>
Of course, with <IlcDisableReflection>
being added on top of these in some cases.
size regression in .NET 8 only in reflection free mode
We do not pay a lot of attention to reflection free mode. I remember a few changes that regressed the reflection free mode. It is actually becoming a pain to keep the reflection free mode working as we work on improvement of the default mode.
.NET 8 enabled 2.327 KB .NET 8 disabled 1.780 KB
Breakdown of this delta:
A lot of the reflection engine code is brought in by initialization of the reflection engine during startup - we need to refactor the reflection engine to make this mandatory piece much smaller. Additional parts of the reflection engine code are brought in through various places in the libraries. For example, TerraFX depends on reflection through this:
(Secondary) VirtualMethodUse [S.P.CoreLib]System.Reflection.CustomAttributeData.get_AttributeType()
(callvirt) S_P_CoreLib_System_Runtime_InteropServices_NativeLibrary__GetDllImportSearchPath
(call) S_P_CoreLib_System_Runtime_InteropServices_NativeLibrary__LoadLibraryByName
(call) S_P_CoreLib_System_Runtime_InteropServices_NativeLibrary__Load_0
(call) TerraFX_Interop_Windows_TerraFX_Interop_Windows_Windows__OnDllImport
(reloc) __DelegateCtor_S_P_CoreLib_System_Delegate__InitializeOpenStaticThunk__TerraFX_Interop_Windows_TerraFX_Interop_Windows_Windows__OnDllImport__S_P_CoreLib_System_Runtime_InteropServices_DllImportResolver__InvokeOpenStaticThunk
(ldftn) TerraFX_Interop_Windows_TerraFX_Interop_Windows_Windows___cctor
(This is output of whydgml tool.)
Ooh very interesting, thank you for taking another look! π
Just so I understand then, potential areas for improvements are:
System.Private.Reflection.*
on its own (for those 420 KB)cc. @tannergooding about that NativeLibrary
support rooting reflection stuff. It seems unfortunate that that's always brought in (I assume it's because that hoot is set in the static constructor?), even if the consuming application isn't using that feature at all (eg. my CLI sample isn't using that). Is there a way to make that hook be pay for play in TerraFX? π€
Or alternatively, is this something that ILLink could potentially improve and be able to trim on its own?
"(This is output of whydgml tool.)"
I'm not familiar with that tool, but @SingleAccretion kindly pointed me towards some docs for it (he mentioned debugging-aot-compilers.md). How should that be used to identify callsites that are rooting reflection stuff in an application, do I just run that and look for any [S.P.CoreLib]System.Reflection.*
bits in any of those call chains?
There's a few different options.
The tooling should likely understand the hook and treat it as unused if its statically resolved all native imports or if they want to indicate they won't provide a hook/etc.
But, TerraFX can itself expose some static readonly bool
switch and ComputeSharp can tell NAOT how to treat it in the interim (this is how things like InvariantGlobalization
work behind the scenes: https://github.com/dotnet/runtime/blob/main/docs/workflow/trimming/feature-switches.md and https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/docs/optimizing.md).
There's a few different options.
Can you move the registration of the hook out of the static constructor and do it lazily only once somebody subscribes to the ResolveLibrary
event handler?
What are the scenarios that the ResolveLibrary
event handler is for?
The tooling should likely understand the hook
There are many rarely used APIs that the tooling can understand to trim some extra code. I do not think we will ever want to complicate the tooling for that. Libraries should avoid calling these APIs when do not need to call them in the first place.
"Can you move the registration of the hook out of the static constructor and do it lazily only once somebody subscribes to the
ResolveLibrary
event handler?"
That seems like a very good idea definitely worth trying out π
I assume you mean something like this?
private static event DllImportResolver? _resolveLibrary;
public static event DllImportResolver? ResolveLibrary
{
add
{
SetupFirstTimeHookIfNeededWithALock();
_resolveLibrary += value;
}
remove => _resolveLibrary -= value;
}
"What are the scenarios that the
ResolveLibrary
event handler is for?"
It can be used by consumerss to properly resolve dependencies they're bundling with their applications (eg. for redistributable libraries such as dxcompiler.dll
and dxil.dll
) which might not always be right next to the executable, and would otherwise crash. Eg. I'm using this in ComputeSharp.Dynamic (see here) (which supports runtime shader compilation and as such needs those two DLL-s I just mentioned) to ensure they're properly resolved in all possible scenarios (self-contained, local build, Debug/Release, through NuGet, etc.).
What are the scenarios that the ResolveLibrary event handler is for?
Native library resolution, particularly for cross platform code or Linux in general, can be particularly tricky.
Consider vulkan
where on Windows
it is vulkan.dll
or vulkan-1.dll
, depending. Or on Linux where it can be libvulkan.so.1
vs libvulkan.so
vs libvulkan.so.1.3.239
, vs ...
This allows app authors to customize the resolution logic to ensure that they can succeed even when running on a setup where the default resolution logic might fail. -- It's also required so that we can have one set of bindings and still resolve on a platform where the default search names don't always work.
Can you move the registration of the hook out of the static constructor and do it lazily only once somebody subscribes to the ResolveLibrary event handler?
Yes, but that's also slightly problematic. The way SetDllImportResolver
works is that you get exactly one hook per Assembly
. Once it's been set, you cannot remove the hook and other people cannot provide their own hooks/fallbacks for that assembly.
TerraFX effectively "has" to set it up front because of this, as if it didn't there wouldn't be a single location for everyone to provide extended resolution logic through and its own per platform resolution logic would also not light up (this is less important for TerraFX.Interop.Windows, but more important for other interop bindings provided).
SetDllImportResolver works is that you get exactly one hook per Assembly
Yes, it was meant for scenarios where the library itself wants to control where its dependencies are resolved from.
AssemblyLoadContext.ResolvingUnmanagedDll
is more appropriate for situations where the default resolution logic does not work for the app and the apps wants to augment the logic.
Yes, but that's also slightly problematic. The way SetDllImportResolver works is that you get exactly one hook per Assembly. Once it's been set, you cannot remove the hook and other people cannot provide their own hooks/fallbacks for that assembly.
If somebody wants to break this, they can still get their hook registered before the TerraFx one. I do not think you need to be protecting against people registering the hook for your library when they are not supposed to.
AssemblyLoadContext.ResolvingUnmanagedDll is more appropriate for situations where the default resolution logic does not work for the app and the apps wants to augment the logic.
Right. But the general issue is that the library is most often not the determiner here. It's often a factor of the consuming application and environment instead and there are many scenarios where an application may itself need to tweak or customize resolution logic, particularly when considering plugins or other scenarios.
Even for Windows, you can scenarios like d3dcompiler_*.dll
where *
is the number of the "latest" release (and with Windows shipping _33
through _43
and _47
today).
There is a general purpose alternative via AppContext
switches. This is how all of the S.P.Corelib and other trimming options end up working. Such a setup then works for "any" of the interop binding libraries and is opt-in by the application, not just the ones that by default don't have resolution customization.
I'd personally prefer that to be used as the approach so that my own libraries can be self-consistent with eachother.
There is a general purpose alternative via AppContext switches. This is how all of the S.P.Corelib and other trimming options end up working.
In general, these AppContext switches are workarounds for legacy designs - to avoid breaking 20 years old APIs. If we were to do this again, we would design the APIs such that they are pay-for-play by design and these AppContext switches are not necessary for pay-for-play.
Everyone (eventually) has "legacy design" and everyone (eventually) has functionality that they would like to make opt-in or pay for play so they are being used for pay for play features regardless.
The JIT handling static readonly bool
as constant via rejit and AppContext
switches being trivially settable via runtimeconfig
makes this a very simple and fairly natural pattern.
If not AppContext
switches then there needs to be an alternative for developers to achieve the same general effect that way certain code can be trimmed via opt-in. This ranges from loggic, to additional validation features, to features that may be less frequently used but which exist to handle known edge cases that consumers do hit, but which may be unused in the scenario a particular consumer is in.
People can certainly use the appcontext switches just like CoreLib. I am just saying that it is the less desirable option with worse experience, and that the designs that do not require these should be preferred.
For me, the currently suggested alternative is not an available design because it makes the general experience of TerraFX.Interop.Windows
inconsistent with the other TerraFX.Interop.*
libraries I provide.
Anything done for TerraFX.Interop.Windows
needs to also be a valid option to achieve the same general desired outcome with TerraFX.Interop.Vulkan
or TerraFX.Interop.Xlib
or TerraFX.Interop.*
. -- In this case, the desired outcome is the trimming of the SetDllImportResolver
call.
If there is another alternative that would work equally for all the required scenarios, so the general end-user experience can be consistent, then I am happy to hear it.
I do not understand. Where is the inconsistency with registering SetDllImportResolver
late only once the app wants to customize the .dll loading? Note that the customization of .dll loading always brings in some reflection - there is no way around it.
The root issue here is that a consumer of TerraFX.Interop.*
is in a scenario where they do not need to customize the DllImport
resolution logic and therefore would like to see it trimmed.
Switching it to be "lazy" only when a user first does ResolveLibrary += Handler
works for setups like TerraFX.Interop.Windows
where there is not currently any per platform configuration required. It does not then also work for a setup like TerraFX.Interop.Vulkan
where there exists a default handler to catch the various names which "might" exist. It also then "breaks" if Windows ends up changing to needing some default handler.
Thus a consumer of TerraFX.Interop.Vulkan
is effectively stuck with the inability to trim said support, even if they were shipping vulkan
SxS with the app and ensuring that it matched the default lookup name.
This provides an inconsistent experience between the various TerraFX.Interop.*
libraries and if such support were added to Vulkan
in the future, there would now be two ways to get the functionality for TerraFX.Interop.Windows
.
I would not describe this as an inconsistent experience.
TerraFX.Interop.Windows
does less by default, so it can bring in less by default (it does not today). No AppContext switches should be needed for that.
TerraFX.Interop.Vulkan
does more by default, so it has to bring in more by default. If it is important for the users of your library to avoid this cruft where possible while maintaining the current behavior, you need to have opt-in mechanisms (AppContext switch).
I would not describe this as an inconsistent experience.
I think we'll just have to disagree on this point.
To me, the general desired experience exists for both and Interop.Windows
needing to do more today is an implementation detail that most users will not care about. They will simply see that consuming the two libraries has subtle differences in how things work and what they must do to achieve the same general desired goal.
If an AppContext
switch is the only solution for Interop.Vulkan
and it is an equally viable solution for other scenarios, then using that provides an overall consistent experience across both. It also ensures that the Interop.Windows
doesn't have to worry about breaking that "different" support via future changes (e.g. if it needs to start providing "default" resolution logic for one of the libraries it supports).
This could be made a lot simpler on the end user as well by simplifying the experience around setting and trimming AppContext
switches.
That is, the linker/trimmer/aot could instead make this as simple as just <RuntimeHostConfigurationOption Include="TerraFX.Interop.Windows.DisableResolveLibraryHook" Value="true" Trim="true" />
and then recognizing the basic pattern around AppContext.TryGetSwitch("constantstring", out var result)
. There shouldn't be an additional requirement to also define an explicit ILLink.Substitutions.xml
file be provided and it duplicate much of the relevant logic.
Provided that TryGetSwitch
handles constant strings and provided the version bubble is correct, it provides a nice, simple, and general purpose way for anyone to get this functionality.
If an AppContext switch is the only solution for Interop.Vulkan and it is an equally viable solution for other scenarios, then using that provides an overall consistent experience across both
That's not how we approach this sort of problems in dotnet/runtime and dotnet/aspnetcore libraries. We always prefer a solution without appcontext switch that just works out of the box in the common cases that we care about. We use appcontext switches only when the solution without appcontext switch is not viable. The solutions differ on library-by-library and case-by-case basis. We can certainly build more appcontext switches. We are intentionally not doing that, as appcontext switches provide worse experience. It also means that we are not very motivated to make implementation of appcontext switches simple.
You are certainly free to use different approach for your libraries.
I think the difference is that the BCL is often providing wildly different scenarios where there is not a close correlation to how something might be supported or desired in one vs another.
In my case, I'm providing interop binding libraries and while what they bind against might differ, the general look and feel as well as end user experience and layout of them is very similar/consistent.
So while I understand that something like Globalization
vs Json
in the BCL might have wildly different exposure mechanisms, the same doesn't necessarily hold true for two the interop libraries or two similar config switches within Json
itself or in Json
vs another serialization library.
I've put up https://github.com/terrafx/terrafx.interop.windows/pull/341 which enables the resolver to be trimmed via opt-in.
I also noted a few different edge cases in the trimmer:
static constructor
is left with an if (true) ;
block, so it isn't fully removed (this represents additional jit work and also prevents trimming things that are now fully dead)static constructor
doesn't work, it results in incorrect IL metadatastatic readonly T
(for primitive T
) to be a constant value, you cannot then access the field directly. Doing so leaves the initialization logic and other "dead code" still in the assembly. Accessing the field indirectly via a property does work as expected.I do not see the difference in what the BCL is providing vs. what your libraries are providing. The differences that I see is in principle approach:
I wouldn't say its sub-optimal.
The savings here for the TerraFX.Samples.DirectX
sample is about 1kb trimmed. It's about 4kb for the default PublishAot
experience (without disabling any runtime knobs). Neither of those are large and so its something that's being provided due to a "customer ask" since its an overall simple and reasonable ask.
It's about providing a consistent experience across the "terrafx ecosystem" (that is all libraries provided by https://github.com/terrafx/*
) and consistency is overall a more important feature in my eyes. Providing inconsistency across the ecosystem of libraries I'm exposing just leads to user pain, even for things like this that may be viewed as edge cases. It leads to needing to say "do this, except for when you're in this scenario, then you have to do this other thing instead".
With AppContext switches, it becomes a very simple/trivial of "add this one line to your csproj or dir.build.props":
<RuntimeHostConfigurationOption Include="TerraFX.Interop.Windows.DisableResolveLibraryHook" Value="true" Trim="true" />
This is simple for users, its easy to understand, and it works one way everywhere. There are no special annotations or considerations. Simply a "if you want this, do this" and leaves it as unimportant for the general case/scenario.
It also extends to other optional features such zero-cost + trimmable runtime validation, opt-in padding, or stat tracking in TerraFX.Interop.Mimalloc
. Which are much more broadly used/applicable
-- As an aside it also looks like initializing static readonly field
via the linker substitutions is currently trim incompatible. Doing <field name="s_disableResolveLibraryHook" value="true" />
works, but doing <field name="s_disableResolveLibraryHook" value="true" initialize="true" />
results in:
System.NotSupportedException: Specified method is not supported.
at ILCompiler.ProcessLinkerXmlBase.ProcessXml(Boolean) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ProcessLinkerXmlBase
.cs:line 104
at ILCompiler.BodySubstitutionsParser.GetSubstitutions(TypeSystemContext, UnmanagedMemoryStream, ManifestResource, ModuleDesc, Strin
g, IReadOnlyDictionary`2) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/BodySubstitutionParser.cs:line 168
at ILCompiler.FeatureSwitchManager.AssemblyFeatureInfo..ctor(EcmaModule, IReadOnlyDictionary`2) in /_/src/coreclr/tools/aot/ILCompil
er.Compiler/Compiler/FeatureSwitchManager.cs:line 743
I's about 4kb for the default PublishAot experience (without disabling any runtime knobs)
Yes, the benefit is not large enough to talk about this to library user, ever. The library implementation should do it transparently by default, when possible, without requiring any user intervention. And when it is not possible, just accept the extra overhead.
It leads to needing to say "do this, except for when you're in this scenario, then you have to do this other thing instead"
Yes, you do not want to be explaining things like that.
This is simple for users, its easy to understand, and it works one way everywhere
Is it really easy to understand? People have to understand when it is safe to do this. It is safe to do in some cases, but it can going to break your app or one of the libraries used by your app in other cases. You are back to explaining complicated corner cases.
For reference, this is what the fix would look like when following the BCL principles: https://github.com/terrafx/terrafx.interop.windows/pull/342 . As added benefit, it also removes static constructor triggers.
Yes, the benefit is not large enough to talk about this to library user, ever. The library implementation should do it transparently by default, when possible, without requiring any user intervention. And when it is not possible, just accept the extra overhead.
To me this just ends up being "just accept the overhead, always" given how small it is. Because otherwise it will regress the scenario for users if I ever end up pulling in bindings for something that does need dynamic lookup. -- There are notably things that fall into this category today even for Windows, they just haven't been exposed "yet".
Is it really easy to understand? People have to understand when it is safe to do this. It is safe to do in some cases, but it can going to break your app or one of the libraries used by your app in other cases. You are back to explaining complicated corner cases.
I believe so, yes. It ends up not-trimmed by default, so users will not experience any break or extra considerations.
If the app needs to do some extra trimming and does not need the customized logic, then they opt-in via the one line. There is then a singular docs page that describes the available config switches. In the case of DisableResolveLibraryHook
it becomes a very simple explanation on "if you do not need customized library resolution logic, you can set this". The explanation of what "customized library resolution logic" does and the scenarios it may impact is simple and then consistent across all TerraFX.Interop.*
libraries. -- It's just a small explanation/example of handling cases like vulkan-1.dll
vs libvulkan.so.1
by default and that it also enables end users to provide their own loading rules on top.
As added benefit, it also removes static constructor triggers.
The static constructor not being removed with the other approach is a general trimmer issue that needs to be resolved. We have the same general issue for our own substitutions in corelib, including for simple things like if (IntPtr.Size == 8)
leaving behind an if (true) ;
or if (false) ;
block which is functionally "dead code".
The trimmer "needs" to work with code patterns and practices that average developers and real world apps write. It should not require them to contort their code to make the trimmer happy (at least not for relatively simple cases like these).
Because otherwise it will regress the scenario for users if I ever end up pulling in bindings for something that does need dynamic lookup.
Again, this is different approach for BCL vs. TerraFX. We regress size of existing scenarios all the time by adding new BCL features, it does not really bother us (unless it is very large). It does not stop us from optimizing the runtime and BCL for where things are today, even when some of these optimizations may be invalidated in future.
Right. We'll just have to leave it as difference of opinion.
I've at least provided a switch that works for power users, like Sergio, and which will equally work for the other TerraFX.Interop.*
libraries once I get corresponding PRs up. The default user experience will be that this isn't trimmed and where the 1-4kb will likely not matter to the typical user of these libraries.
I've correspondingly logged some bugs on dotnet/linker for the things which didn't work as expected and where various dead blocks were left in the trimmed output.
For example, TerraFX depends on reflection through this:
I think this one is self-inflicted. There's no point in calling NativeLibrary.Load from here:
it can just return IntPtr.Zero. There's a 200 kB difference in a hello world app that calls Load vs returns IntPtr.Zero.
Returning Zero will go into a lean code path because we know what assembly we're operating on when we call the callback. If someone goes and manually calls NativeLibrary.Load
, unfortunately we don't have a way to get rid of the custom attribute reading since the API design for that one fell into the "optional parameters that are expensive to calculate" trap. Such APIs should be separate overloads. Optional parameters are not trimmable.
There's a 200 kB difference in a hello world app that calls Load vs returns IntPtr.Zero.
Hmmm. I'm only seeing a 1kb difference for the default trimming of the TerraFX samples app (4kb for AOT).
I had also thought that returning non-zero was required, but that doesn't actually look to be the case, so maybe I'm misremembering. Will adjust since it looks to be unnecessary now anyways: https://github.com/terrafx/terrafx.interop.windows/pull/343
Hmmm. I'm only seeing a 1kb difference for the default trimming of the TerraFX samples app (4kb for AOT).
Is this for default settings or reflection disabled? I'm seeing the big difference for all left at defaults. Reflection disabled wouldn't make a big difference because custom attribute reading won't be present anyway (and if the nullable parameter was passed as null, it would likely throw).
This is just the defaults, not setting any special runtime feature switches or AOT knobs.
This is just the defaults, not setting any special runtime feature switches or AOT knobs.
On .NET 8?
I'm seeing:
$ dotnet new console -o foo
$ cd foo
Paste this into Program.cs:
using System;
using System.Reflection;
using System.Runtime.InteropServices;
class Program
{
static void Main()
{
NativeLibrary.SetDllImportResolver(Assembly.GetExecutingAssembly(), (s, a, p) =>
{
return NativeLibrary.Load(s, a, p);
//return IntPtr.Zero;
});
MessageBoxW(default, default, default, default);
[DllImport("user32")]
static extern int MessageBoxW(IntPtr a, IntPtr b, IntPtr c, int d);
}
}
$ dotnet publish -p:PublishAot=true
$ dir C:\foo\bin\Release\net8.0\win-x64\publish\foo.exe
02/27/2023 12:52 2,002,944 foo.exe
Then uncomment/comment the Load/Zero lines:
$ dotnet publish -p:PublishAot=true
$ dir C:\foo\bin\Release\net8.0\win-x64\publish\foo.exe
02/27/2023 12:52 1,797,120 foo.exe
You probably won't need the ILLink substitution stuff anymore.
I double checked with .NET 7 as well following up on what @KeterSCP said, and noticed I also had a slight size regression in .NET 8, though only in reflection free mode
Pretty sure it's because we now have a generic virtual method in the closure and that brings the type loader. Previously, reflection free mode didn't include the type loader - there's an extra "optimization" related to that here:
without this optimization the type loader would be present no matter what. We might want to delete the optimization - it's just an artifact of the past, when reflection disabled mode also broke the type loader (https://github.com/dotnet/corert/pull/7208#issuecomment-475986552).
What this optimization does is that it breaks Comparer.Default and EqualityComparer.Default. Nobody noticed that with reflection disabled because breaking code is par for the course when reflection is disabled.
On .NET 8?
No, for me its on .NET 7. I've not made the switch to start supporting .NET 8 yet. I typically do that around Preview 3.
Description
I've tried out .NET 8 Preview 1 for my NAOT sample for ComputeSharp (https://github.com/Sergio0694/ComputeSharp), and noticed that the size delta between reflection free and now is particularly large, more than expected (talking with @MichalStrehovsky he mentioned he would've expected the different to be 3-400 KB).
Note: I am already enabling all size saving options on top of reflection free, to make the comparison fair (source):
IlcOptimizationPreference
is set toSpeed
since the delta for that one is just ~50 KB, so not worth it in this scenario.FYI @MichalStrehovsky @agocke
Reproduction Steps
/samples/ComputeSharp.Sample.Cli/ComputeSharp.Sample.Cli.csproj
, change TFM tonet8.0
Expected behavior
The delta between reflection enabled and reflection free should be around smaller.
Actual behavior
I get these results:
That's a 929 KB difference (+49.1%).
Other information
Here's the two .map.xml files for both configurations: computesharp.cli.reflection.map.zip.