dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.42k stars 4.76k forks source link

Don't use a GVM in Linq Select with NAOT by default #109978

Open keegan-caruso opened 2 days ago

keegan-caruso commented 2 days ago

Adds a feature flag to allow Linq Select to not use a GVM implementation.

The compiled size in Native AOT with a value type GVM scales at an order of n**2 relative to types used in the call. This avoids that growth in size at the cost of a slower implementation of chained Linq calls.

A real-world example of where this caused an inability to compile was in this issue: https://github.com/dotnet/runtime/issues/102131

Adapting this to something a bit contrived, but easy to measure:

GetEnumValue<GeneratedEnum0>();
…
GetEnumValue<GeneratedEnumN>();
static T? GetEnumValue<[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicFields)] T>() where T : struct, Enum
{
    var rawValue = "foo,bar";
    if (string.IsNullOrEmpty(rawValue)) return null;

    var type = typeof(T);
    if (type.GetCustomAttributes<FlagsAttribute>().Any())
    {
        return (T)(object)rawValue!
            .Split(',')
            .Select(x => Enum.TryParse<T>(x, true, out var result) ? result : (T?)null)
            .Where(x => !x.Equals(null))
            .Select(x => (int)(object)x!)
            .Sum();
    }
    else
        return Enum.TryParse<T>(rawValue, true, out var result) ? result : null;
}

class FlagsAttribute : Attribute { }
enum GeneratedEnum0 { };
…
enum GeneratedEnumN { };

The size of the Linq Namespace measured by sizoscope and when compiled with Native Aot:

N Size NET 9.0 Size with feature flag enabled
10 1016.6 kb 77.7 kb
25 5.2 mb 190.4 kb
50 19.7 mb 378.3 kb
75 43.4 mb 566.2 kb
100 76.3 mb 754.1 kb
1000 Failed to compile 7475.2 kb
stephentoub commented 2 days ago

A few questions:

  1. Who do we expect to set this to false?
  2. This is specific to Select because it's the only one of the virtuals on the base Iterator type that's generic? There have separately been concerns about the size impact of all of these specializations, GVM or not.
  3. We already have the OptimizeForSize build constant: https://github.com/dotnet/runtime/pull/109978/files#diff-bcb77d7db7721bb5508d93dc432d9a40d920e6b248c98c9ad14a3640bbe6fa2bR11. I'm not a huge fan of having yet another flavor. Could we get rid of that existing one, so that there's just one build of LINQ, and then use a switch like this to control things at publish / execution time?
am11 commented 2 days ago
The size of the Linq Namespace measured by sizoscope and when compiled with Native Aot: N Size .NET 9.0 Size with Feature Flag Enabled
10 1016.6 kb 77.7 kb
25 5.2 mb 190.4 kb
50 19.7 mb 378.3 kb
75 43.4 mb 566.2 kb
100 76.3 mb 754.1 kb
1000 Failed to compile 7475.2 kb

@MichalStrehovsky could ILCompiler optimize this pattern for value type a bit broadly (or even conservatively for .Select() verbatim) without introducing the feature switch?

keegan-caruso commented 2 days ago

A few questions:

  1. Who do we expect to set this to false?
  2. This is specific to Select because it's the only one of the virtuals on the base Iterator type that's generic? There have separately been concerns about the size impact of all of these specializations, GVM or not.
  3. We already have the OptimizeForSize build constant: https://github.com/dotnet/runtime/pull/109978/files#diff-bcb77d7db7721bb5508d93dc432d9a40d920e6b248c98c9ad14a3640bbe6fa2bR11. I'm not a huge fan of having yet another flavor. Could we get rid of that existing one, so that there's just one build of LINQ, and then use a switch like this to control things at publish / execution time?

1: It is an opt out of the feature if the performance is unacceptable and the developer is unwilling to rewrite their code to avoid Linq Select. If they are unwilling to make the size vs perf tradeoff for this scenario.

  1. Yes, only virtual that is also a generic method. I updated the description to add an example of the size difference we see with this change; it can be significant.

  2. I guess it is a question if we can conditionally set the value of the feature flag from TargetPlatformIdentifier and if it should be controllable at publish time.

stephentoub commented 2 days ago

I guess it is a question if we can conditionally set the value of the feature flag from TargetPlatformIdentifier and if it should be controllable at publish time.

I'm suggesting we wouldn't gate it on TPM, and instead guard everything relevant with the feature switch, which when set would cause lots of the specializations to be trimmed away.

MichalStrehovsky commented 2 days ago

I guess it is a question if we can conditionally set the value of the feature flag from TargetPlatformIdentifier and if it should be controllable at publish time.

I'm suggesting we wouldn't gate it on TPM, and instead guard everything relevant with the feature switch, which when set would cause lots of the specializations to be trimmed away.

I only very briefly spot checked what code is in the SpeedOpt files but if it's stuff like this:

https://github.com/dotnet/runtime/blob/d0b4ca6ee4b6f6c99e7aed2fe0e042e5e4974ad9/src/libraries/System.Linq/src/System/Linq/Range.SpeedOpt.cs#L10-L113

It would introduce untrimmable methods into compilation - these are all virtual/interface methods. Even if we stub them out, this feels like it's going to be a size regression since we need the method bodies, even if they pretty much do nothing.

MichalStrehovsky commented 2 days ago

@MichalStrehovsky could ILCompiler optimize this pattern for value type a bit broadly (or even conservatively for .Select() verbatim) without introducing the feature switch?

Do you have a more concrete idea? The change in this PR modifies behaviors, the resulting behavior is not identical; we take different codepaths and call different methods. It would require some very advanced analysis to do the equivalent change in the compiler.

MichalStrehovsky commented 2 days ago

Size statistics from rt-sz:

Nice savings for Avalonia. The rest probably already learned the lesson to just steer clear of LINQ if perf is of any concern.

Size statistics

Pull request dotnet/runtime#109978

Project Size before Size after Difference
avalonia.app-linux 22175368 20849944 -1325424
avalonia.app-windows 19117568 18252800 -864768
hello-linux 1352352 1352352 0
hello-minimal-linux 1081896 1081896 0
hello-minimal-windows 858112 858112 0
hello-windows 1103360 1103360 0
kestrel-minimal-linux 5474240 5474240 0
kestrel-minimal-windows 4908544 4909056 512
reflection-linux 2063440 2063440 0
reflection-windows 1750016 1750016 0
webapiaot-linux 10120480 10120480 0
webapiaot-windows 9157632 9158144 512
winrt-component-full-windows 5602304 5600768 -1536
winrt-component-minimal-windows 1747456 1747456 0
stephentoub commented 2 days ago

this feels like it's going to be a size regression since we need the method bodies, even if they pretty much do nothing.

It'd mean smaller code for nativeaot / trimmed coreclr apps in exchange for possibly slightly larger for mobile. Plus not having to maintain two biuild flavors of the library, and now another variation on top of it. I'm not excited at having both the build constant and the trimming constant, both to optimize for size, both with similar goals, but both doing it differently.

This enables it if PublishTrimmed is true, I wonder if it should set defaults based on PublishAot instead.

This is on by default if trimming is enabled?

I don't see it mentioned anywhere, but these changes still trade offf throughput and allocation for that size benefit. Some of the ones covered by this PR have been there since the earliest days of LINQ.

Also, this can have observable behavioral differences, which I thought we tried to avoid as part of trimming by default.

MichalStrehovsky commented 2 days ago

Also, this can have observable behavioral differences, which I thought we tried to avoid as part of trimming by default.

Is this behavior or just perf difference? I'm not thrilled about introducing perf differences either, but the generic expansion caused by this generic virtual method can lead to actual failure to compile (#102131 mentioned in the top post) because it becomes physically impossible to compile that much code.

Brainstorming alternatives, we could also add a perf analyzer that simply flags all uses of LINQ in code that sets IsAotCompatible as a perf problem so that people know to stay away from it. It's basically how we solved these issues in the past, but without the analyzer, just deleting LINQ use in e.g. ASP.NET.

MichalStrehovsky commented 2 days ago

Looking at the test results, looks like this change is not correct in this shape, the tests are hitting a stack overflow.

MichalStrehovsky commented 2 days ago

Brainstorming alternatives, we could also add a perf analyzer that simply flags all uses of LINQ in code that sets IsAotCompatible as a perf problem so that people know to stay away from it. It's basically how we solved these issues in the past, but without the analyzer, just deleting LINQ use in e.g. ASP.NET.

Maybe a perf analyzer wouldn't be the worst idea in general. LINQ expressions is another thing that performs very differently under AOT and we could use something that would steer people away from it better than a line in native AOT docs.

neon-sunset commented 1 day ago

Do you have a more concrete idea? The change in this PR modifies behaviors, the resulting behavior is not identical; we take different codepaths and call different methods. It would require some very advanced analysis to do the equivalent change in the compiler.

Maybe a perf analyzer wouldn't be the worst idea in general. LINQ expressions is another thing that performs very differently under AOT and we could use something that would steer people away from it better than a line in native AOT docs.

IlcFoldIdenticalMethodBodies=true appears to remove about 260 KB from LINQ namespace for N = 10.

I wanted to ask if something could be done to fold codegen-identical type instantiations besides removing optimized iterator implementations. This is not the first report here that concerns a problematic interaction of LINQ + enums with NAOT.

stephentoub commented 1 day ago

Is this behavior or just perf difference?

Behavior. It's generally minor, but for example if you have:

IList<T> list = ...;
foreach (var item in list.Skip(3).Take(4).Select(...) { ... }

that Select will end up producing an enumerable that will use the IList<T>'s indexer, but with this PR, it would end up enumerating it via GetEnumerator/MoveNext/Current.

MichalStrehovsky commented 1 day ago

IlcFoldIdenticalMethodBodies=true appears to remove about 260 KB from LINQ namespace for N = 10.

103951 would help then, but still would not solve #102131 because we cannot even represent that many methods within the compiler (and we need to represent and compile these methods before we find out they have identical method bodies).

agocke commented 13 hours ago

Behavior. It's generally minor, but for example if you have:

IList<T> list = ...;
foreach (var item in list.Skip(3).Take(4).Select(...) { ... }

I see that there's already an ifdef that controls some of the behavior here. It looks like that ifdef may already be enabled for some of the mobile platforms and WASM. Is this a known behavioral difference?

agocke commented 13 hours ago

Also, just to provide some clarity for @keegan-caruso, I think we should definitely have a runtime feature-flag for this behavior. Whether or not it's on by default when running AOT can be a separate decision, but there seems to be a wealth of evidence that this is a potentially blocking implementation for some people and we should provide a way to workaround the problem for large apps.