dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net

NativeAOT is much slower than JIT version #42221

Open Neo-vortex opened 2 years ago

Neo-vortex commented 2 years ago

A simple REST server, shown below, demonstrates that the NativeAOT build is much slower than the JIT version of the same code.

Here is the only controller in the app:

using Microsoft.AspNetCore.Mvc;
namespace nativeAOTapi.Controllers;

[Controller]
[Route("api/[controller]")]
public class TimeAPI : Controller
{
    [HttpGet]
    [Route("time")]
    public ActionResult<long> GetTime()
    {
        return DateTimeOffset.Now.ToUnixTimeMilliseconds();
    }
}

Here is Program.cs:

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();
var app = builder.Build();
app.UseSwagger();
app.UseSwaggerUI();
app.MapControllers();
app.Run();

Here is Program.csproj:

<Project Sdk="Microsoft.NET.Sdk.Web">
    <PropertyGroup>
        <TargetFramework>net6.0</TargetFramework>
        <Nullable>enable</Nullable>
        <ImplicitUsings>enable</ImplicitUsings>
        <IlcOptimizationPreference>Speed</IlcOptimizationPreference>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="Microsoft.DotNet.ILCompiler" Version="7.0.0-*" />
        <PackageReference Include="Swashbuckle.AspNetCore" Version="6.2.3" />
    </ItemGroup>
</Project>

Here is Nuget.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!--To inherit the global NuGet package sources remove the <clear/> line below -->
    <clear />
      <add key="nuget" value="https://api.nuget.org/v3/index.json" />
    <add key="dotnet-experimental" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-experimental/nuget/v3/index.json" />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
  </packageSources>
</configuration>

The JIT version runs on port 5247; the NativeAOT version runs on port 5000.

Here are some benchmark results:

for JIT:

Bombarding http://localhost:5247/api/TimeAPI/time for 10s using 200 connection(s)
[========================================================================================================================================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec    175618.11   38073.63  219971.12
  Latency        1.14ms     2.07ms   192.43ms
  HTTP codes:
    1xx - 0, 2xx - 1749150, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    41.53MB/s

for NativeAOT:

Bombarding http://localhost:5000/api/TimeAPI/time for 10s using 200 connection(s)
[========================================================================================================================================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec     51621.50    7439.70   60899.66
  Latency        3.89ms     2.25ms   148.00ms
  HTTP codes:
    1xx - 0, 2xx - 514153, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    12.21MB/s

The JIT version is built and run with dotnet run. The NativeAOT version is built with dotnet publish -r linux-x64 -c Release.
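
Judging by the output above, the load generator appears to be bombardier (10s duration, 200 connections). The exact command is not in the report, so the following invocation is reconstructed from that output:

bombardier -c 200 -d 10s http://localhost:5247/api/TimeAPI/time   # JIT build
bombardier -c 200 -d 10s http://localhost:5000/api/TimeAPI/time   # NativeAOT build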

dotnet: 6.0.101
OS: Linux pop-os 5.15.11-76051511-generic #202112220937~1640185481~21.10~b3a2c21 SMP Wed Dec 22 15:41:49 U x86_64 x86_64 x86_64 GNU/Linux
CPU: Intel Core i7-8700

jkoritzinsky commented 2 years ago

This is possibly due to the code paths that ASP.NET Core's routing takes on NativeAOT vs. a JIT environment. In a JIT environment, ASP.NET Core uses runtime-compiled expression trees or JIT-compiled IL-emit, which can be quite fast. In an AOT environment, expression trees have to be interpreted and only traditional reflection can be used, not IL-emit; both are slower than their JIT equivalents.

cc: @davidfowl @dotnet/interop-contrib I believe this might relate to our earlier conversations about fast-reflection for ASP.NET Core.
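
For illustration, a minimal sketch (not the actual ASP.NET Core code) of the two execution modes: on a JIT runtime, Compile() produces a real jitted delegate, while on NativeAOT the same call can only fall back to the expression interpreter, which is what Compile(preferInterpretation: true) requests explicitly on any runtime.

using System;
using System.Linq.Expressions;

class ExpressionModes
{
    static void Main()
    {
        // The kind of "call a method on a target" expression a framework thunk might build.
        Expression<Func<DateTimeOffset, long>> expr = dto => dto.ToUnixTimeMilliseconds();

        // With a JIT: Compile() emits IL and the JIT turns it into native code.
        // On NativeAOT: there is no JIT, so the resulting delegate runs in the expression interpreter.
        Func<DateTimeOffset, long> compiled = expr.Compile();

        // Forcing the interpreter on any runtime approximates the NativeAOT behavior.
        Func<DateTimeOffset, long> interpreted = expr.Compile(preferInterpretation: true);

        Console.WriteLine(compiled(DateTimeOffset.Now));
        Console.WriteLine(interpreted(DateTimeOffset.Now));
    }
}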

ShreyasJejurkar commented 2 years ago

@jkoritzinsky Can we source-generate those code paths at compile time? I wish that in .NET 7 frameworks like aspnetcore would take full advantage of source generators to address these kinds of issues. That way we can solve the problem for both the JIT and AOT worlds! I have yet to see an issue that tracks source-generator work for aspnetcore, but I hope it is a priority!
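
As a purely hypothetical illustration of that idea (this generator and type do not exist), a source generator could emit a strongly typed invoker per action so that no expression tree or reflection is needed at runtime:

// Hypothetical generator output for the TimeAPI controller above (illustration only).
public static class TimeAPI_GeneratedInvokers
{
    // A direct, statically compiled call replaces the reflection/expression-based thunk.
    public static object? GetTime(object controller, object?[] arguments)
        => ((nativeAOTapi.Controllers.TimeAPI)controller).GetTime();
}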

davidfowl commented 2 years ago

@jkoritzinsky Yep we should put this under a microscope. I have some ideas where the time is spent but it would be good to get some confirmation.

EgorBo commented 2 years ago

I believe that once we integrate NativeAOT into our PerfLab infra (TE benchmarks, etc.), with the ability to get native traces and validate changes by sending new bits, we will find low-hanging fruit there.

MichalStrehovsky commented 2 years ago

We're spending a lot of time in the expression interpreter. The calls are coming from HostFilteringMiddleware.Invoke and HttpProtocol.ProcessRequests:

publishaot1!System_Linq_Expressions_System_Linq_Expressions_Interpreter_LightLambda__Run
publishaot1!S_P_CoreLib_System_Func_4__InvokeObjectArrayThunk
  publishaot1!Microsoft_AspNetCore_HostFiltering_Microsoft_AspNetCore_HostFiltering_HostFilteringMiddleware__Invoke
  publishaot1!Microsoft_AspNetCore_Server_Kestrel_Core_Microsoft_AspNetCore_Server_Kestrel_Core_Internal_Http_HttpProtocol__ProcessRequests_d__223_1__MoveNext

LINQ expressions won't have good perf with native AOT because we can't JIT them. Not much we can do from the runtime side.

davidfowl commented 2 years ago

MVC controller actions, minimal APIs, and SignalR hub methods all have runtime-generated thunks that use expression trees to invoke methods quickly. It might be worth having an alternative reflection-based invoke mode on NativeAOT. Here's one of the shared components used to invoke some of these generated thunks:

https://github.com/dotnet/aspnetcore/blob/da6cdcbd5dc75b695cee36d47a22e1399cbea89e/src/Shared/ObjectMethodExecutor/ObjectMethodExecutor.cs

DI has a similar issue, but there we detect whether dynamic code is supported and fall back to a reflection-based strategy.

Maybe we can leave this open since we haven't invested in this as yet?
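
A minimal sketch of that detection pattern (assumed shape, not the real DI or ObjectMethodExecutor code): check RuntimeFeature.IsDynamicCodeCompiled and only build a compiled expression invoker when a JIT is actually available, otherwise fall back to plain MethodInfo.Invoke.

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using System.Runtime.CompilerServices;

static class InvokerFactory
{
    // Illustrative helper; not an ASP.NET Core API.
    public static Func<object, object?[], object?> Create(MethodInfo method)
    {
        if (RuntimeFeature.IsDynamicCodeCompiled)
        {
            // JIT available: compile an expression tree into a fast, strongly typed invoker.
            var target = Expression.Parameter(typeof(object), "target");
            var args = Expression.Parameter(typeof(object?[]), "args");
            var call = Expression.Call(
                Expression.Convert(target, method.DeclaringType!),
                method,
                method.GetParameters().Select((p, i) => (Expression)Expression.Convert(
                    Expression.ArrayIndex(args, Expression.Constant(i)), p.ParameterType)));
            Expression body = method.ReturnType == typeof(void)
                ? (Expression)Expression.Block(call, Expression.Constant(null, typeof(object)))
                : Expression.Convert(call, typeof(object));
            return Expression.Lambda<Func<object, object?[], object?>>(body, target, args).Compile();
        }

        // NativeAOT: a "compiled" expression would only be interpreted, so plain reflection
        // is simpler and avoids the interpreter overhead seen in the traces above.
        return (obj, arguments) => method.Invoke(obj, arguments);
    }
}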

MichalStrehovsky commented 2 years ago

Sure we can keep this open. But do you expect the fix to be in this repo? EDIT: You just moved the issue as I was clicking comment

Looking at the ObjectMethodExecutor, the fix is basically (simplified):

    public object? Execute(object target, object?[]? parameters)
    {
        if (RuntimeFeature.IsDynamicCodeCompiled) // "LINQ is compiled": a JIT turned the expression into real code
        {
            Debug.Assert(_executor != null, "Sync execution is not supported.");
            return _executor(target, parameters);
        }
        else
        {
            // New code: fall back to plain reflection on NativeAOT
            return MethodInfo.Invoke(target, parameters);
        }
    }

davidfowl commented 2 years ago

@MichalStrehovsky yes, something like that. Or possibly not even using the ObjectMethodExecutor in some cases and using the MethodInfo directly. I think it also makes sense to add NativeAOT variations to some of our benchmarks so we can observe these differences when we make the fixes.

ghost commented 2 years ago

Thanks for contacting us.

We're moving this issue to the .NET 7 Planning milestone for future evaluation and consideration. We would like to keep it around to collect more feedback, which can help us prioritize this work. We will re-evaluate the issue during our next planning meeting(s). If we later determine that it has no community involvement, or that it is a very rare and low-impact issue, we will close it so that the team can focus on more important and high-impact issues. To learn more about what to expect next and how this issue will be handled, you can read more about our triage process here.

steveharter commented 2 years ago

Do we have stats on the full stack? It would seem reflection / interpreted IL would be a factor, but not the only one for a 3x perf degradation.

FWIW, during 5.0 I researched Blazor client perf around System.Text.Json, and it turned out that interpreted IL in Mono was a bit faster than the native Mono runtime for reflection, although that was a point-in-time measurement. Like expressions, System.Text.Json has both an IL-emit and a standard reflection approach, depending on environment capabilities.

MichalStrehovsky commented 2 years ago

This is running the System.Linq.Expressions interpreter. NativeAOT doesn't have an IL interpreter like Mono has.

The IL interpreter that Mono has is pretty efficient compared to what the expression interpreter is doing. The Mono interpreter doesn't interpret the IL directly, but converts it to an IR beforehand. So, for example, if you read a field with IL, the Mono interpreter bytecode will probably already have field offsets encoded in the IR instruction stream, whereas reading a field with the Linq.Expressions interpreter will always call FieldInfo.GetValue, because we don't have any more efficient reflection primitives for the expression interpreter to use.

We could make the Linq.Expression interpreter faster if we had better reflection primitives, but it will never be as fast as compiling the expressions into IL, and jitting them to native code (or jitting them to a more efficient IR that is runtime-specific).
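
To make the field example concrete, here is a small sketch (illustrative only; the interpreter's internals are more involved): under interpretation, the member access below is serviced through reflection on every call, roughly equivalent to calling FieldInfo.GetValue by hand.

using System;
using System.Linq.Expressions;
using System.Reflection;

class FieldReadModes
{
    public class Payload { public int Value = 42; }

    static void Main()
    {
        var p = Expression.Parameter(typeof(Payload), "p");
        var lambda = Expression.Lambda<Func<Payload, int>>(
            Expression.Field(p, nameof(Payload.Value)), p);

        // Interpreted path (what NativeAOT uses): each invocation goes through reflection.
        Func<Payload, int> interpreted = lambda.Compile(preferInterpretation: true);

        // Roughly what that costs per call, done manually:
        FieldInfo field = typeof(Payload).GetField(nameof(Payload.Value))!;
        object boxed = field.GetValue(new Payload())!;

        Console.WriteLine(interpreted(new Payload())); // 42
        Console.WriteLine((int)boxed);                 // 42
    }
}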

davidfowl commented 2 years ago

Ohhh that sounds promising (more efficient expression tree interpreter 😄).

jkotas commented 2 years ago

If your app or library needs fast expression trees execution, NativeAOT is not a good fit for it.

We can work on incremental perf improvements in the expression tree interpreter, but I do not expect that will move the needle enough. Also, major new work in the expression tree interpreter is at odds with its archived status: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq.Expressions/README.md.

davidfowl commented 2 years ago

The code generation options all have different tradeoffs.

Right now I'm thinking about a combination of generics and source generation to balance versioning (how much code lives in the app vs. in the framework), but the generic constraints problem is a hard one to solve.

jkoritzinsky commented 2 years ago

@davidfowl I remember us chatting early in .NET 7 about some ideas on how to improve the reflection primitives to reduce the maintainability burden of using generics. Maybe we'll be able to explore that route more in .NET 8?

davidfowl commented 2 years ago

@davidfowl I remember us chatting early in .NET 7 about some ideas on how to improve the reflection primitives to reduce the maintainability burden of using generics. Maybe we'll be able to explore that route more in .NET 8?

Yes, I'd love to finish exploring that route and see where it leads. Basically, dynamic call-site generation APIs in the runtime that don't use ref emit.

mkArtakMSFT commented 2 years ago

@davidfowl this looks like a meta-issue tracking work across multiple teams, is that right? I'm moving it to the .NET 8 Planning milestone and will let you decide how it should be tracked from there.

davidfowl commented 2 years ago

.NET 8 makes sense, and we should leave it here.