dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/

[API Proposal]: What's the future of dynamic code execution, AOT, JIT and interpreter in .NET-based mobile apps? #101466

Closed alexyakunin closed 5 months ago

alexyakunin commented 5 months ago

Background and motivation

Currently, MAUI offers the following execution modes:

  1. Full AOT
  2. Profile-guided AOT
  3. .NET Native, which seems to be a rewrite of Mono's Full AOT on .NET Core
  4. JIT - unavailable on some platforms, such as iOS
  5. Mono Interpreter.

Some of these can be combined - e.g. the JIT and interpreter can be used with any AOT mode. Each of these options has its pros and cons:

  1. Full AOT and .NET Native:

    • Require no recorded profile and, technically, no JIT or interpreter in the runtime
    • But in practice they cannot be used without a JIT or interpreter in even medium-sized apps, assuming these apps use reflection at least to instantiate some types (think of the ILogger<T> scenario with IServiceProvider, which is quite common). There is no good way to automatically identify all possible generic instances in advance in such cases, so it's a choice between full AOT + interpreter/JIT, or a full AOT that bans all dynamic invocations, including such things as invoking a constructor of a type or creating a delegate for a specific generic method instance. In other words, it's a huge disadvantage that kills a whole range of features we love .NET for. Lots of libraries (including Blazor) rely on reflection-generated delegate caching to speed up or even implement certain generic logic.
  2. Profile-guided AOT is actually the best option available now, and I'd say it's the only option our team would prefer to use, due to the lack of any hard constraints. It allows you to balance the speed and size of your app, and it's fairly easy to record the profile. Once your app matures, you don't even have to update this profile with every release - most mobile apps care mainly about startup time, and the code you run on startup stays mostly the same from version to version.

IMO profile-guided AOT doesn't have any significant cons - except the fact that its current implementation is definitely not perfect. E.g. we see a huge number of "AOT NOT FOUND" entries in Mono's debug output, and nearly all of these methods are also mentioned in the AOT profile we use - in other words, maybe 50% of our startup code is still JITted. But I assume this can be addressed.

  3. JIT alone is definitely not a good option for mobile apps. Nearly any non-toy app would require some form of AOT for at least the startup portion of its code.

  4. Interpreter: albeit fairly slow, it's still a good choice for many apps. Moreover, it's the only option you can use on iOS, where JIT is unavailable.
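To illustrate the reflection scenario mentioned above (the ILogger<T>-with-IServiceProvider pattern), here is a minimal hypothetical sketch of the kind of runtime generic instantiation that a full-AOT compiler cannot enumerate in advance - all type and member names are invented for illustration:

```csharp
using System;

// Hypothetical sketch: a factory that instantiates an ILogger<T>-style
// generic for a type discovered only at runtime. A full-AOT compiler only
// sees the open generic Logger<>, so this call needs a JIT or interpreter
// fallback (or explicit rooting hints) for unanticipated instantiations.
public interface ILogger { void Log(string message); }

public class Logger<T> : ILogger
{
    public void Log(string message) =>
        Console.WriteLine($"[{typeof(T).Name}] {message}");
}

public static class LoggerFactory
{
    public static ILogger Create(Type categoryType)
    {
        // The closed generic Logger<categoryType> is constructed dynamically;
        // static analysis cannot list all possible categoryType values.
        Type closed = typeof(Logger<>).MakeGenericType(categoryType);
        return (ILogger)Activator.CreateInstance(closed)!;
    }
}

public class OrderService { }

public static class Demo
{
    public static void Main()
    {
        // Imagine categoryType came from configuration or a DI registration.
        ILogger logger = LoggerFactory.Create(typeof(OrderService));
        logger.Log("started"); // prints "[OrderService] started"
    }
}
```

(For reference types the runtime can share generic code, but value-type instantiations created this way are exactly what full AOT cannot prepare ahead of time.)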

API Proposal

1. It's frustrating to see Microsoft invest a lot into .NET Native without clarifying what the end goal there is:

To clarify, all these items "go together", because they are absolutely needed to power the first one:

Dynamic code execution is one of the features that makes .NET so attractive. Yes, it's mostly invisible to regular developers, but if you look at the library code... Just look at this list and ask yourself which of the top packages don't heavily rely on it: https://www.nuget.org/packages

It's also worth mentioning that Reflection is almost useless without dynamic code execution - again, ask yourself what you would use it for if you can't invoke whatever you inspect.

2. I don't see a single reason to prefer full AOT (think .NET Native) vs profile-guided AOT for nearly any non-toy mobile app or desktop app.

Yes, some .NET Native examples are nice, but the amount of code there is tiny (compared to what you have in real apps). I understand there are some cases where this option seems preferable - e.g. AWS Lambda and Azure Functions scenarios - but even these are hard to justify for me (e.g. if such a function runs for just 1 minute and is small enough, AOT may save less than 0.1% of CPU cost vs. simply JITting it).

As for iOS, not only do our full AOT builds for iOS not work without an interpreter, but their .ipa size is shockingly huge: ~200MB+ vs. ~27MB for the "no AOT, interpreter-only" mode. So right now we stick with the second option (the interpreter on iOS seems to work much faster than on Android).

--

As you can see, it's not exactly an API proposal, but more an ask to clarify Microsoft's stance on the future of AOT, JIT, and dynamic code execution.

If you read everything until this point, you may also notice that:

API Usage

Since it's not about the API, there is no example.

Alternative Designs

A good alternative to profile-guided AOT is the ability to cache JIT output (in app data / app-specific files / right alongside the app's executable).

I think it might be a great fit for most Android & UWP apps - i.e. it's typically fine to start slower if it happens just once (or once per version). And no profiling data is required in this case; as for AWS Lambdas / Azure Functions, all you need is to run the app once before deployment to produce its JIT cache.

On the downside, this option won't work on iOS.

Risks

No response

huoyaoyuan commented 5 months ago

The answers are not official.

3. .NET Native, which seems to be a rewrite of Mono's Full AOT on .NET Core

The term is ambiguous. For current .NET there are two AOT approaches: one is Mono AOT, the other is CoreCLR NativeAOT, which evolved from .NET Native for UWP. Mono AOT is used for mobile platforms. Support for CoreCLR NativeAOT on mobile platforms is experimental.

It's frustrating to see Microsoft invests a lot into .NET Native without clarifying what's the end goal there:

There is actually a lot of effort going into making AOT compilation correct by default. Users get warned about anything that can't work automatically.

  • Will dynamic code execution be eventually available with .NET Native?

It can't be answered without asking "which AOT?". For Mono AOT, the interpreter can already be enabled, which is the only solution for iOS-like platforms. For platforms that allow JIT, the preferred solution is to go down the JIT path and use ReadyToRun/profile-guided AOT to optimize all static code.

It also worth mentioning that Reflection is almost useless w/o dynamic code execution - again, ask yourself, what would you use it for, if you can't invoke whatever you inspect.

Statically analyzable reflection is known to the AOT toolchain and just works. It's useful in some cases, like layering issues. Definitely less powerful, though.
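As an illustrative sketch of what "statically analyzable" reflection looks like - attribute inspection over a type visible at the call site, with no IL compiled at runtime (all names here are invented):

```csharp
using System;
using System.Reflection;

// Because the type argument is visible at the call site, an AOT compiler
// can keep the needed metadata; we only read attributes, never generate code.
[AttributeUsage(AttributeTargets.Class)]
public class TableAttribute : Attribute
{
    public string Name { get; }
    public TableAttribute(string name) => Name = name;
}

[Table("orders")]
public class Order { }

public static class Demo
{
    public static string TableNameOf<T>() =>
        typeof(T).GetCustomAttribute<TableAttribute>()?.Name ?? typeof(T).Name;

    public static void Main() =>
        Console.WriteLine(TableNameOf<Order>()); // prints "orders"
}
```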

I don't see a single reason to prefer full AOT (think .NET Native) vs profile-guided AOT for nearly any non-toy mobile app or desktop app.

There are actually different levels at which to take the optimization. For example, you can enable trimming on its own to remove unused code. The benefit of AOT can be in other areas:

It's also worth noting that the maximum steady-state performance of AOT is worse than JIT. JIT can bake a lot of things into code, like memory addresses and one-time-initialized configurations. But yes, partial AOT with JIT enabled (ReadyToRun for CoreCLR) is the suggested approach and is enabled by default for the core libraries.

As for iOS, not only our full AOT builds for iOS don't work without an interpreter, but their .ipa size is shockingly huge: ~ 200MB+ vs ~ 27MB for "no AOT, interpreter-only" mode. So right now we stick to the second option (interpreter on iOS seem to work much faster than on Android).

This is somewhat strange and doesn't meet expectations. "AOT NOT FOUND" should not happen for a statically analyzable application.

A good alternative to profile-guided AOT is an ability to cache JIT output (in app data / app-specific files / right along the app's executable).

This has already existed since .NET Core 3.0. We call it ReadyToRun. The pre-JITted code is put together into the assembly files. The similar NGen for .NET Framework has existed for decades; it's done on the running machine instead of the developer's machine.
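For reference, enabling ReadyToRun today is a publish-time switch. A sketch (PublishReadyToRun is a documented MSBuild property; the runtime identifier `linux-x64` is just an example):

```shell
# Publish with ReadyToRun: pre-JITted native code is embedded alongside
# the IL in the output assemblies, cutting startup JIT work.
dotnet publish -c Release -r linux-x64 --self-contained \
    -p:PublishReadyToRun=true
```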

huoyaoyuan commented 5 months ago

I'm not sure about the long-term goal of Mono AOT. We are mostly reusing the effort done in the past. Many of the current efforts are for CoreCLR NativeAOT.

You can read documentation at https://learn.microsoft.com/en-us/dotnet/core/deploying/ready-to-run and https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot for desktop environment. You can also check the experimental NativeAOT for iOS at https://github.com/dotnet/runtime/tree/main/src/mono/sample/iOS-NativeAOT.

agocke commented 5 months ago

Thanks for the feedback!

As you can see, it's not exactly an API proposal, but more an ask to clarify what's Microsoft stance on future of AOT, JIT, and dynamic code execution.

I understand this request. Thus far we have not clearly delineated between what we see as "fundamental" limitations and point-in-time limitations. The docs currently have a section called Limitations of Native AOT. We currently think of those limitations as being fundamental. It is unlikely that any of those restrictions will ever be lifted. I will try to make that clear in the documentation.

Regarding the spectrum of AOT options that we provide and long-term thinking, I would categorize the approach as "fast, small, and limited" versus "less fast, larger, and full-featured." "Fast" here is in reference to the speed in the most restrictive configuration, meaning an environment without JITing.

To detail each of the options you mentioned,

  1. Full AOT

Mono full AOT. Fast and limited.

  2. Profile-guided AOT

Full-featured, in-between performance. Ideally faster than the full-featured configuration, but dependent on the profile. Slow when the profile misses.

  3. Native AOT

CoreCLR full AOT. Fast and limited.

  4. Interpreter/JIT

An implementation of the full-featured runtime. Fast if JIT is allowed, slow if interpretation is required.

An important factor that was not mentioned in any of the above options is the level of trimming. Full AOT apps require trimming, and it is the source of both a lot of the performance improvements and most of the incompatibility. Code-generation capability is not meaningful if the needed code was removed from the application, and achieving the highest performance and lowest disk size requires removing excess code.
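A small sketch of how trimming and reflection-based instantiation can be reconciled, using the `DynamicallyAccessedMembers` annotation from `System.Diagnostics.CodeAnalysis` (the factory and `Widget` type are hypothetical):

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Without the annotation, a trimmer may remove Widget's parameterless
// constructor because nothing references it statically; the attribute
// tells the toolchain to preserve it for any T passed to Create<T>().
public static class Factory
{
    public static T Create<
        [DynamicallyAccessedMembers(
            DynamicallyAccessedMemberTypes.PublicParameterlessConstructor)] T>()
        where T : class =>
        (T)Activator.CreateInstance(typeof(T))!;
}

public class Widget
{
    public string Name { get; set; } = "default";
}

public static class Demo
{
    public static void Main()
    {
        Widget w = Factory.Create<Widget>();
        Console.WriteLine(w.Name); // prints "default"
    }
}
```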

To go into some more detail:

It also worth mentioning that Reflection is almost useless w/o dynamic code execution - again, ask yourself, what would you use it for, if you can't invoke whatever you inspect.

This is not true -- I've now annotated quite a lot of applications and many of them use reflection without dynamic code generation. Spectre.Console and Serilog, for example, had almost no dynamic code generation.

As for iOS, not only our full AOT builds for iOS don't work without an interpreter, but their .ipa size is shockingly huge: ~ 200MB+ vs ~ 27MB for "no AOT, interpreter-only" mode. So right now we stick to the second option (interpreter on iOS seem to work much faster than on Android).

We believe that Native AOT can get much smaller and faster. I wouldn't use current numbers as indicators of potential improvements. That said, we also believe that there is a tradeoff here and people will be on either side.

It feels Microsoft bets on "Full AOT + quite constrained .NET" option

We currently have a lot of deployment options and I think we will retain a large number of them, maybe swapping around implementations for performance. Over time we've been converging, rather than diverging, in implementation strategies. For example, Native AOT reuses large parts of the CoreCLR surface, including the JIT and GC, and builds on the same foundation as "crossgen."

I'm sure nearly any experienced .NET developer would rather prefer "Unconstrained .NET, but profile-guided AOT + JIT/interpreter" option, i.e. nearly what we have in MAUI apps right now. I don't have any stats backing this, of course.

I think this is mixed. Some people have provided feedback that they want a very small deployment size, while some prioritize application startup on slow mobile devices. It seems likely that different people will prioritize different implementations. For rough metrics, I think you could easily see an order-of-magnitude difference between .NET form factors on opposite ends of the spectrum - so startup time with the limited form factor might be 10x better than with the fullest-featured form factor.

alexyakunin commented 5 months ago

Hi, thanks for the responses. Some corrections:

CoreCLR NativeAOT

Yes, ".NET Native" = "CoreCLR NativeAOT" in my post. I actually tried to figure out what's the right term now, but failed :)

This is somehow strange and doesn't meet the expectations. "AOT not found" should not happen for statically analyzable application.

The detailed description of that issue explains it's not only about full AOT, but also about profile-guided AOT - i.e. there are missing methods in both cases, and moreover, some of them appear in the stats from aprofutil - in other words, they are explicitly listed as methods to generate AOT code for.

Also, "statically analyzable" is a bit of a vague term: e.g. the template MAUI app mentioned in that issue is "statically analyzable" on paper, but I assume that's not true in practice, i.e. it still uses reflection internally, which renders the full AOT code incomplete. And that's the story of a relatively small app with only .NET dependencies.

It already exists since .NET Core 3.0. We call it ReadyToRun.

Yes, I know about this. But it doesn't exist in MAUI for Android, for example, where it could be quite handy.

I've now annotated quite a lot of applications and many of them use reflection without dynamic code generation. Spectre.Console and Serilog, for example, had almost no dynamic code generation.

"Almost no" just supports my point here - i.e. they do use dynamic code generation. I'd say reflection w/o code execution (just to clarify, I assume property read/write is an example of such code execution, as well as other method calls) is mostly about attribute inspection. Which is handy, but way less handy than an ability to e.g. set property values, call methods and constructors.

I can name a decent number of scenarios where delegate caching helps get rid of boxing. E.g. Blazor uses this technique to implement component property comparison & writes. In maybe 90% of cases you don't even need Reflection.Emit to get nearly the same efficiency with this.
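The delegate-caching technique described above can be sketched roughly like this (types and names invented): `MethodInfo.CreateDelegate` produces an open instance delegate once, after which calls go through a strongly typed `Func<,>` with no boxing and no per-call reflection - and no `Reflection.Emit`.

```csharp
using System;
using System.Reflection;

public class Point
{
    public int X { get; set; }
}

public static class Demo
{
    // Built once and cached. PropertyInfo.GetValue would box the int and
    // pay reflection overhead on every call; this delegate does neither.
    public static readonly Func<Point, int> GetX =
        (Func<Point, int>)typeof(Point)
            .GetProperty(nameof(Point.X))!
            .GetGetMethod()!
            .CreateDelegate(typeof(Func<Point, int>));

    public static void Main()
    {
        var p = new Point { X = 42 };
        Console.WriteLine(GetX(p)); // prints "42"
    }
}
```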

--

Ok, I guess now it's time to rephrase my question a bit:

For the record, I care way less about Reflection.Emit & true IL codegen - it's nice and handy, but easy to work around if you have at least MethodInfo.CreateDelegate. Plus, AFAIK Expression evaluation is already supported there (which indicates that at least property/method invocation should already be supported too).

alexyakunin commented 5 months ago

I think this is mixed. Some people have provided feedback that they want a very small deployment size, while some people prioritize application startup on slow mobile devices. It seems likely that different people will prioritize different implementations.

That's true as well. If we name two edge cases, they are:

And there are a lot of options in between - e.g. WASM scenarios, or code running in some form of container on edge servers.

And it's also a lot about the app itself - I totally understand it's frustrating to see a few MB of artifacts produced for a tiny app. But my question goes more along these lines: am I right that there is no long-term vision that renders profile-guided AOT & dynamic code execution obsolete? I'm asking mostly because this would be a huge downside for apps like the one we work on (likely for any medium+ app, btw), and if that's the case, we'd certainly prefer to know it in advance.

alexyakunin commented 5 months ago

An important feature that was not mentioned in any of the above options is the level of trimming.

Yeah, I didn't mention trimming, but that's mostly because I assume it's a must-have pass :) Plus, AFAIK right now it doesn't interact with AOT, i.e. the IL trimming happens before AOT codegen.

agocke commented 5 months ago

"Almost no" just supports my point here - i.e. they do use dynamic code generation. I'd say reflection w/o code execution (just to clarify, I assume property read/write is an example of such code execution, as well as other method calls) is mostly about attribute inspection. Which is handy, but way less handy than an ability to e.g. set property values, call methods and constructors.

Let me rephrase - there was no dynamic code execution used in the core of those libraries. And no, property read/write is not an example of code execution. That is simply reflection. Code execution means things like Reflection.Emit, MakeGenericType, or Delegate.CreateDelegate. All of those features take IL code and generate native code. Reading and writing properties does not require compiling IL at runtime.
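A rough sketch of the distinction drawn here, with invented types: plain reflection (metadata interpretation) on one side, and a call that the comment above classifies as dynamic code execution on the other.

```csharp
using System;
using System.Reflection;

public class Config
{
    public int Retries { get; set; } = 3;
}

public static class Demo
{
    public static void Main()
    {
        var cfg = new Config();
        PropertyInfo prop = typeof(Config).GetProperty(nameof(Config.Retries))!;

        // Plain reflection: interprets metadata; no IL is compiled at runtime.
        prop.SetValue(cfg, 5);
        Console.WriteLine(prop.GetValue(cfg)); // prints "5"

        // Dynamic code execution: CreateDelegate asks the runtime for
        // callable native code for the getter - the part that needs a JIT,
        // an interpreter, or ahead-of-time preparation.
        var getter = (Func<Config, int>)prop.GetGetMethod()!
            .CreateDelegate(typeof(Func<Config, int>));
        Console.WriteLine(getter(cfg)); // prints "5"
    }
}
```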

We call it ReadyToRun. Yes, I know about this. But it doesn't exist on MAUI for Android, for example, where it could be quite handy.

Definitely possible that you could see this in the future.

Feature-rich apps on ~ mobile & desktop devices. The app size is kinda important here, but what truly matters is startup time + ongoing performance. And profile-guided AOT is probably number 1 choice here.

Unlikely. In our benchmarks of CoreCLR + R2R vs. Native AOT, we see very large startup wins with full AOT that cannot be replicated by R2R. The problem is that the mere possibility of encountering non-AOTed code forces you to keep around a whole runtime that supports dynamic loading, and that incurs cost.

Moreover, if your app never actually uses the interpreter fallback, there isn't much point in having it in the first place. Most apps that require the interpreter will end up using it and that will impact their performance.

That said, some people value compatibility over performance, and that's fine! It seems likely that we will end up supporting configurations in all of these areas:

  1. Partial AOT, like R2R
  2. Full AOT
  3. Full interpreter (or whatever tech is used for compat)

So the short answer to your question is: no, we don't have plans to obsolete everything except full AOT. The most likely scenario is one where full AOT is just one of a few options, and we continue to improve those other options as well.

agocke commented 5 months ago

Yeah, I didn't mention trimming, but that's mostly because I assume it's a must-have pass :) + AFAIK right now it doesn't interact with AOT, i.e. the IL trimming happens before AOT codegen.

FYI, this is not true for Native AOT. In fact, one of the advantages of Native AOT is that it can trim more aggressively because it does trimming in conjunction with IL compilation.

dotnet-policy-service[bot] commented 5 months ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

agocke commented 5 months ago

Closing question as answered.