dotnet / runtime


NativeAOT status for Android #106748

Open vyacheslav-volkov opened 3 weeks ago

vyacheslav-volkov commented 3 weeks ago

I had previously raised this topic in another issue https://github.com/dotnet/runtime/issues/101135, but I want to create a separate discussion as I couldn't find a place to track the progress on this matter.

The most serious and long-standing issue with Xamarin.Android is the slow startup time for applications. If you search the internet for "Xamarin.Android slow startup," you'll find hundreds of discussions on this topic. Even with all possible optimizations, including MonoAOT compilation, startup remains slow, and MonoAOT itself works incorrectly on Android https://github.com/dotnet/runtime/issues/101135. This problem is particularly noticeable with UI frameworks such as Avalonia, UNO, and MAUI. Developers simply don't have the ability to solve this problem on their own, as it is rooted in fundamental aspects of how the platform operates, and a significant amount of time is spent on JIT compilation. In the end, to write a "fast" Android application, one that still lags behind native applications in startup speed, you need to perform a whole range of additional steps that not every developer can manage, just to make the application somewhat faster. I believe this expectation is where the main problem lies: a developer expects the release build to work as it should right away, but instead they encounter performance issues where they don't expect them.

When .NET Native was introduced, I thought it would be the solution to the slow startup problem for Android. Starting with .NET 8.0, it became stable for iOS, and I began actively using it. The results are impressive: a fairly large application on an iPhone X launches as quickly as any native application, and even faster than a similar application on a Samsung Galaxy S22 Ultra, despite all possible optimizations for Android. The gap between the release of these devices is five years, and I dread to imagine the startup time on a five-year-old Android device. Yes, there are still limitations on using dynamic code, but they are not that difficult to overcome, and the result is an application that performs as fast as a native one. Isn't that what we want from a cross-platform application? Moreover, I'm almost 100% sure that no one ships Android applications without ProfiledAOT or FullAOT, because otherwise you can forget about startup performance. This also means they are already using trimming, so transitioning to NativeAOT wouldn't require much additional effort. Over time, more libraries and frameworks will become fully compatible with NativeAOT, making integration seamless for developers.

However, observing the discussions about .NET Native and the activity around this topic, I get the impression that the team does not give this problem enough priority, and no specific timelines have been set for its resolution. For example, in one of the discussions on GitHub, the following is mentioned:

These will likely work under Mono, but will need to be fixed one day in .NET 10 or some future release that supports NativeAOT. https://github.com/dotnet/android/issues/8724

This gives the impression that allocating resources for NativeAOT on Android is not a priority, and instead, new releases include optimizations that only provide marginal improvements (e.g., -10% startup time for test cases). However, in real-world conditions, such improvements do not solve the problem. If an application takes 2000ms to start, even reducing it to 1800ms makes little difference, and at best, such optimizations are noticeable only under ideal conditions.

It seems to me that the team does not fully grasp the depth of this issue. Many of my colleagues have already switched to Flutter specifically because of the slow startup times on Android. When their clients or customers ask why the Android application launches so slowly, developers are forced to reply that it is a limitation of the technology they are using. They may also suggest switching to iOS, where there are no such problems, but that is not an option.

In my opinion, the implementation of NativeAOT support for Android should be considered critically important. I would like to hear the team's thoughts on this matter: what should we expect? Will NativeAOT support for Android be added in the near future, or should we only hope for small, incremental performance improvements that don't really solve anything, while waiting for everyone to switch to Flutter?

jkotas commented 3 weeks ago

cc @jonathanpeppers @jonpryor

jonathanpeppers commented 3 weeks ago

Maybe just to break down the work involved slightly:

We did some of the basic groundwork in .NET 9, such as:

This seems like a multi-month effort involving multiple teams. I don't actually know when we'd start on this, as it's quite above my pay grade.

agocke commented 3 weeks ago

NativeAOT runtime packs for Android: this is somewhat working with linux-bionic-arm64 packages, but we probably want actual android-arm64, etc. packages.

I'm somewhat skeptical of this. We've increasingly stopped doing things more specific than kernel-libc-arch in the runtime. It seems unlikely that Android needs more than what's already in our bionic packages.

GC bridge (of some form) to support Java interop

Agreed that this is necessary, but somewhat ill-defined, I think. It's not clear what functionality is available in Mono that isn't available in Core CLR.

jkotas commented 3 weeks ago

It seems unlikely that Android needs more than what's already in our bionic packages.

There are number of special cases for Android in the higher-level runtime libraries. For example: https://github.com/dotnet/runtime/blob/477de3419157d809dc266ea03ff3fb4c05f3d1c1/src/libraries/System.Net.Http/src/System/Net/Http/HttpClientHandler.AnyMobile.InvokeNativeHandler.cs#L20-L22 .

These special-cases are unnecessary to get ordinary Linux-targeting code running on Android, but they are necessary for compatibility with Xamarin Android behaviors that exist today.

Agreed that this is necessary, but somewhat ill-defined, I think. It's not clear what functionality is available in Mono that isn't available in Core CLR.

Yes, the first step would be to extract the required functionality into an API proposal. The APIs that we have introduced for GC integration with ObjectiveC show the general shape to follow.
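
For illustration only, here is a rough sketch of what the shape of such an API proposal could look like, loosely modeled on the existing Objective-C GC integration APIs; JavaMarshal and every member below are invented names, not an actual proposal:

    // Hypothetical sketch only -- none of these types exist in the libraries today.
    // The idea is to register callbacks that let the GC ask the Java side whether a
    // bridged peer is still referenced, mirroring the Objective-C tracking handles.
    using System;
    using System.Runtime.InteropServices;

    public static class JavaMarshal // invented name
    {
        // Called once at startup; the runtime would invoke these callbacks during GC.
        public static unsafe void Initialize(
            delegate* unmanaged<void> markCrossReferencesBegin,        // hypothetical
            delegate* unmanaged<IntPtr, int> isJavaPeerReferenced,     // hypothetical
            delegate* unmanaged<IntPtr, void> peerEnteredFinalization) // hypothetical
            => throw new NotImplementedException();

        // Associates a managed object with its Java peer (a JNI global reference) and
        // returns a handle the bridge can flip between strong and weak.
        public static GCHandle CreateReferenceTrackingHandle(object obj, IntPtr jniGlobalRef)
            => throw new NotImplementedException();
    }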

alexyakunin commented 3 weeks ago

So... The ETA is prob not .NET 10, right?

filipnavara commented 3 weeks ago

It seems unlikely that Android needs more than what's already in our bionic packages.

Aside from the things mentioned earlier, the whole Android crypto interop is currently not part of the linux-bionic packages.

Agreed that this is necessary, but somewhat ill-defined, I think. It's not clear what functionality is available in Mono that isn't available in Core CLR.

I had an idea to implement it in a way similar to Objective-C interop that I discussed informally with some of the stakeholders.

Here's the rough version copied from communication logs:

Assuming you are familiar with the MonoVM bridge, skip this part:

The Java bridged objects have a marker. At the end of GC you find all the marked objects that were collected and reconstruct an object graph of them. Then you switch the Java strong GC refs to weak GC refs, reconstruct the edges from the GC graph on the Java side (when possible, so only for certain bridged objects that have a List<object> on the Java peer side), and run Java GC. Once both the .NET and Java GC are finished, you switch the Java GC handles back to strong ones, and collect everything that didn't survive either GC.

The idea is to decompose the process into two phases and reuse the same logic that ObjC GC interop (reference counting) and COM interop does.

  • When a marked Java peer object is found unused by GC:
    • If you have strong Java GC handle, convert it to weak GC handle. Return "ref count" == 1.
    • If you have weak Java GC handle, return "ref count" == JavaGCHandle.IsAlive
  • If you have a WeakReference pointing to Java peer object:
    • If you access Target, convert the Java GC handle to strong one (if Target != null)
  • Interop that converts a Java object to its .NET Java peer object looks up the internal dictionary. If found and it has a weak handle, convert to a strong handle first

Down-side: you need to do .NET GC, Java GC, .NET GC to completely clean up peer objects, i.e. one more GC on the .NET side than MonoVM does... but MonoVM actually does part of it too, just in hidden steps. Up-side: you don't block the .NET GC on the Java GC, and the number of long-term surviving peer objects affects the GC much less.
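
A very rough C# sketch of that handle-flipping idea, purely illustrative; JavaPeer, JavaGlobalRef, and the strong/weak conversion helpers are hypothetical stand-ins for JNI global and weak-global references, not an existing API:

    // Illustrative sketch of the two-phase scheme described above; every type and
    // helper here is hypothetical.
    using System.Collections.Concurrent;

    // Hypothetical wrapper over a JNI global / weak-global reference.
    readonly struct JavaGlobalRef
    {
        public bool IsAlive => true;             // placeholder: asks the Java side
        public JavaGlobalRef ToWeak() => this;   // placeholder: NewWeakGlobalRef + DeleteGlobalRef
        public JavaGlobalRef ToStrong() => this; // placeholder: NewGlobalRef + DeleteWeakGlobalRef
    }

    sealed class JavaPeer
    {
        public JavaGlobalRef Handle; // strong or weak reference to the Java peer
        public bool IsStrong;
    }

    static class JavaBridge
    {
        // Interop table: Java object identity -> managed peer.
        static readonly ConcurrentDictionary<long, JavaPeer> s_peers = new();

        // Phase 1: the .NET GC found a marked peer that is otherwise unreachable.
        // Demote its Java reference to weak and report a "ref count" back to the GC.
        internal static int OnPeerUnreferenced(JavaPeer peer)
        {
            if (peer.IsStrong)
            {
                peer.Handle = peer.Handle.ToWeak();
                peer.IsStrong = false;
                return 1;                       // keep alive for now; let the Java GC decide
            }
            return peer.Handle.IsAlive ? 1 : 0; // weak handle: alive only if Java still holds it
        }

        // Resurrection path: any interop lookup that hands the peer back to managed
        // code promotes the handle back to strong first.
        internal static bool TryResolve(long javaIdentity, out JavaPeer peer)
        {
            if (s_peers.TryGetValue(javaIdentity, out peer) && peer.Handle.IsAlive)
            {
                if (!peer.IsStrong)
                {
                    peer.Handle = peer.Handle.ToStrong();
                    peer.IsStrong = true;
                }
                return true;
            }
            return false;
        }
    }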

Notably, I had some feedback on it and there may be additional problems with the approach that I didn't originally foresee (https://github.com/dotnet/runtime/issues/104272#issuecomment-2239290778). We also didn't get anywhere near to implementing it, even as a rough prototype.

agocke commented 3 weeks ago

Aside from the things mentioned earlier, the whole Android crypto interop is currently not part of the linux-bionic packages.

I stand corrected. I find this factoring pretty unfortunate, though.

srxqds commented 3 weeks ago

Why doesn't Microsoft continue to invest more manpower in optimization on monovm?

huoyaoyuan commented 3 weeks ago

Why doesn't Microsoft continue to invest more manpower in optimization on monovm?

There can be non-trivial duplicated work involved in optimization - sometimes even a total rework from scratch to make sure the architecture is optimal. NativeAOT was built from scratch to make everything AOT-friendly. RyuJIT was built from scratch to replace the old JIT, which originated from MSVC. Being small doesn't mean friendly to optimization; it's often the opposite, due to a lack of layering.

srxqds commented 3 weeks ago

Why doesn't Microsoft continue to invest more manpower in optimization on monovm?

There can be non-trivial duplicated work involved in optimization - sometimes even a total rework from scratch to make sure the architecture is optimal. NativeAOT was built from scratch to make everything AOT-friendly. RyuJIT was built from scratch to replace the old JIT, which originated from MSVC. Being small doesn't mean friendly to optimization; it's often the opposite, due to a lack of layering.

Yes, you are right, but I hope the development team can pay more attention to it. MonoVM features and optimizations are always delayed, or even ignored - the answer is always that it's not important. I hope MonoVM can be given enough priority to stay aligned with CoreCLR.

srxqds commented 3 weeks ago

I have opened so many issues https://github.com/dotnet/runtime/issues/created_by/srxqds, and most of them are ignored.

GerardSmit commented 3 weeks ago

Additional information:

NativeAOT for Android was experimented with here: https://github.com/dotnet/runtimelab/tree/feature/nativeaot-android
And the write-up can be found here: https://github.com/dotnet/runtimelab/blob/feature/nativeaot-android/src/mono/sample/Android-NativeAOT/README.md

When you look at the section "Performance measurements", take it with a grain of salt. In Discord the following was mentioned when this document was released:

they measured the Debug version of NativeAOT .. They also didn't strip debugging symbols, so that doubly explains the size.

The size and performance numbers for the "Pixel 7a" and "Emulator" devices were updated after this message, but I'm not sure about the "Samsung Galaxy S10 Lite", "Samsung Galaxy S23", and "Pixel 5" - those numbers didn't change after the initial commit (see Git Blame).


Why doesn't Microsoft continue to invest more manpower in optimization on monovm?

CoreCLR was built from the ground up; Mono has to adapt to CoreCLR, which can make things harder. For example, generics are currently a problem in Mono, while libraries are using more and more generics:

Those methods are all generic methods which are not AOT'ed. Mono doesn't have tiered JIT. (https://github.com/dotnet/runtime/issues/104076#issuecomment-2278702305, in "MonoAOT Perf_Single and Perf_Double Regressions on 6/3/2024 6:35:27 PM")

Mono has multiple backends - MonoJIT, MonoInterpreter, MonoLLVM (maybe I'm missing more) - so implementing performance improvements is quite the task.

vyacheslav-volkov commented 3 weeks ago

@GerardSmit do you know what the Avalonia team used for this video https://www.reddit.com/r/dotnet/comments/13lvih2/nativeaot_ndk_vs_xamarinandroid_performance/? In their video the performance is as fast as what I get on iOS with NativeAOT.

jonathanpeppers commented 3 weeks ago

I also have a sample here that should have been testing Release mode:

GerardSmit commented 3 weeks ago

@vyacheslav-volkov I'm not sure what they used. I'm also not sure if they released any tools or source. In the Reddit comments they commented the following:

We may commercialise it, as a way to generate revenue to support our continued OSS work.

Which may be the reason they never open-sourced/released this experiment.

jonpryor commented 3 weeks ago

At the risk of completely sidetracking this discussion, @vyacheslav-volkov wrote:

The most serious and long-standing issue with Xamarin.Android is the slow startup time for applications.

There are many parts of the stack, and the .NET for Android part of the stack is not that slow. On a Pixel 6 Pro:

This is the first time I launched these apps (no averaging or anything), and .NET for Android is 93ms slower.

I don't consider that to be a lot of overhead.

The problems you're observing are not solely in MonoVM or JIT or runtime or .NET for Android (or everything built atop of them). I would not expect NativeAOT to be a "silver bullet" either.

steveisok commented 3 weeks ago

Aside from the things mentioned earlier, the whole Android crypto interop is currently not part of the linux-bionic packages.

I stand corrected. I find this factoring pretty unfortunate, though.

The reason for this is pretty clear cut. Most / all of the crypto APIs are Java APIs, and since linux-bionic does not include any of that (it's analogous to targeting the NDK), there's not much we can do outside of re-opening the discussion of shipping OpenSSL as part of the runtime.

vyacheslav-volkov commented 3 weeks ago

@jonpryor I agree that an empty application starts up fairly quickly. However, once you start adding code, startup time degrades dramatically. At this point, startup time no longer depends on actual code optimizations; it all comes down to JIT compilation speed and efforts to reduce it.

For example, some advice suggests using only classes because it supposedly reduces compilation time https://github.com/dotnet/runtime/issues/101135#issuecomment-2138354357. But I can't call this great advice, considering that everything in .NET is moving towards reducing heap allocations, and the framework itself is actively shifting towards using structs everywhere. Prohibiting the use of structs just for the sake of a faster startup sounds unreasonable.

Or take a simple MAUI application: its startup time will be around 600 ms. This means an empty application already starts 2-3 times slower than a native application. As the developer adds their own code, the startup time grows to anywhere from 1500 to 5000+ ms. At that point, traditional code optimization doesn't work - the developer must understand that optimization here is about easing the JIT compilation burden rather than improving the code itself.

Here's a real example: my framework doesn't use any complex features, but it has a lot of structs and generics. The actual startup time with FullAOT on Android is about 1000 ms on a Galaxy Note 10. The same code on iOS with NativeAOT starts instantly on an iPhone X.

Here's the link to the repository https://github.com/vyacheslav-volkov/PerfAndroidTest/tree/main, where you can find two projects — Android (FullAOT) and iOS (NativeAOT). I added .speedscope.json files for the Android project to the trace folder. If you have time, please take a look and give me some advice on how I can improve the startup time without changing the runtime or using NativeAOT for Android. Also, if you check this issue https://github.com/dotnet/runtime/issues/101135 you will find that in the current state of Xamarin.Android, even when using FullAOT, it is not possible to AOT generics and structs.

jonpryor commented 3 weeks ago

Here is another sample which uses NativeAOT on Android, and unlike @jonathanpeppers sample has the benefit of looking like .NET for Android, with a C# subclass of Android.App.Activity: https://github.com/dotnet/java-interop/tree/main/samples/Hello-NativeAOTFromAndroid

For comparison to the previous Android times:

I ActivityTaskManager: Displayed net.dot.jni.helloandroid/my.MainActivity for user 0: +300ms

Compare 300ms to Java (+141ms) and .NET for Android (+234ms). This build also contains additional debug prints, so it isn't directly comparable, but it should further emphasize that NativeAOT in and of itself will not be a silver bullet for all of your startup woes. A lot depends upon code higher up the stack.

At present, one of the primary blockers keeping us from dedicating more effort to NativeAOT support within .NET for Android is the lack of a decent GC story. (The current story is "everything leaks, lol".)

I do not foresee dedicating significant effort to support NativeAOT on the .NET for Android side until after the GC story is complete, and I'd further guesstimate that we'd want at least one .NET release after the GC exists before we'd support it.

@alexyakunin wrote:

So... The ETA is prob not .NET 10, right?

If NativeAOT has a GC story for .NET 10, I'd tentatively hope for preview support in .NET for Android by .NET 11. Maybe. (There are a number of unknown unknowns, and would not want to get anyone's hopes up.) Increase numbers as appropriate.

agocke commented 3 weeks ago

A lot depends upon code higher up the stack.

This sounds right to me. For reference, the console app startup for native aot on a Linux desktop machine is measured in microseconds so the runtime overhead in native aot is ~0. All of the startup impact is the cost of the code running in the startup path.

vyacheslav-volkov commented 3 weeks ago

@jonpryor I just conducted a quick and rough test, but I think the point will be clear. I measured the initialization time of services in my test application twice, meaning the same code was executed twice. The first time it needed time for JIT compilation, and the second time it ran without JIT compilation. The second run was 27 times faster. As a developer, there’s nothing I can do to affect JIT compilation, and traditional code optimizations won’t work. I would need to rewrite the entire codebase just to make it easier for Android’s JIT compiler to handle it. And this is just a small portion of the code needed to launch the application. In this code, nothing is being called other than the creation and registration of services.

        var stopwatch = Stopwatch.StartNew();
        MugenApplicationConfiguration.Configure()
                                     .AndroidConfigurationGeneratedBindings<MainViewModel, MainActivity>(true, null, this)
                                     .PerfAndroidGeneratedBindingConfiguration()
                                     .CompositeUIConfiguration(new ShellHandlerProvider())
                                     .WithComponent(new MainSectionManager());
        stopwatch.Stop();
        Log.Wtf("STARTUP1", stopwatch.Elapsed.ToString());

        stopwatch.Restart();
        MugenApplicationConfiguration.Configure()
                                     .AndroidConfigurationGeneratedBindings<MainViewModel, MainActivity>(true, null, this)
                                     .PerfAndroidGeneratedBindingConfiguration()
                                     .CompositeUIConfiguration(new ShellHandlerProvider())
                                     .WithComponent(new MainSectionManager());
        stopwatch.Stop();
        Log.Wtf("STARTUP2", stopwatch.Elapsed.ToString());

Phone: Galaxy Note 10, Release build + FullAOT
STARTUP1    00:00:00.1165980
STARTUP2    00:00:00.0043421

If NativeAOT can make any user code execute in 50-100 ms (based on this example, that's more than enough if we don't need JIT compilation), plus an additional 250-300 ms of runtime startup, we would achieve a total startup time of 350-400 ms for any application. This is comparable to the startup time of native applications.

jonathanpeppers commented 3 weeks ago

As a developer, there’s nothing I can do to affect JIT compilation

@vyacheslav-volkov for your example above, have you tried either to "AOT Everything" with -p:AndroidEnableProfiledAot=false, or recorded a custom AOT profile?

By default, we use a built-in AOT profile that won't include most of your code. It is a reasonable tradeoff for app size vs startup time.

If you can use AOT for the code above, STARTUP1 should be much quicker.

(Note that this is using Mono's AOT in the current product, and is completely unrelated to NativeAOT.)

vyacheslav-volkov commented 3 weeks ago

@jonathanpeppers I've used this test project with FullAOT (Mono's AOT), you can check the config, it should be good: https://github.com/vyacheslav-volkov/PerfAndroidTest/blob/main/PerfAndroid/PerfAndroid.csproj#L12-L14

Out of curiosity, I ran the same code without AOT, and here’s the result:

STARTUP1  00:00:00.2220397
STARTUP2  00:00:00.0038740

jonathanpeppers commented 3 weeks ago

@vyacheslav-volkov can you check it's actually using AOT? It seems odd AOT would make the first run worse than JIT.

adb shell setprop debug.mono.log default,mono_log_level=debug,mono_log_mask=aot

This should make Mono print out a log message for each method like:

10401 10401 D Mono    : AOT: FOUND method Microsoft.AspNetCore.Components.WebView.Maui.BlazorWebView:.ctor () [0x6f9efd0150 - 0x6f9efd0340 0x6f9efd260c]

Note it's expected some methods will say:

10401 10401 D Mono    : AOT NOT FOUND: (wrapper runtime-invoke) object:runtime_invoke_void (object,intptr,intptr,intptr).
10401 10401 D Mono    : AOT NOT FOUND: (wrapper managed-to-native) System.Diagnostics.Debugger:IsAttached_internal ().
10401 10401 D Mono    : AOT NOT FOUND: (wrapper native-to-managed) Android.Runtime.JNINativeWrapper:Wrap_JniMarshal_PPL_V (intptr,intptr,intptr).

Clear debug.mono.log after testing (as it will slow down apps). You can reboot the device or use adb shell setprop debug.mono.log "''"

charlesroddie commented 3 weeks ago

My observations and suggestion as an end user:

Android is 2+ years behind other targets for AOT compilation in dotnet.

This stems from using the Android SDK, which relies on Java interop and makes things much more complicated than on any other platform. The possible plans described above (https://github.com/dotnet/runtime/issues/106748#issuecomment-2302160443, https://github.com/dotnet/runtime/issues/106748#issuecomment-2305016834) suggest tackling these issues, which may take a long time.

Surely the NDK is a better target

Flutter uses the NDK and fully AOT-compiles everything, and you can access relevant Android-specific functionality from Dart. This is where dotnet should be:

https://docs.flutter.dev/resources/faq#run-android The engine's C and C++ code are compiled with Android's NDK. The Dart code (both the SDK's and yours) are ahead-of-time (AOT) compiled into native, ARM, and x86-64 libraries. Those libraries are included in a "runner" Android project, and the whole thing is built into an .apk. When launched, the app loads the Flutter library. Any rendering, input, or event handling, and so on, is delegated to the compiled Flutter and app code.

In dotnet, there are some POCs, as mentioned above (https://www.reddit.com/r/dotnet/comments/13lvih2/nativeaot_ndk_vs_xamarinandroid_performance/ and, more recently, https://github.com/jonathanpeppers/Android-NativeAOT), both using SkiaSharp. But we would need the NDK callable from dotnet, similar to calling native code from dotnet in SkiaSharp, dotnet-ios, WinUI, etc.

vyacheslav-volkov commented 3 weeks ago

@jonathanpeppers I'll check, but it hasn't gotten worse. With AOT, the time is 00:00:00.1165980, and without AOT, it's 00:00:00.2220397. The code with AOT runs twice as fast, but it's still not fast enough.

vyacheslav-volkov commented 3 weeks ago

@jonathanpeppers @jonpryor I encountered 1,700 AOT NOT FOUND errors https://github.com/vyacheslav-volkov/PerfAndroidTest/blob/main/trace/aot_logs.txt, all related to the use of generics and value types, which matches the description of a known issue https://github.com/dotnet/runtime/issues/101135. I believe this is a serious problem that cannot be ignored. If we don't get NativeAOT until 2026, we need to find other ways to reduce startup time. Currently, all documentation suggests using ProfiledAOT or FullAOT, but as we can see, this does not solve any problems when using structs with generics, making it ineffective.

I created an empty project that uses only one framework and nothing else (https://github.com/vyacheslav-volkov/PerfAndroidTest/tree/main), and it already takes 1000 ms to launch, even after using all available optimization methods. I have demonstrated that all the time is spent on JIT compilation rather than executing actual code, as the code itself runs very quickly compared to the JIT compilation https://github.com/dotnet/runtime/issues/106748#issuecomment-2305224622. This is a real problem, and currently, there is no solution for it. This issue has been ignored for a long time, causing many developers to stop using Xamarin.Android. Claims that an empty app can launch in 250 ms are not helpful when even a test app takes 1000 ms to start and does nothing but initialize.

If you have any real ways to solve this problem right now, please let me know, and I will close this issue, and we will update the documentation so that all developers are aware. But if there is no solution, we need to find one and reduce the startup time. So far I see only two options: using NativeAOT or fixing the existing bug https://github.com/dotnet/runtime/issues/101135.

alexyakunin commented 3 weeks ago

@jonpryor @jonathanpeppers

If NativeAOT has a GC story for .NET 10, I'd tentatively hope for preview support in .NET for Android by .NET 11. Maybe. (There are a number of unknown unknowns, and would not want to get anyone's hopes up.) Increase numbers as appropriate.

If that's the case, IMO it makes sense to address #101135 first - at least to the extent that we can generate most of the AOT code with some workarounds. E.g. we already tried to use https://github.com/gluck/il-repack to merge the code from our own assemblies - the merge works for WASM, but leads to errors when you build the Android target (in our specific case). I'll try to drag this to some final conclusion over the next few weeks, so expect some updates here.

As for the fix, it seems it's all about relaxing the filtering conditions here: https://github.com/fanyang-mono/runtime/blob/main/src/mono/mono/mini/aot-compiler.c#L13918 - IDK if there is any other remaining low-hanging fruit, but overall, we desperately need a way to generate at least the generic instances parameterized with our own value types - this would close the AsyncTaskMethodBuilder & ConcurrentDictionary scenarios.

Just imagine how painful it is to know there is nothing you can do to improve startup time further - while knowing that fixing this single problem would make it 4x faster (I explained in #101135 that 75+% of time is spent in JIT in our case).

And I really don't understand why MS doesn't take it very seriously. Slow startup = elevated ANR rate no matter what you do. There are slow devices. As a result, you'll be penalized on Google Play just because of this:

(screenshot of the Google Play Console ANR rate metric)

And literally all of ANRs in our case point to exactly this ANR reason:

Native method - android.os.MessageQueue.nativePollOnce
Input dispatching timed out (No focused window)

Now imagine you're a PM deciding whether or not to use MAUI for the next project. And you find out that no matter whether your app is supposed to start fast or not, it will be penalized on Google Play for an elevated ANR rate - all because of the 1% of devices where your startup time will be >5s. And you have no way to address this. Would you bother to explore the MAUI scenario further? Seriously?

emmauss commented 2 weeks ago

@GerardSmit do you know what the Avalonia team used for this video https://www.reddit.com/r/dotnet/comments/13lvih2/nativeaot_ndk_vs_xamarinandroid_performance/? In their video the performance is as fast as what I get on iOS with NativeAOT.

I'm one of the devs in Avalonia who directly worked on this port. NativeAOT pretty much brought our UI framework as close to native as possible. Most of our framework code is managed, and only Skia, which is our main renderer, relies on native APIs. Both MonoJIT and MonoAOT have virtually the same slow performance at startup and when loading new views in-app, as seen in that simple app shown in the video. We have a much more complex app, the controls catalog, that boots near-instantly with NativeAOT but takes 2-6s, depending on the device, with Xamarin.Android. We ran 2 experiments: one completely forgoing .NET Android and going full native with the NDK's NativeActivity, which is what you see in the reddit thread, and another using .NET Android's AppCompat to create the activity, demonstrated here with our controls catalog on a real device.

https://github.com/user-attachments/assets/4a4fade8-b7ce-453b-b2ae-4ccd54f882a9

alexyakunin commented 2 weeks ago

@emmauss , a quick q -

Both MonoJIT and MonoAOT have virtually the same slow performance at startup and when loading new views in-app,

How far did you guys investigate the reasons of slow startup - in particular, did you try to count the number of methods for which AOT code wasn't generated (AOT_NOT_FOUND in mono debug log), or something like % of time spent in JIT on startup?

We reported #101135 while investigating slow startup in our app - in short, Mono AOT doesn't generate code for any generic method instance parameterized with non-primitive value type. So if you heavily rely on structs & generics (which is reasonable on CoreCLR), or have a decent amount of async code (it relies on AsyncTaskMethodBuilder<T> & state struct), JIT will dominate. In our case 75% of startup time is spent in JIT.

I don't know much about internals of Avalonia, but you can probably instantly tell if the demo app heavily uses something like Wrapper<T>, Adapter<T>, or Serialize<T>(...) on startup, where T is a struct. If yes, this could be the reason of a slow startup in this case as well.

P.S. Kudos for the demo / video. We need some hope ;)
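
To make the pattern concrete, here is a minimal sketch of the shape being described (illustrative names only): a generic instantiation over a user-defined struct, including the AsyncTaskMethodBuilder<T> behind an async method returning a struct, is the kind of method reported in #101135 as missing from Mono full-AOT images and therefore JIT-compiled at startup, while NativeAOT compiles it ahead of time.

    // Illustrative only: the kind of instantiations reported in dotnet/runtime#101135
    // as showing up as "AOT NOT FOUND" in the Mono debug log.
    using System.Threading.Tasks;

    public readonly struct Size // user-defined value type, not from the core library
    {
        public readonly double Width, Height;
        public Size(double w, double h) { Width = w; Height = h; }
    }

    public static class Registry
    {
        // A generic method instantiated with a non-primitive struct, e.g. Register<Size>(...),
        // is the shape reported to be skipped by Mono full AOT.
        public static void Register<T>(string name, T defaultValue) where T : struct
            => System.Console.WriteLine($"{name} = {defaultValue}");

        // An async method returning a struct uses AsyncTaskMethodBuilder<Size> plus a
        // compiler-generated state-machine struct -- more instantiations over value types
        // that need JIT if no AOT code was generated for them.
        public static async Task<Size> LoadAsync()
        {
            await Task.Yield();
            return new Size(100, 50);
        }
    }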

emmauss commented 2 weeks ago

@emmauss , a quick q -

Both MonoJIT and MonoAOT have virtually the same slow performance at startup and when loading new views in-app,

How far did you guys investigate the reasons of slow startup - in particular, did you try to count the number of methods for which AOT code wasn't generated (AOT_NOT_FOUND in mono debug log), or something like % of time spent in JIT on startup?

We reported #101135 while investigating slow startup in our app - in short, Mono AOT doesn't generate code for any generic method instance parameterized with non-primitive value type. So if you heavily rely on structs & generics (which is reasonable on CoreCLR), or have a decent amount of async code (it relies on AsyncTaskMethodBuilder<T> & state struct), JIT will dominate. In our case 75% of startup time is spent in JIT.

I don't know much about internals of Avalonia, but you can probably instantly tell if the demo app heavily uses something like Wrapper<T>, Adapter<T>, or Serialize<T>(...) on startup, where T is a struct. If yes, this could be the reason of a slow startup in this case as well.

P.S. Kudos for the demo / video. We need some hope ;)

We use a lot of generics. Our property system is completely backed by generics, and every object in the UI is an AvaloniaObject. So, given how our properties must be registered on a type deriving from AvaloniaObject, Mono is probably not generating AOT code for those. Our styling, visual, and input systems rely on it. We do use structs for value types (size, location, etc.).

alexyakunin commented 2 weeks ago

Just read: https://docs.avaloniaui.net/docs/basics/user-interface/controls/creating-controls/defining-properties

AvaloniaProperty.Register<MyCustomButton, int>(...)

Almost crying... As far as I can reason, that's nearly the worst-case scenario, assuming it's not int but any type declared outside of mscorlib. It's going to skip AOT for all <AnyClass, struct> instantiations just because there is a struct.

There must be > 1K properties in such a demo. Assuming Register is not the only generic method called at least once per property registration, and that this happens for every property on every component type, it could easily be a few thousand generic method instances with missing AOT code.

In our case we see ~4K AOT_NOT_FOUND methods in the Mono debug log, and JIT alone eats up ~1.5s on a Galaxy S23 Ultra.

One other notable case is caching - if you use the ConcurrentDictionary<TKey, TValue>.GetOrAdd<TArg>(...) overload with one of the type arguments being a value type, all such calls will also require JIT.

NativeAOT handles all these cases, and this might at least partially explain such a dramatic difference.
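
For reference, the caching overload mentioned above is ConcurrentDictionary<TKey, TValue>.GetOrAdd<TArg>(key, valueFactory, factoryArgument); when the factory-state argument is a user-defined struct, the resulting instantiation falls into the same missing-AOT pattern. A minimal sketch with invented names:

    // Illustrative only: GetOrAdd<TArg>(...) with a struct TArg produces another generic
    // instantiation over a user value type, as discussed above.
    using System.Collections.Concurrent;

    public readonly struct BindingState // user-defined struct used as factory state
    {
        public readonly int PropertyId;
        public BindingState(int id) => PropertyId = id;
    }

    public static class BindingCache
    {
        static readonly ConcurrentDictionary<string, object> s_cache = new();

        public static object GetBinding(string name, int propertyId)
            // GetOrAdd<BindingState>(...) is the instantiation in question.
            => s_cache.GetOrAdd(
                name,
                static (key, state) => $"{key}:{state.PropertyId}",
                new BindingState(propertyId));
    }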

alexyakunin commented 2 weeks ago

@emmauss If you guys could run the demo with the Mono debug log enabled and share the number of AOT_NOT_FOUND methods at startup, that would be great... Maybe it will help the MS folks think seriously about the priority of this issue, especially in light of "you'll be lucky to have .NET Native for Android in .NET 11".

adb shell setprop debug.mono.log default,assembly,mono_log_level=debug,mono_log_mask=all

agocke commented 2 weeks ago

One note: a lot of the above costing implicitly assumes MAUI is on top, meaning that the system needs tight JVM integration. I don't know if platforms like Avalonia actually require that. If not, and they can compile against the Android NDK, the cost and schedules may change. I'll let someone from Avalonia speak on how they used Native AOT in the past.

emmauss commented 2 weeks ago

One note: a lot of the above costing implicitly assumes MAUI is on top, meaning that the system needs tight JVM integration. I don't know if platforms like Avalonia actually require that. If not, and they can compile against the Android NDK, the cost and schedules may change. I'll let someone from Avalonia speak on how they used Native AOT in the past.

We tested Native AOT with the Android NDK, using NativeActivity for our activity. This cut support for SDK APIs that modern Android apps use, like the storage access framework, window insets, text and input composition, and embedding native Android views in-app - APIs we need for storage, window customization, and text prediction support. Also, we couldn't use any dotnet android libraries. These do not make it appealing to end users as they will be cut off from the rich dotnet android library ecosystem, and also need to set up a lot of build scripts just to build and sign their app.

jonathanpeppers commented 2 weeks ago

One note: a lot of the above costing implicitly assumes MAUI is on top, meaning that the system needs tight JVM integration. I don't know if platforms like Avalonia actually require that. If not, and they can compile against the Android NDK, the cost and schedules may change. I'll let someone from Avalonia speak on how they used Native AOT in the past.

Generally, I don't know how you would make a "real" Android application without calling Java APIs. Even Unity3d games would use their Java interop support for things like in-app purchases, push notifications, etc. There are a lot of random OS features you have to access from Java, so I would think most Avalonia apps would also need to use these.

agocke commented 2 weeks ago

Generally, I don't know how you would make a "real" Android application without calling Java APIs.

I believe you could still call Java APIs through JNI; it would just be significantly more effort than the current implementations.

do not make it appealing to end users as they will be cut off from the rich dotnet android library ecosystem, and also need to set up a lot of build scripts just to build and sign their app

Agreed, the downside of this approach would be that none of the existing Android/Java interop would work.

vyacheslav-volkov commented 2 weeks ago

@agocke If someone could fix this issue https://github.com/dotnet/runtime/issues/101135, we could use it as a workaround until full support for NativeAOT is available. Could someone from the team assess how difficult this task might be and estimate how long it might take to fix? Currently, we do not have a truly working solution to the slow startup problem.

agocke commented 2 weeks ago

That issue has 86 comments, so let me see if I can summarize. That's not one issue but really a blanket issue for: we've seen a variety of methods that must be JITed in our sample apps, which causes slow startup. Is that right? If so, I would expect that issue to verge on impossible to fix. Rearchitecting Mono to AOT everything is more expensive than just using Native AOT.

alexyakunin commented 1 week ago

That issue has 86 comments, so let me see if I can summarize. That's not one issue but really a blanket issue for: we've seen a variety of methods that must be JITed in our sample apps, which causes slow startup. Is that right? If so, I would expect that issue to verge on impossible to fix. Rearchitecting Mono to AOT everything is more expensive than just using Native AOT.

No, it's not quite right: there is a very specific scenario where AOT code isn't generated for a generic method instance:

I'll find the link to the specific piece of code making all these checks a bit later (already shared it here).

Overall, my impression is: yes, probably some extra work is necessary to eliminate some of these constraints (e.g. modifying the AOT code lookup logic, etc.), but it isn't as complicated as a full overhaul of Mono AOT.

Moreover, I suspect some of these constraints were originally added to decrease the number of generic instantiations AOT generates in Full mode, and that this happened at a time when generics weren't so widespread and there was no profile-based AOT mode.

alexyakunin commented 1 week ago

That issue has 86 comments

I also wish the people responding to it had taken it seriously right from the beginning, instead of saying something like "well, you guys are fine - the app is at least starting, right? As for the startup time, it's sad, but please wait a few more years."

I didn't write about this bizarre issue with AOT because I genuinely love .NET and believe you guys are doing a great job making it better. So even though this discovery means Mono AOT is 90% fake, and profile-guided AOT deserves this name only formally, I'd rather wait for MS to address it.

And somehow... somehow I discover that "it's fine" feels like an acceptable answer for MS here. But seriously, how can it be fine, if a single post about this would decrease the chance of MAUI being used by a given company by maybe 50%? Isn't elevated ANR and Play Store penalization one of the worst things you can face, assuming you can't fix it?

alexyakunin commented 1 week ago

My point is: if you guys were running MAUI as a startup, this issue would instantly be classified as "existential".

agocke commented 1 week ago

No, it's not quite right: there is a very specific scenario where AOT code isn't generated for a generic method instance:

  • It's an instance with a ValueType parameter
  • Which isn't a primitive type - this was relaxed to a type from mscorlib in a partial fix
  • And if I am not mistaken, the method itself has to be declared in the same assembly as its parameter.

Assuming the above are true, my understanding is that this is not quite as expensive as guaranteeing Mono can AOT any code, but it's close. My understanding is that specialization of value types is one of the main limitations of the current architecture, and implementing it would be a very large work item.

It's certainly possible that there is a simpler implementation I'm not aware of, but I would put the starting cost at very expensive.

charlesroddie commented 1 week ago

If a Mono architecture limitation prevents full AOT on Android, why does full Mono AOT work on iOS?

vyacheslav-volkov commented 1 week ago

The problem is that a huge gap is emerging between iOS and Android in terms of performance for .NET applications. With each new release, the .NET team adds more and more value types (ValueTuple, ValueTask, Span, Memory, etc.), aiming to reduce heap allocations. Developers naturally follow this trend, and various libraries increasingly use value types. However, when trying to create an application on Android, they encounter very slow startup times, whereas other platforms do not face this issue.

When developers ask why it is so slow on Android, they are told that using value types is detrimental because it makes JIT compilation harder, and they should avoid using them. But developers who have already spent a lot of time writing their code with value types are unlikely to create a special version just for Android that avoids them. This situation reveals a contradiction: the entire .NET ecosystem is moving towards optimizations by reducing allocations, but if you want to write for Android, you are advised to forget about value types and generics and use only classes.

I am currently working on a large project, and for iOS, I am using .NET Native. My library heavily utilizes generics and value types, and I see no issues with this. The installation size from the App Store is 72MB, which is smaller than many similar native applications written in Objective-C/Swift (~100MB), and the performance is comparable to these native apps. The situation is entirely different on Android. As soon as I add simple initialization that does nothing but call managed code, I experience a significant performance hit. For instance, JIT compilation slows down initialization by a factor of 27 (see my comment https://github.com/dotnet/runtime/issues/106748#issuecomment-2305224622). I've also shared a repository where you can check it out: https://github.com/vyacheslav-volkov/PerfAndroidTest. I have used all possible optimization methods, including FullAOT for Android, but this only slightly improves the result. The only option I see is to rewrite all the code specifically for Android, but even that may not help, as .NET itself uses many generics.

I don't understand the .NET team's stance, which denies this issue by citing empty applications where everything works fine. I have provided my examples and asked for optimization assistance (@jonathanpeppers @jonpryor) but have not received any real advice that could help address the situation. The most frustrating part is that there seems to be no hope of this issue being resolved in the near future. Now, think about it: if someone starting a project for mobile platforms today is choosing a framework and finds this issue, will they choose .NET when there is no solution even for something as basic as startup time?

I am confident that if resources were allocated to address this issue, it wouldn't necessarily require immediately implementing Native AOT. It would be enough for someone with knowledge of Mono to try to solve this problem https://github.com/dotnet/runtime/issues/101135 and give an answer: either why it really cannot be done, or how it can be done and within what timeframe. As far as I understand, MonoAOT works for some internal generics but doesn't work for custom generics, so maybe fixing it isn't that hard. Currently, all discussions point out that Native AOT for Android is difficult, that fixing MonoAOT for Android is difficult, and that the problem lies with developers, because everything works fine in tests with an empty application.

jkotas commented 1 week ago

I am currently working on a large project, and for iOS, I am using .NET Native. My library heavily utilizes generics and value types, and I see no issues with this. The installation size from the App Store is 72MB,

Do you happen to have size and startup performance numbers for Mono AOT on iOS for this project? Would the Mono AOT size and startup performance be acceptable on iOS if native AOT did not exist?

alexyakunin commented 1 week ago

I am currently working on a large project, and for iOS, I am using .NET Native. My library heavily utilizes generics and value types, and I see no issues with this.

If a Mono architecture limitation prevents full AOT on Android, why does full Mono AOT work on iOS?

I'm also curious how it's even possible to run Mono AOT apps on iOS w/o interpreter enabled, assuming it works the same way for both iOS and Android (and our findings for Android are correct).

For the record, we use interpreter-only builds for iOS - that's because our initial attempts with full AOT failed there, but that was 1+ year ago, when we knew far fewer details of how it works and hadn't done anything to address the explosion of generic instances in the ArgumentList<T0, ..., T8> type, etc.; that's addressed now. So we'll definitely retry AOT builds for iOS quite soon.

alexyakunin commented 1 week ago

@jkotas

Would the Mono AOT size and startup performance be acceptable on iOS if native AOT did not exist?

I know I'm not the one you asked here, but in our case even interpreter-only startup performance is acceptable on iOS - and that's exactly what we use now. I shared the numbers: it's about 1.1s on an iPhone 13 (interpreter-only) vs 1.8s on a Galaxy S23 Ultra (both profiled and full AOT).

vyacheslav-volkov commented 1 week ago

@jkotas MonoAOT performs almost as fast as NativeAOT, but the application size becomes larger. I just checked, and for this project, it's around 108 MB for MonoAOT. This is a reasonable size, as I mentioned earlier; similar applications written in Objective-C/Swift occupy about the same amount of space.

josephmoresena commented 1 week ago

  • GC bridge (of some form) to support Java interop
  • NativeAOT runtime packs for Android: this is somewhat working with linux-bionic-arm64 packages, but we probably want actual android-arm64, etc. packages.

In fact, a GC bridge covering Java interop, .NET (runtime) interop, and .NET (NativeAOT) interop would be great.

For some time now, I have been working on a JNI framework for .NET that is fully compatible with NativeAOT, and there is even an example of its use on Android.

However, from what I have observed, native Android applications go beyond JNI; it even seems that all of Java's functionality is encapsulated in Android's own native libraries. For NativeAOT, what might be feasible is to build a framework on top of these native libraries using DirectPInvoke and NativeLibrary.
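
As a small illustration of that direction, here is a sketch of calling an NDK-exposed C API directly from managed code via P/Invoke, which NativeAOT can bind statically through DirectPInvoke instead of resolving it at runtime; __android_log_write is a real liblog export, while the wrapper and constant names are just illustrative scaffolding:

    // Sketch: calling into an Android native library without going through Java.
    using System.Runtime.InteropServices;

    internal static class NdkLog
    {
        private const int AndroidLogInfo = 4; // ANDROID_LOG_INFO

        // "log" resolves to liblog.so on Android; with NativeAOT a <DirectPInvoke Include="log" />
        // item can bind the symbol at link time, and NativeLibrary/DllImportResolver can be used
        // for libraries that must still be resolved dynamically.
        [DllImport("log", EntryPoint = "__android_log_write")]
        private static extern int __android_log_write(int prio, string tag, string text);

        public static void Info(string tag, string message)
            => __android_log_write(AndroidLogInfo, tag, message);
    }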