dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[ARM] JIT compile time is much slower than mono (4.4.2) #6589

Open sjsinju opened 8 years ago

sjsinju commented 8 years ago

I ran a basic application that just prints 'Hello World' to the console, using mono and the latest commit of .NET Core (coreclr), on an ARM device. The execution time with mono is

sh-3.2# time mono /tmp/tests/mono/hello.exe
Hello World
real    0m0.274s
user    0m0.230s
sys 0m0.030s

and with .NET Core (built with O3) it is

sh-3.2# time ./corerun /tmp/tests/hello.exe 
HI Hello, World!
real    0m0.625s
user    0m0.550s
sys 0m0.060s

As the results above show, execution on .NET Core (625ms) is much slower than on mono (274ms). So I profiled the run to find out why .NET Core is so slow, and found that JIT compilation accounts for most of the time spent running the app on .NET Core.

Because the 'Hello World' application uses 'mscorlib' in many places, I also ran the app with native images (NI) of 'mscorlib' generated by each runtime's NI generator (mono: mscorlib.dll.so, .NET Core: System.Private.Corelib.ni.dll). The result with mono is

sh-3.2# time mono /tmp/tests/mono/hello.exe
Hello World
real    0m0.109s
user    0m0.090s
sys 0m0.010s

and .NET Core is

sh-3.2# time ./corerun /tmp/tests/hello.exe
HI Hello, World!
real    0m0.099s
user    0m0.080s
sys 0m0.000s

As I expected, the results are similar when 'mscorlib' does not have to be JIT compiled. So I tried one more test. Because I thought optimization might be one of the reasons for the slow JIT compilation, I ran the application with optimization disabled via 'export COMPlus_JITMinOpts=1' and got the result below.

sh-3.2# export COMPlus_JITMinOpts=1
sh-3.2# time ./corerun /tmp/tests/hello.exe
HI Hello, World!
real 0m0.451s
user 0m0.380s
sys 0m0.050s

But the result is still slower than mono (274ms). Launching a 'hello world' application requires JIT compiling 312 functions. As an application grows larger, it will need even more JIT compilation, so I think its launch time and execution time will get worse.

I am wondering why JIT compilation is so slow and how its performance can be improved.
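Single `time` runs like the ones above can be noisy, so it may help to repeat each configuration and compare medians. The following is only a minimal sketch of such a harness, not how the numbers in this issue were collected: the helper name `median_wall_time` is hypothetical, and the stand-in command would need to be replaced with the mono or corerun command lines shown above on the actual device.

```python
import os
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, runs=5, extra_env=None):
    """Run `cmd` `runs` times and return the median wall-clock time in seconds.

    `extra_env` is merged into the current environment, e.g.
    {"COMPlus_JITMinOpts": "1"} to repeat the MinOpts experiment.
    """
    env = dict(os.environ, **(extra_env or {}))
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(cmd, env=env, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On the device, substitute
    # ["mono", "/tmp/tests/mono/hello.exe"] or ["./corerun", "/tmp/tests/hello.exe"].
    cmd = [sys.executable, "-c", "print('Hello World')"]
    default = median_wall_time(cmd)
    min_opts = median_wall_time(cmd, extra_env={"COMPlus_JITMinOpts": "1"})
    print(f"default: {default:.3f}s  MinOpts: {min_opts:.3f}s")
```

The median is used rather than the mean so that one slow outlier (for example a cold file cache on the first run) does not dominate the comparison.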

category:throughput theme:throughput skill-level:expert cost:extra-large

wateret commented 8 years ago

/cc @dotnet/jit-contrib

jkotas commented 8 years ago

312 functions of JIT compile is needed

How many methods does mono need to compile?

The VM implementation has been shifting from C++ to C# over the years (CoreRT pushed it to the extreme). Writing the VM in C# results in more reliable and sometimes even faster code because there is no need to manually manage object references and transitions in C#. The side effect is that there is quite a bit of managed code running during startup. It is not something we worried about because all shipping configurations have a native image for System.Private.CoreLib. The system is optimized for this case.

sjsinju commented 8 years ago

@jkotas Here is the result of the mono profiler running the same hello application on mono.

Mono JIT:
        Methods from AOT               : 0
        Methods JITted using mono JIT  : 369
        Methods JITted using LLVM      : 0
        Total time spent JITting (sec) : 0.229377

Mono JITs 369 methods, but it is still faster (about 270ms) than .NET Core (625ms, or 451ms with COMPlus_JITMinOpts=1), as I said above. I am raising this issue about the JIT compile time per function. It affects launch time, execution time, and so on. The difference is especially large on ARM devices.
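To put the per-function claim in numbers, here is a rough back-of-the-envelope calculation using only the figures posted in this thread. It rests on one loud assumption: the .NET Core JIT time is approximated as the wall time of the JIT run minus the wall time of the native-image run, which was not measured directly and also folds in any other startup work that the native image avoids. Under that assumption the estimate comes out at roughly 0.62 ms per method for mono, about 1.69 ms for .NET Core with default optimization, and about 1.13 ms with COMPlus_JITMinOpts=1.

```python
# Figures copied from the measurements posted in this thread.
MONO_JIT_SECONDS = 0.229377    # "Total time spent JITting" from the mono profiler
MONO_METHODS = 369             # methods JITted by mono
CORECLR_METHODS = 312          # methods JITted by .NET Core
CORECLR_JIT_RUN_MS = 625       # real time, default JIT
CORECLR_MINOPTS_RUN_MS = 451   # real time, COMPlus_JITMinOpts=1
CORECLR_NI_RUN_MS = 99         # real time with the native image


def per_method_ms(total_ms, methods):
    """Average JIT time per method, in milliseconds."""
    return total_ms / methods


mono = per_method_ms(MONO_JIT_SECONDS * 1000, MONO_METHODS)

# ASSUMPTION: JIT time is estimated as (JIT run) - (native-image run).
# This is an approximation, not a profiler measurement.
coreclr = per_method_ms(CORECLR_JIT_RUN_MS - CORECLR_NI_RUN_MS, CORECLR_METHODS)
minopts = per_method_ms(CORECLR_MINOPTS_RUN_MS - CORECLR_NI_RUN_MS, CORECLR_METHODS)

print(f"mono:              {mono:.2f} ms/method")
print(f"coreclr (default): {coreclr:.2f} ms/method ({coreclr / mono:.1f}x mono)")
print(f"coreclr (MinOpts): {minopts:.2f} ms/method ({minopts / mono:.1f}x mono)")
```

Even with optimization disabled, the estimated cost per method stays well above mono's, which matches the point that the gap is not explained by the number of methods alone.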

JensNordenbro commented 7 years ago

@sjsinju, how do you generate .ni images?

BruceForstall commented 7 years ago

You can see that we have opened quite a few issues related to JIT throughput by checking the "JitThroughput" tag, e.g.: https://github.com/dotnet/coreclr/issues?utf8=%E2%9C%93&q=is%3Aopen%20is%3Aissue%20label%3AJitThroughput. There has been quite a lot of thought given to how we can improve this, though we haven't implemented very many improvements yet.

TIHan commented 11 months ago

Would be interesting to re-evaluate this now on ARM64 since there has been a lot of work done with throughput since 2017.