Open daz10000 opened 2 years ago
It might be useful to collect a trace from dotnet fsi using dotnet-trace. dotnet trace collect -- fsi null.fsx should produce a trace.nettrace file that can be viewed in PerfView on Windows, or converted with dotnet trace convert to something viewable in speedscope or the Chromium devtools.
I'm sure I'm doing something wrong, but I didn't get much joy running that specific incantation. This does work, though:
dotnet-trace.exe collect -- dotnet /c/Program\ Files/dotnet/sdk/6.0.101/FSharp/fsi.dll null.fsx
Microsoft-DotNETCore-SampleProfiler 0x0000F00000000000 Informational(4) --profile
Microsoft-Windows-DotNETRuntime 0x00000014C14FCCBD Informational(4) --profile
Launching: dotnet "C:/Program Files/dotnet/sdk/6.0.101/FSharp/fsi.dll" null.fsx
Process : C:\Program Files\dotnet\dotnet.exe
Output File : C:\src\tmp\dotnet.exe_20220123_153331.nettrace
[00:00:00:03] Recording trace 7.7813 (MB)
Press <Enter> or <Ctrl+C> to exit...
Trace completed.
Process exited with code '0'.
That's my fault - I forgot that dotnet-trace doesn't automatically provide the dotnet host argument. The correct incantation would be dotnet trace collect -- dotnet fsi hw.fsx.
dotnet.exe_20220123_153331.chromium.zip
There is the Chromium output, and (FWIW, I'm not an expert at this type of output) here is the pretty picture from perfetto.dev using that output. A lot of stuff happens for that empty fsx file. An awful lot of it seems to be actual processing of the code, which seems strange, honestly, given there is no code to parse or process.
Could it be that the overhead simply stems from JITting the compiler (unless it's already ngened)?
On Windows I'm consistently seeing just above 1s for Measure-Command { dotnet fsi .\unit.fsx }. Quite a lot less than your Linux results, but still too much compared to the near-instant evaluation of () in a running FSI session.
Could it be that the overhead simply stems from JITting the compiler (unless it's already ngened)?
Yes, the primary cost is certainly the JIT of compiler code. However, my understanding is that the installation of F# should be crossgen'd. There is a fix coming in .NET 7 to crossgen much more of the F# code that uses tailcalls. This may be related.
@KevinRansom Do you know how we can check if crossgen'd code is being executed? Thanks
I'm no expert on JIT or crossgen, so take this with a grain of salt, but looking at the flame diagram above, that looks like actual compiler code running - I saw lots of time spent in what looked like building compile trees and pickled representations. That was puzzling because the test code had either zero lines of code or just the expression (). Is there some other standard code that would get compiled in the background? Wouldn't the JIT be happening in the background, or are those deep call trees in the profile the JIT itself?
On the crossgen front, can I just run ngen over something like a Linux install and check the runtime to verify whether it makes a difference, and that's likely the problem? Would I do that over just fsi.dll, fsc.dll, or are there other things I'd want to hit as well? (I might just try the experiment; it seems a quick question to answer.)
@dsyme an RTM install of the .NET SDK contains crossgen'd assemblies, so they are always executed crossgen'd. For code built from the F# repo, crossgen is not applied.
The likely big cost in startup is preloading the framework assemblies.
May I suggest the F# team run a .fsx script as part of $PS1 as dogfooding 😉
Things should improve once we start incorporating MIBC profiles and ship stripped DLLs. But we definitely should look at fixing this.
As a beginner learning the language, I would love to have a snappy experience developing and running F# scripts. F# could then be a serious option to replace all my system scripting needs.
I don't know anything about the internals, but I thought I could suggest looking at this Groovy-related project: https://kobo.github.io/groovyserv. This approach could be one way to achieve snappy feedback.
This issue comes up from time to time and it's just that... FSI was never designed for fast startup times. It's an interactive process meant to be started once, at development time, and to accept arbitrary F# code at any time to evaluate. That doesn't mean it can't change, but this isn't just a matter of "fixing" something. Changing FSI to be designed to start up as quickly as possible so it can quickly evaluate an F# script is a fundamentally different problem to solve than what it does today.
Using MIBC should help a bit; hopefully we can enable it when Drew is back.
My suspicion is that we pre-load a ton of assemblies before compilation. We only ship native compiled assemblies to the dotnet sdk, so that start up time is for native compiled. Any jitting that happens is for those constructs that the coreclr doesn't yet pre-compile.
Would it be possible to pre-optimize anything at install time? dotnet fsi is something you end up running a lot interactively.
This and having a (pre-computed jit) profile is something we're looking into.
This issue comes up from time to time and it's just that... FSI was never designed for fast startup times. It's an interactive process meant to be started once, at development time, and to accept arbitrary F# code at any time to evaluate. That doesn't mean it can't change, but this isn't just a matter of "fixing" something. Changing FSI to be designed to start up as quickly as possible so it can quickly evaluate an F# script is a fundamentally different problem to solve than what it does today.
Wouldn't it make sense to have a dedicated tool for running F# scripts then?
Don't think it should be closed just yet; we enabled PGO for the compiler and need to do the same for fsi.
Really excited to see this get attention. I have been doing a lot more FSI work since the NuGet package-reference option became available, and I use it for a lot of small tools, so this would be a huge quality-of-life improvement.
Is your feature request related to a problem? Please describe.
Startup time for fsx files is quite high. It might not matter if you are planning to do a large computation, but for small scripts it seems to be routinely a few seconds, even for an empty fsx file with just () as code. I ran some benchmarks against similar Python programs for comparison on a few machines, inspired by this comment: https://twitter.com/vilinski/status/1485188447639851012. For small operating-system tasks, running a tiny F# script would incur significant overhead, and it's a small barrier to starting up a quick REPL session. Some tests I tried:
Hello world, windows, dotnet 6.0.101, git bash
For comparison, here is hello world in Python.
It's not even about actually running the F# code itself; similar tests get 1.8-ish seconds for a completely empty .fsx file. Interestingly, adding in just the expression () adds about another half second. Again, Python seems to shrug off having nothing to do very quickly.
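When comparing startup times this small, a single run is noisy, so averaging over several runs is more reliable. A minimal sketch, assuming a POSIX shell with GNU date (%N); the file names in the comment (empty.fsx, empty.py) are hypothetical placeholders for the comparison described above, and the final line demonstrates the helper on a no-op.

```shell
# Average wall-clock time of a command over N runs, in milliseconds,
# to smooth out run-to-run jitter in startup measurements.
avg_startup_ms() {
  runs=$1; shift
  total=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    total=$(( total + (end - start) / 1000000 ))
    i=$(( i + 1 ))
  done
  echo $(( total / runs ))
}

# Intended comparison (hypothetical file names):
#   avg_startup_ms 10 dotnet fsi empty.fsx
#   avg_startup_ms 10 python3 empty.py
# Demonstrated here on a no-op:
avg_startup_ms 3 sh -c ':'
```

Note that the first run of dotnet fsi can be slower than subsequent ones (cold file caches), so discarding a warm-up run before averaging may give more stable numbers.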
Curiously - and this is hard to time - if I run fsi without a script associated and just hit Ctrl-D really quickly, I can be in and out of the REPL in about half a second (even with human reflexes), so there is something about an fsx file specifically which seems to slow things down. I guess that's good if you just want to play interactively.
For comparison, on a fairly high-end Linux AWS instance (m4.4xlarge, 16 vCPU, 64 GiB RAM), with dotnet 6.0.101:
Describe the solution you'd like
It's totally a qualitative thing, but the overhead for commencing execution of an fsx file should be comparable to the startup time for the REPL (let's say half a second, though closer to the Python times of 0.2 seconds would be ideal).
Describe alternatives you've considered
This doesn't matter if you are compiling code, but for running scripts - especially lots of small scripts back to back - the two seconds would be problematic, and consolidating into fewer scripts or using a different scripting system would be the workaround.
Additional context
I did a little profiling with strace and strace -c on a Linux machine. I'm not sure this is super informative, really - the total reported time in system calls is just 0.035 seconds. I suspect the time is more in userland, but I'd have to profile the whole startup to see that bit. Trace output attached. There does seem to be a large pause in the strace output for a second or so in the middle, just watching it interactively, but I don't think it's necessarily making a system call at that time.
strace-output.txt
strace-c-output.txt