Open daz10000 opened 2 years ago
It might be useful to collect a trace from dotnet fsi using dotnet-trace. dotnet trace collect -- fsi null.fsx should produce a trace.nettrace file that can be viewed in PerfView on Windows, or converted with dotnet trace convert to something viewable in speedscope or the Chromium devtools.
I'm sure I'm doing something wrong, but I didn't get much joy running that specific incantation. This does work, though:
dotnet-trace.exe collect -- dotnet /c/Program\ Files/dotnet/sdk/6.0.101/FSharp/fsi.dll null.fsx
Microsoft-DotNETCore-SampleProfiler 0x0000F00000000000 Informational(4) --profile
Microsoft-Windows-DotNETRuntime 0x00000014C14FCCBD Informational(4) --profile
Launching: dotnet "C:/Program Files/dotnet/sdk/6.0.101/FSharp/fsi.dll" null.fsx
Process : C:\Program Files\dotnet\dotnet.exe
Output File : C:\src\tmp\dotnet.exe_20220123_153331.nettrace
[00:00:00:03] Recording trace 7.7813 (MB)
Press <Enter> or <Ctrl+C> to exit...
Trace completed.
Process exited with code '0'.
That's my fault - I forgot that dotnet-trace doesn't automatically provide the dotnet host argument. The correct incantation would be dotnet trace collect -- dotnet fsi hw.fsx.
dotnet.exe_20220123_153331.chromium.zip
There is the Chromium output, and (FWIW, I'm not an expert at this type of output) here is the pretty picture from perfetto.dev using that output. A lot of stuff happens for that empty fsx file. An awful lot of it seems to be actual processing of the code, which seems strange, honestly, given there is no code to parse or process.
Could it be that the overhead simply stems from JITting the compiler (unless it's already ngened)?
On Windows I'm consistently seeing just above 1s for Measure-Command { dotnet fsi .\unit.fsx }. Quite a lot less than your Linux results, but still too much compared to the near-instant evaluation of () in a running FSI session.
Could it be that the overhead simply stems from JITting the compiler (unless it's already ngened)?
Yes, the primary cost is certainly the JIT of compiler code. However, my understanding is that the installation of F# should be crossgen'd. There is a fix coming in .NET 7 to crossgen much more of the F# code that uses tailcalls. This may be related.
@KevinRansom Do you know how we can check if crossgen'd code is being executed? Thanks
I'm no expert on JIT or crossgen, so take this with a grain of salt, but looking at the flame diagram above, that looks like actual compiler code running - I saw lots of time spent in what looked like building compile trees and pickled representations. That was puzzling because the test code had either zero lines of code or just the expression (). Is there some other standard code that would get compiled in the background? Wouldn't the JIT be happening in the background, or are those deep call trees in the profile the JIT itself?
On the crossgen front, can I just run ngen over something like a Linux install and check the runtime to verify whether it makes a difference, and that's likely the problem? Would I do that over just fsi.dll, fsc.dll, or are there other things I'd want to hit as well? (I might just try the experiment; it seems a quick question to answer.)
@dsyme an RTM install of the .NET SDK contains crossgen'd assemblies, so they are always executed crossgen'd. For code built from the F# repo, crossgen is not applied.
The likely big cost in startup is preloading the framework assemblies.
May I suggest the F# team run a .fsx script as part of $PS1 as dogfooding 😉
Things should improve once we start incorporating MIBC profiles and ship stripped DLLs. But we definitely should look at fixing this.
As a beginner learning the language, I would love to have a snappy experience developing and running F# scripts. F# could then be a serious option to replace all my system scripting needs.
I don't know anything about the internals, but I thought I could suggest looking at this Groovy-related project: https://kobo.github.io/groovyserv. This approach could be one way to achieve snappy feedback.
This issue comes up from time to time and it's just that... FSI was never designed for fast startup times. It's an interactive process meant to be started once, at development time, and to accept arbitrary F# code at any time to evaluate. That doesn't mean it can't change, but this isn't just a matter of "fixing" something. Changing FSI to be designed to start up as quickly as possible so it can quickly evaluate an F# script is a fundamentally different problem to solve than what it does today.
Using MIBC should help a bit; hopefully we can enable it when Drew is back.
My suspicion is that we pre-load a ton of assemblies before compilation. We only ship native compiled assemblies to the dotnet sdk, so that start up time is for native compiled. Any jitting that happens is for those constructs that the coreclr doesn't yet pre-compile.
Would it be possible to pre-optimize anything at install time? dotnet fsi is something you end up running a lot interactively.
This and having a (pre-computed jit) profile is something we're looking into.
This issue comes up from time to time and it's just that... FSI was never designed for fast startup times. It's an interactive process meant to be started once, at development time, and to accept arbitrary F# code at any time to evaluate. That doesn't mean it can't change, but this isn't just a matter of "fixing" something. Changing FSI to be designed to start up as quickly as possible so it can quickly evaluate an F# script is a fundamentally different problem to solve than what it does today.
Wouldn't it make sense to have a dedicated tool for running F# scripts then?
Don't think it should be closed just yet; we enabled PGO for the compiler and need to do the same for fsi.
Really excited to see this get attention. I have been doing a lot more FSI work since the NuGet package-reference option became available, and I use it for a lot of small tools, so this would be a huge quality-of-life improvement.
Is your feature request related to a problem? Please describe.
Startup time for fsx files is quite high. It might not matter if you are planning to do a large computation, but for small scripts it seems to be routinely a few seconds, even for an empty fsx file with just () as code. I ran some benchmarks against similar Python programs for comparison on a few machines, inspired by this comment: https://twitter.com/vilinski/status/1485188447639851012. For small operating-system tasks, running a tiny F# script would incur significant overhead, and it's a small barrier to starting up a quick REPL session. Some tests I tried:
Hello world, windows, dotnet 6.0.101, git bash
For comparison, here is hello world in Python.
It's not even about actually running the F# code itself; similar tests get 1.8-ish seconds for a completely empty .fsx file. Interestingly, adding in just the expression () adds about another half second. Again, Python seems to shrug off having nothing to do very quickly.
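When comparing startup times this small, a single run is noisy, so averaging over several runs is more reliable. A minimal sketch, assuming a POSIX shell with GNU date (%N); the file names in the comment (empty.fsx, empty.py) are hypothetical placeholders for the comparison described above, and the final line demonstrates the helper on a no-op.

```shell
# Average wall-clock time of a command over N runs, in milliseconds,
# to smooth out run-to-run jitter in startup measurements.
avg_startup_ms() {
  runs=$1; shift
  total=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    total=$(( total + (end - start) / 1000000 ))
    i=$(( i + 1 ))
  done
  echo $(( total / runs ))
}

# Intended comparison (hypothetical file names):
#   avg_startup_ms 10 dotnet fsi empty.fsx
#   avg_startup_ms 10 python3 empty.py
# Demonstrated here on a no-op:
avg_startup_ms 3 sh -c ':'
```

Note that the first run of dotnet fsi can be slower than subsequent ones (cold file caches), so discarding a warm-up run before averaging may give more stable numbers.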
Curiously - and this is hard to time - if I run fsi without a script associated and just hit Ctrl-D really quickly, I can be in and out of the REPL in about half a second (even with human reflexes), so there is something about an fsx file specifically which seems to slow things down. I guess that's good if you just want to play interactively.
For comparison, on a fairly high-end Linux AWS instance (m4.4xlarge, 16 vCPU, 64 GiB RAM), with dotnet 6.0.101:
Describe the solution you'd like
It's totally a qualitative thing, but the overhead for commencing execution of an fsx file should be comparable to the startup time for the REPL (let's say half a second, though closer to the Python times of 0.2 seconds would be ideal).
Describe alternatives you've considered
This doesn't matter if you are compiling code, but for running scripts - especially lots of small scripts back to back - the two seconds would be problematic, and consolidating into fewer scripts or using a different scripting system would be the workaround.
Additional context
I did a little profiling with strace and strace -c on a Linux machine. I'm not sure this is super informative, really - the total reported time in system calls is just 0.035 seconds. I suspect the time is more in userland, but I'd have to profile the whole startup to see that bit. Trace output attached. There does seem to be a large pause in the strace output for a second or so in the middle, just watching it interactively, but I don't think it's necessarily making a system call at that time.
strace-output.txt
strace-c-output.txt