epsitec opened this issue 7 years ago
@SamLebarbare you might be interested in this topic too.
Note: even if I use ContinueWithAsync(), I do not get reasonable execution times (I still see about 15 to 20 milliseconds).
I'd like to pay the startup time only once, and be able to then execute my snippets one after the other.
Did you do that in your testing, i.e. call CSharpScript.Create once and then repeatedly await script1.RunAsync, as in your first example? Because if you create new scripts inside the loop, then you are not measuring what you want to measure ...
I'm asking because when I do the same thing you do (repeatedly executing a script returning "42") I only get execution times of 0.005 milliseconds on average. (release build, no debugger attached)
(Note: I'm not related to the project, just was coming across the issue and was wondering if those numbers made sense, so I tried it myself and couldn't reproduce them.)
I'd like to pay the startup time only once, and be able to then execute my snippets one after the other, getting back the results as fast as possible.
Sounds like you're looking for: https://github.com/dotnet/roslyn/wiki/Scripting-API-Samples#delegate
BTW, do you ngen Roslyn when you measure the times?
I have tried calling CreateDelegate(); the issue is that it is slow too (also about 70 ms to 100 ms), so calling CreateDelegate() hundreds of times to compile all the code snippets is just not a viable solution.
And no, I did nothing fancy. Add NuGet package to solution, build, run.
Then why don't you merge all your little code snippets into one class and compile that? Like, put everything into one big StringBuilder, and in the end you just compile something like:
class AllFunctions { public object Snippet1() { ... } public object Snippet2() { ... } public object Snippet3() { ... } }
Yes, indeed, merging all the code snippets into one piece would be a possibility; but that would require knowing upfront what code snippets I'd have to compile.
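For what it's worth, the merging approach could be sketched like this. SnippetMerger, AllFunctions, and the object return type are invented names for illustration; real snippets would need proper return types and error handling:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SnippetMerger
{
    // Wraps each snippet in its own method of a single class, so that one
    // compilation pays the per-compilation overhead for all snippets at once.
    // Class and method names here are made up for this sketch.
    public static string Merge(IReadOnlyList<string> snippets)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public static class AllFunctions");
        sb.AppendLine("{");
        for (int i = 0; i < snippets.Count; i++)
        {
            // Each snippet is assumed to be an expression; wrap it in a return.
            sb.AppendLine($"    public static object Snippet{i}() {{ return {snippets[i]}; }}");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

The resulting source string would then be handed to a single compilation, and each snippet invoked by calling its generated method.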
@epsitec Have you profiled the code to find out what parts of the evaluation are the slowest?
@tmat you mean, profiled the Roslyn code?
Your scenario end-to-end, including Roslyn code.
I simply timed the execution of the calls to CSharpScript and got values between 50 ms and 100 ms. I suppose that profiling Roslyn to get more meaningful results would require me to build all NuGet packages from the source code. I've never done that before.
Just try it and let us know what problems you encounter.
Or try the demo version of dotTrace Performance; I think you don't need any source code with that tool. It will show the function names and their timings, and then we can easily see what's happening and how to optimize it.
Post your results here
I suppose that profiling Roslyn to get more meaningful results would require me to build all NuGet packages from the source code.
You don't need to build .NET assemblies from source to do profiling. That's why we have metadata and reflection. (You may need to disable "just my code" if you use Visual Studio for profiling. Other tools may have similar settings.)
I'd like to pay the startup time only once, and be able to then execute my snippets one after the other.
I simply timed the execution of the calls to CSharpScript and got values between 50ms and 100ms.
You say you only want to pay the startup time (the compilation cost) once, but then you measure exactly that time inside your loop? That makes no sense.
Create the scripts before you measure the execution. See the linked code from my first post for what I tried to measure (I got execution times several orders of magnitude faster).
If you can't do that and have to compile scripts "ad hoc", you may want to cache them, so that if you repeatedly need the same script you don't have to recompile it (I don't know whether Roslyn does any such caching internally).
You can execute a script a second time and it will reuse the code it compiled for the first execution.
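If Roslyn doesn't cache for you, caching by source text is easy to add on top. A minimal sketch follows; CompileSnippet is a stand-in for whatever Roslyn call you actually use (e.g. creating a script and calling CreateDelegate) and is faked here so the sketch stays self-contained:

```csharp
using System;
using System.Collections.Concurrent;

public static class SnippetCache
{
    // Compiled snippets keyed by source text: each distinct snippet pays
    // the compilation cost only once; later lookups reuse the delegate.
    static readonly ConcurrentDictionary<string, Func<object>> cache =
        new ConcurrentDictionary<string, Func<object>>();

    public static int CompileCount; // instrumentation for this sketch only

    // Stand-in for the real (expensive) Roslyn compilation call.
    static Func<object> CompileSnippet(string source)
    {
        CompileCount++;
        return () => int.Parse(source); // pretend every snippet is an int literal
    }

    public static object Run(string source) => cache.GetOrAdd(source, CompileSnippet)();
}
```

Note that GetOrAdd does not guarantee the factory runs only once under heavy concurrency; at worst that costs a redundant compile, not correctness.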
Here is what I see. Want me to dig deeper, or should I just share the source code?
Show us the code as well; the results look strange...
I think this is https://github.com/dotnet/roslyn/issues/5517
@asdfgasdfsafgsdfa I have updated this repository:
https://github.com/epsitec/rnd-csharpscript-netstandard
to include the code I used to do the measurements above.
In every Compile call you are creating a new script via CSharpScript.Create.
After you have created a state, continue using it instead of recreating everything it has cached for you. From the example documentation:
state = await state.ContinueWithAsync("int y = 2;");
In your case you would do something like "return someValue;".
Another thing you can do (if you always access the same thing) is to cache the retrieval of the variable itself, like this:
Func<decimal> myGetter = (await state.ContinueWithAsync<Func<decimal>>("return () => someValueProperty;")).ReturnValue;
and then just keep using myGetter()
I did exactly that and it worked perfectly: the same performance as invoking a normal function. Absolutely awesome for what I'm doing with Roslyn :)
@asdfgasdfsafgsdfa I tried what you suggested and updated my sample project accordingly.
See Program.TimeIt2(), which calls into the StatefulScriptCompiler class, defined as:
public class StatefulScriptCompiler<T>
{
    public void Setup(object globals)
    {
        var options = ScriptOptions.Default;
        this.script = CSharpScript.Create<T> ($"default ({typeof (T).FullName})", options: options, globalsType: globals.GetType ());
        var runner = this.script.CreateDelegate ();
    }

    public System.Func<T> Compile(string source, object globals)
    {
        this.script = this.script.ContinueWith<T> ($"return {source};");
        var runner = script.CreateDelegate ();
        return () => runner (globals).Result;
    }

    private Script<T> script;
}
With this in place, the time to compile the same script goes down from 75 ms to 20 ms, which is a big improvement, but still considerably slower than what I'd love to see in CSharpScript.
I tested that way as well and I agree. I don't know why it is that slow.
I only managed to avoid the slow compilation time by merging all my code into one big submission to the script instance. In my experience adding more code is essentially free (in time, the eval/compile/run process doesn't take longer), but setting up / starting a compilation always seems to take some initial time even when reusing.
If it's only adding small pieces of code, I was hoping for <1 ms; many other parsers and "interpreters" (which really compile C# down to LINQ Expressions or DynamicMethods) don't exceed a millisecond when compiling short fragments.
But of course I realize that Roslyn is MUCH more powerful; it supports everything, not just a small subset.
Still, I think there is a lot of room for optimization. I'm pretty sure Roslyn itself can be optimized so we get below 2 ms.
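The expression-tree style of "interpreter" mentioned above can be sketched like this; it only handles what you build by hand, which is exactly why it compiles so quickly compared to a full compiler:

```csharp
using System;
using System.Linq.Expressions;

public static class TinyCompiler
{
    // Builds and compiles x => x + 1 via LINQ expression trees: the kind
    // of lightweight code generation the comment above refers to. Compiling
    // a fragment this small typically takes well under a millisecond once
    // the JIT is warm.
    public static Func<int, int> CompileAddOne()
    {
        var x = Expression.Parameter(typeof(int), "x");
        var body = Expression.Add(x, Expression.Constant(1));
        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }
}
```

The trade-off is that you must construct the tree yourself; there is no parsing, binding, or reference resolution, which is where Roslyn spends its time.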
Maybe someone from the team could run a quick test and at least tell us what Roslyn is doing and why it is so slow. There's also a ~3 second startup time on the first use... that can't be normal, right?
There's also a ~3 second startup time on the first use... that can't be normal, right?
Loading DLLs and jitting the codepaths?
Yes, that would most likely contribute to the time. But I doubt that's normal; I don't have a slow computer. 3 seconds for just loading the DLL(s)? That is definitely excessive; something else must be going on.
@VSadov any idea of what might be going on in the sample code provided here and why Roslyn is unexpectedly slow?
@epsitec As I mentioned, based on your profile I think this is caused by #5517. The reference binding is slow and not being reused across scripts.
This is what I get running a simple test in a VM with the latest RC4 Roslyn script bits. (2.4 GHz Xeon E5-2673)
using System;
using System.Diagnostics;
using Microsoft.CodeAnalysis.CSharp.Scripting;

namespace Scripto
{
    class Program
    {
        static int iterations = 100;

        static void Main(string[] args)
        {
            var code = "42";

            var test0_ms = Test(1, () => CSharpScript.RunAsync(code).Wait(), warm: false);
            Console.WriteLine($"{test0_ms} : Run first script.");

            var test1_ms = Test(iterations, () => CSharpScript.RunAsync(code).Wait(), warm: true);
            Console.WriteLine($"{test1_ms} : Run new script (warm).");

            var baseScript = CSharpScript.Create("");
            var test3_ms = Test(iterations, () => baseScript.ContinueWith(code).RunAsync().Wait(), warm: true);
            Console.WriteLine($"{test3_ms} : Run separate script continued new from shared script (warm).");

            var test2Script = CSharpScript.Create(code);
            var test2_ms = Test(iterations, () => test2Script.RunAsync().Wait(), warm: true);
            Console.WriteLine($"{test2_ms} : Run same script over and over again (warm).");
        }

        static long Test(int iterations, Action testAction, bool warm)
        {
            if (warm)
            {
                testAction();
            }

            var timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < iterations; i++)
            {
                testAction();
            }
            timer.Stop();
            return timer.ElapsedMilliseconds / iterations;
        }
    }
}
Running test outputs:
2048 : Run first script.
6 : Run new script (warm).
2 : Run separate script continued new from shared script (warm).
0 : Run same script over and over again (warm).
So the first run is approx. 2 seconds. Subsequent compile & runs take 6 ms each. If you base your separate runs on the same base script, it gets faster because of the shared metadata context. Rerunning the same script over and over again takes less than 1 ms, because the delegate to the generated code is cached and reused when you execute again.
Can you do the same for code = "return (Func<object, string>)(x => x.ToString() + \"123\")"? Just curious; you don't have to execute the Func that gets returned.
@asdfgasdfsafgsdfa sure..
This is the modified test:
using System;
using System.Diagnostics;
using Microsoft.CodeAnalysis.Scripting;
using Microsoft.CodeAnalysis.CSharp.Scripting;

namespace Scripto
{
    class Program
    {
        static void Main(string[] args)
        {
            var iterations = 100;
            var code = @"return (Func<object, string>)(x => x.ToString() + ""123"");";
            var options = ScriptOptions.Default.AddImports("System");

            Test("Run first script", 1, () => CSharpScript.RunAsync(code, options).Wait(), warm: false);
            Test("Run new script", iterations, () => CSharpScript.RunAsync(code, options).Wait(), warm: true);

            var baseScript = CSharpScript.Create("", options);
            Test("Run separate script continued new from shared script", iterations, () => baseScript.ContinueWith(code).RunAsync().Wait(), warm: true);

            var sharedScript = CSharpScript.Create(code, options);
            Test("Run same script over and over again", iterations, () => sharedScript.RunAsync().Wait(), warm: true);
        }

        static void Test(string title, int iterations, Action testAction, bool warm)
        {
            if (warm)
            {
                testAction();
            }

            var timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < iterations; i++)
            {
                testAction();
            }
            timer.Stop();

            var time = timer.ElapsedMilliseconds / iterations;
            Console.WriteLine($@"{time} : {title}{(warm ? " (warm)" : "")}");
        }
    }
}
And here are the timing results:
2317 : Run first script
7 : Run new script (warm)
2 : Run separate script continued new from shared script (warm)
0 : Run same script over and over again (warm)
Lesson to learn. If you are going to run a lot of small disjoint scripts, you should have them all extend a common empty script.
If you want to hide the start-up time of the first compile & run script then you should execute the empty script async on a background thread (maybe Task.Run), and then depending on your app it is likely this work is already done by the time you execute the first real script that continues on from it.
Here is a simple command line REPL. With precomputing the first script asynchronously, you probably won't notice any delay by the time you issue your first request.
using System;
using Microsoft.CodeAnalysis.Scripting;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using System.Threading.Tasks;

namespace Scripto
{
    class Program
    {
        static void Main(string[] args)
        {
            RunReplAsync().Wait();
        }

        static async Task RunReplAsync()
        {
            var options = ScriptOptions.Default.AddImports("System");
            var firstResult = Task.Run(() => CSharpScript.RunAsync("1 + 2", options)); // do simple nonsense work
            ScriptState lastResult = null;

            while (true)
            {
                Console.Write("> ");
                var code = Console.ReadLine();

                if (lastResult == null)
                {
                    lastResult = await firstResult.ConfigureAwait(false);
                }

                lastResult = await lastResult.ContinueWithAsync(code, options).ConfigureAwait(false);

                if (lastResult.ReturnValue != null)
                {
                    Console.WriteLine(lastResult.ReturnValue);
                }
            }
        }
    }
}
@tmat Can you clarify what you mean by "based on your profile", and why this relates to #5517?
The following code, all executed inside a .NET 4.6.2 program, with no mixing of .NET Standard libraries, and just using ScriptOptions.Default, is still slow:
public class LocalStatefulScriptCompiler<T>
{
    public void Setup(object globals)
    {
        var options = ScriptOptions.Default;
        this.script = CSharpScript.Create<T> ($"default ({typeof (T).FullName})", options: options, globalsType: globals.GetType ());
        var runner = this.script.CreateDelegate ();
    }

    public System.Func<T> Compile(string source, object globals)
    {
        this.script = this.script.ContinueWith<T> ($"return {source};");
        var runner = script.CreateDelegate ();
        return () => runner (globals).Result;
    }

    private Script<T> script;
}
Calling into it with:
var locals = new Locals ();
var compiler = new LocalStatefulScriptCompiler<decimal> ();
compiler.Setup (locals);
// Every call to compiler.Compile() takes 17ms
var func = compiler.Compile ("42m", locals);
@epsitec These numbers suggest a lot of time spent in ReferenceManager:
I'm also seeing 2 seconds of startup time in Debug and, worse, 5 seconds in Release, which seems excessive. Subsequent compilation is fast, but is there anything that can be done to reduce the startup overhead? Should this be tracked in a separate issue? I'm fine with the compilation speed of individual scripts once it's warm, but the initial startup time is a pain.
I wonder if this contributes to the ~5 second time until code runs in Cake? It has been getting on my nerves, but I can't blame them for it if it's a Roslyn thing. It prevents you from building sensible tab completion, because who wants PowerShell to freeze for six seconds while compiling the script to obtain the tab-completion options?
Something is definitely up, because I measure 2-6 seconds for the initial compile, but csi.exe starts in <1 sec for the first evaluation. Either the "system" Roslyn is ngen'd (the one in "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Roslyn\csi.exe") and that explains it, or they're doing something smarter with regard to script options, or performance has regressed since then.
I tried to ngen it, but it doesn't seem to have much impact: at best 0.5 seconds, so 4.3 -> 3.8 seconds in Release mode here. Any ideas what else to try?
For fun, I compiled Roslyn's csi.exe from scratch using this Git commit: https://github.com/dotnet/roslyn/commit/57c813192bd494bb15aaa841f415f28f50438e19 (this is the commit used for the 2.3.1 NuGet package). That one also takes 3-4 seconds to evaluate the first expression, unlike my "system" csi.exe, which is fast as explained above. So either this degraded between whatever was used to create the bundled csi.exe and 2.3.1, or something was done to that csi.exe to make it fast. (I also tried master as of a couple of minutes ago, and it looks like it's a tad slower than the official 2.3.1 release.)
Most of the first-time cost is the JIT compiler processing all the assemblies and methods used by calling the APIs.
The reason csi.exe runs faster is that .NET assemblies and exes installed by VS are typically pre-JITted by ngen.exe at install time. It is likely that if you ngen the csi.exe you compiled from source, you'll get a faster start-up too.
@anteru did you ngen all the assemblies?
@mattwar I tried to ngen both my executable (which is basically a call to load-script + execute) and the scripting assembly. It didn't help nearly as much as I expected.
@anteru I think you'd have to ngen all of Roslyn, etc.; all assemblies brought in by the package.
@mattwar With ngen could I expect the startup time of cold compiling a Cake script to be sub-100 ms?
@jnm2 I have no idea.
The csc.exe command line compiler has a process that starts up to amortize the cost of loading referenced assembly metadata, to speed up MSBuild. That can make C# compilation appear faster when it's not really a cold start. csi.exe does not use this. Cake could be engineered to do something similar, but not for the first compile.
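The amortization idea is independent of Roslyn: do the expensive setup once, eagerly and off the hot path, and have every later request share the result. A minimal sketch with invented names (the Task.Run body stands in for loading reference metadata or compiling a base script):

```csharp
using System;
using System.Threading.Tasks;

public static class WarmStart
{
    public static int SetupRuns; // instrumentation for this sketch only

    // Expensive one-time setup, started in the background on first touch.
    // Lazy<T> with the default mode guarantees the factory runs once.
    static readonly Lazy<Task<string>> baseline = new Lazy<Task<string>>(
        () => Task.Run(() => { SetupRuns++; return "warm"; }));

    // Every request awaits the same shared setup task instead of repeating it.
    public static async Task<string> HandleRequestAsync(string input)
    {
        var context = await baseline.Value;
        return context + ":" + input;
    }
}
```

With this shape, only the very first request can observe the setup latency, and even that can be hidden by touching baseline.Value at program start.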
@mattwar Just to be clear: do I need to ngen everything Roslyn pulls in plus my own executable, or should it be fast without running ngen on my executable on top of that? For development it's certainly nicer if I can just ngen Roslyn once on every package update.
@anteru I don't think the Roslyn NuGet package runs ngen when installed. Maybe @jaredpar knows otherwise.
@mattwar Rather than keeping a background process running, what if Roslyn could cache all the lookup information on disk so that subsequent calls would be able to do minimal validation and use as much of the cache as possible?
@jnm2 There is already a cache on disk; it's the reference assembly.
So the startup cost is in loading the reference assemblies into memory, not in processing that information into an intermediate data structure?
@jnm2 There is processing on top of the loading: constructing Roslyn symbols from the metadata. Since the Roslyn symbols are allocated on the heap, there is no flat binary representation of them other than the metadata format itself, which already exists on disk.
It's not exactly the same issue, but it relates to performance too.
When running in x86 mode, for each EvaluateAsync call I get a 100 MB peak of unmanaged memory consumption. It's not a leak: as soon as the GC kicks in, that memory is reclaimed.
However, let's say I have many scripts to execute immediately, one after the other. Then I easily get an OutOfMemoryException. This mainly happens when running in x86.
Is there a way to avoid these memory peaks? I can understand one peak, but why does it cost so much every time?
Sample to reproduce the issue, it happens after a couple of iterations: https://gist.github.com/molinch/7eb0005141eda546fbfffbfe2292e4ce
I am experimenting with version 2.0.0-rc3, running on Windows 7 x64 with .NET Framework 4.6.2. When running very simple scripts such as these two:
or
I consistently get execution times which lie between 70 and 100 milliseconds. I understand there is a cost to compile the script; however, I cannot use CSharpScript with such execution times, as I need to evaluate hundreds of little code snippets. What is the recommended approach if I need to evaluate lots of small pieces of C#? I'd like to pay the startup time only once, and be able to then execute my snippets one after the other, getting back the results as fast as possible.