dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.04k stars 4.03k forks source link

CSharpScript speed (or lack of) #16897

Open epsitec opened 7 years ago

epsitec commented 7 years ago

I am experimenting with version 2.0.0-rc3, running on Windows 7 x64 with .NET Framework 4.6.2.

When running very simple scripts such as these two:

var script = CSharpScript.Create<int> ("42", options: options, globalsType: typeof (Record));
var result = await script1.RunAsync (new Record ());

or

var result = await CSharpScript.EvaluateAsync ("42", globals: new Record ());

I consistently get execution times which lie between 70 and 100 milliseconds. I understand there is a cost to compile the script, however I cannot use CSharpScript with such execution times, as I need to evaluate hundreds of little code snippets.

What is the recommended approach if I need to evaluate lots of small pieces of C# - I'd like to pay the startup time only once, and be able to then execute my snippets one after the other, getting back the results as fast as possible.

epsitec commented 7 years ago

@SamLebarbare you might be interested in this topic too.

epsitec commented 7 years ago

Note: even if I use ContinueWithAsync() I do not get reasonable execution times (about 15 to 20 milliseconds).

weltkante commented 7 years ago

I'd like to pay the startup time only once, and be able to then execute my snippets one after the other,

Did you do that for your testing? i.e. CSharpScript.Create once and repeatedly call await script1.RunAsync as in your first example? Because if you create new scripts inside the loop then you are not measuring what you want to do ...

I'm asking because when I do the same thing you do (repeatedly executing a script returning "42") I only get execution times of 0.005 milliseconds on average. (release build, no debugger attached)

(Note: I'm not related to the project, just was coming across the issue and was wondering if those numbers made sense, so I tried it myself and couldn't reproduce them.)

tmat commented 7 years ago

I'd like to pay the startup time only once, and be able to then execute my snippets one after the other, getting back the results as fast as possible.

Sounds like you're looking for: https://github.com/dotnet/roslyn/wiki/Scripting-API-Samples#delegate

BTW, do you ngen Roslyn when your measure the times?

epsitec commented 7 years ago

I have tried the call to CreateDelegate() - the issue is that it is slow too (also about 70ms to 100ms) so calling hundreds of times CreateDelegate() to compile all the code snippets is just not a viable solution.

And no, I did nothing fancy. Add NuGet package to solution, build, run.

asdfgasdfsafgsdfa commented 7 years ago

then why don't you merge all your little code snippets into one class and compile that?

like, put everything into one big stringbuilder and in the end you just compile something like

class AllFunctions { public Snippet1() { ... } public Snippet2() { ... } public Snippet3() { ... } }

epsitec commented 7 years ago

Yes, indeed, merging all the code snippets into one piece would be a possibility; but that would require knowing upfront what code snippets I'd have to compile.

tmat commented 7 years ago

@epsitec Have you profiled the code to find out what parts of the evaluation are the slowest?

epsitec commented 7 years ago

@tmat you mean, profiled the Roslyn code?

tmat commented 7 years ago

Your scenario end-to-end, including Roslyn code.

epsitec commented 7 years ago

I simply timed the execution of the calls to CSharpScript and got values between 50ms and 100ms. I suppose that profiling Roslyn to get more meaningful results would require me to build all NuGet packages from the source code. I've never done that before.

asdfgasdfsafgsdfa commented 7 years ago

Just try it and let us know what problems you encounter.

Or try the demo version of "dot Trace Performance", i think you don't need any code with that tool. It will show the function names and their timings, then we can easily see whats happening and how to optimize it.

Post your results here

weltkante commented 7 years ago

I suppose that profiling Roslyn to get more meaningful results would require me to build all NuGet packages from the source code.

You don't need to build .NET assemblies from source to do profiling. Thats why we have metadata and reflection. (You may need to disable "just my code" if you use Visual Studio for profiling. Other tools may have similar settings.)

I'd like to pay the startup time only once, and be able to then execute my snippets one after the other,

I simply timed the execution of the calls to CSharpScript and got values between 50ms and 100ms.

You claim you don't care about paying the startup time (costs for compiling) only once but then proceed measuring exactly that time in your loop? makes no sense.

Create the scripts before you measure the execution. See the linked code from my first post what I tried to measure (and getting several magnitudes faster execution times).

If you can't do that and have to compile scripts "ad hoc" you may want to cache them so if you repeatedly need the same script you don't have to recompile (don't know if Roslyn does any such caching internally).

mattwar commented 7 years ago

You can execute a script a second time and it will reuse the code it compiled for the first execution.

epsitec commented 7 years ago

Here is what I see. Want me do dig deeper, or just share the source code?

image

asdfgasdfsafgsdfa commented 7 years ago

show us the code as well, results look strange...

tmat commented 7 years ago

I think this is https://github.com/dotnet/roslyn/issues/5517

epsitec commented 7 years ago

@asdfgasdfsafgsdfa I have updated this repository:

https://github.com/epsitec/rnd-csharpscript-netstandard

to include the code I used to do the measurements above.

asdfgasdfsafgsdfa commented 7 years ago

in every Compile call you are creating a new script: CSharpScript.Create

after you created a state, continue using it instead of recreating everything it has cached for you, from the example documentation:

state = await state.ContinueWithAsync("int y = 2;"); 

you would do something like "return someValue;"

another thing you can do (if you always access the same thing) actually caching the retrieval of the variable itself like this:

Func<decimal> myGetter = await state.ContinueWithAsync("return () => someValueProperty");

and then just keep using myGetter()

I did exactly that and it worked perfectly, same the same performance as invoking a normal function, absolutely awesome for what I'm doing with roslyn :)

epsitec commented 7 years ago

@asdfgasdfsafgsdfa I tried what you suggested and updated my sample project accordingly.

See Program.TimeIt2() which calls into StatefulScriptCompiler class, defined as:

public class StatefulScriptCompiler<T>
{
    public void Setup(object globals)
    {
        var options = ScriptOptions.Default;
        this.script = CSharpScript.Create<T> ($"default ({typeof (T).FullName})", options: options, globalsType: globals.GetType ());
        var runner = this.script.CreateDelegate ();
    }

    public System.Func<T> Compile(string source, object globals)
    {
        this.script = this.script.ContinueWith<T> ($"return {source};");
        var runner = script.CreateDelegate ();
        return () => runner (globals).Result;
    }

    private Script<T> script;
}

With this in place, the time to compile the same script goes down from 75ms to 20ms which is a big improvement, but still considerably slower than what I'd love to see in CSharpScript.

asdfgasdfsafgsdfa commented 7 years ago

I tested that way as well and I agree. I don't know why it is that slow.

I only managed to avoid the slow compilation time by merging all my code into one big submission to the script instance. In my experience adding more code is essentially free (in time, the eval/compile/run process doesn't take longer), but setting up / starting a compilation always seems to take some initial time even when reusing.

If its only adding small pieces of code, I was hoping for <1ms, many other parsers and "interpreters" (that really compile c# down to LinqExpressions or DynamicMethods) don't exceed a millisecond when it comes to compiling short fragments.

But of course I realize that roslyn is MUCH more powerful, it supports everything, not just a small subset.

But I think there is still a lot of room for optimizations. Pretty sure it's possible that roslyn itself can be optimized so we get below 2 ms.

Maybe someone from the team could run a quick test and at least tell us what roslyn is doing/why that causes it to be so slow. There's also a ~3second startup time on the first use... that can't be normal, right?

jnm2 commented 7 years ago

There's also a ~3second startup time on the first use... that can't be normal, right?

Loading DLLs and jitting the codepaths?

asdfgasdfsafgsdfa commented 7 years ago

Yes that would play a part in contributing to the time most likely. But I doubt that's normal, I don't have a slow computer. 3 seconds for just loading the dll(s)? That definitely is excessive, something else must be going on for sure.

epsitec commented 7 years ago

@VSadov any idea of what might be going on in the sample code provided here and why Roslyn is unexpectedly slow?

tmat commented 7 years ago

@epsitec As I mentioned, based on your profile I think this is caused by #5517. The reference binding is slow and not being reused across scripts.

mattwar commented 7 years ago

This is what I get running a simple test in a VM with latest RC4 Rolyn script bits. (2.4ghz Xeon E5-2673)

using System;
using System.Diagnostics;
using Microsoft.CodeAnalysis.CSharp.Scripting;

namespace Scripto
{
    class Program
    {
        static int iterations = 100;

        static void Main(string[] args)
        {
            var code = "42";

            var test0_ms = Test(1, () => CSharpScript.RunAsync(code).Wait(), warm: false);
            Console.WriteLine($"{test0_ms} : Run first script.");

            var test1_ms = Test(iterations, () => CSharpScript.RunAsync(code).Wait(), warm: true);
            Console.WriteLine($"{test1_ms} : Run new script (warm).");

            var baseScript = CSharpScript.Create("");
            var test3_ms = Test(iterations, () => baseScript.ContinueWith(code).RunAsync().Wait(), warm: true);
            Console.WriteLine($"{test3_ms} : Run separate script continued new from shared script (warm).");

            var test2Script = CSharpScript.Create(code);
            var test2_ms = Test(iterations, () => test2Script.RunAsync().Wait(), warm: true);
            Console.WriteLine($"{test2_ms} : Run same script over and over again (warm).");
        }

        static long Test(int iterations, Action testAction, bool warm)
        {
            if (warm)
            {
                testAction();
            }

            var timer = new Stopwatch();
            timer.Start();

            for (int i = 0; i < iterations; i++)
            {
                testAction();
            }

            timer.Stop();

            return timer.ElapsedMilliseconds / iterations;
        }
    }
}

Running test outputs:

2048 : Run first script.
6 : Run new script (warm).
2 : Run separate script continued new from shared script (warm).
0 : Run same script over and over again (warm).

So first run is approx. 2 seconds. Subsequent compile & runs is 6 ms. If you base your separate runs of the same base script it gets faster because of shared metadata context. Rerunning same script over and over again is less than 1 ms. This is because the delegate to the generate code is cached and reused when you execute again.

asdfgasdfsafgsdfa commented 7 years ago

can you do the same for code = "return (Func<object, string>)(x => x.ToString() + \"123\")" ? just curious, you don't have to execute the func that gets returned

mattwar commented 7 years ago

@asdfgasdfsafgsdfa sure..

This is the modified test:

using System;
using System.Diagnostics;
using Microsoft.CodeAnalysis.Scripting;
using Microsoft.CodeAnalysis.CSharp.Scripting;

namespace Scripto
{
    class Program
    {
        static void Main(string[] args)
        {
            var iterations = 100;
            var code = @"return (Func<object, string>)(x => x.ToString() + ""123"");";
            var options = ScriptOptions.Default.AddImports("System");

            Test("Run first script", 1, () => CSharpScript.RunAsync(code, options).Wait(), warm: false);
            Test("Run new script", iterations, () => CSharpScript.RunAsync(code, options).Wait(), warm: true);

            var baseScript = CSharpScript.Create("", options);
            Test("Run separate script continued new from shared script", iterations, () => baseScript.ContinueWith(code).RunAsync().Wait(), warm: true);

            var sharedScript = CSharpScript.Create(code, options);
            Test("Run same script over and over again", iterations, () => sharedScript.RunAsync().Wait(), warm: true);
        }

        static void Test(string title, int iterations, Action testAction, bool warm)
        {
            if (warm)
            {
                testAction();
            }

            var timer = new Stopwatch();
            timer.Start();

            for (int i = 0; i < iterations; i++)
            {
                testAction();
            }

            timer.Stop();

            var time = timer.ElapsedMilliseconds / iterations;

            Console.WriteLine($@"{time} : {title}{(warm ? " (warm)" : "")}");
        }
    }
}

And here is the timing results:

2317 : Run first script
7 : Run new script (warm)
2 : Run separate script continued new from shared script (warm)
0 : Run same script over and over again (warm)
mattwar commented 7 years ago

Lesson to learn. If you are going to run a lot of small disjoint scripts, you should have them all extend a common empty script.

If you want to hide the start-up time of the first compile & run script then you should execute the empty script async on a background thread (maybe Task.Run), and then depending on your app it is likely this work is already done by the time you execute the first real script that continues on from it.

mattwar commented 7 years ago

Here is a simple command line REPL. With precomputing the first script asynchronously, you probably won't notice any delay by the time you issue your first request.

using System;
using Microsoft.CodeAnalysis.Scripting;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using System.Threading.Tasks;

namespace Scripto
{
    class Program
    {
        static void Main(string[] args)
        {
            RunReplAsync().Wait();
        }

        static async Task RunReplAsync()
        {
            var options = ScriptOptions.Default.AddImports("System");
            var firstResult = Task.Run(() => CSharpScript.RunAsync("1 + 2", options));  // do simple nonsense work

            ScriptState lastResult = null;

            while (true)
            {
                Console.Write("> ");
                var code = Console.ReadLine();

                if (lastResult == null)
                {
                    lastResult = await firstResult.ConfigureAwait(false);
                }

                lastResult = await lastResult.ContinueWithAsync(code, options).ConfigureAwait(false);
                if (lastResult.ReturnValue != null)
                {
                    Console.WriteLine(lastResult.ReturnValue);
                }
            }
        }
    }
}
epsitec commented 7 years ago

@tmat can you clarify what you mean by based on your profile and why this relates to #5517 ?

Following code, all executed inside .NET 4.6.2 program, with no mixing of .NET Standard libraries, and just using ScriptOptions.Default is still slow:

public class LocalStatefulScriptCompiler<T>
{
    public void Setup(object globals)
    {
        var options = ScriptOptions.Default;
        this.script = CSharpScript.Create<T> ($"default ({typeof (T).FullName})", options: options, globalsType: globals.GetType ());
        var runner = this.script.CreateDelegate ();
    }

    public System.Func<T> Compile(string source, object globals)
    {
        this.script = this.script.ContinueWith<T> ($"return {source};");
        var runner = script.CreateDelegate ();
        return () => runner (globals).Result;
    }

    private Script<T> script;
}

Calling into it with:

var locals = new Locals ();
var compiler = new LocalStatefulScriptCompiler<decimal> ();
compiler.Setup (locals);

// Every call to compiler.Compile() takes 17ms
var func = compiler.Compile ("42m", locals);
tmat commented 7 years ago

@epsitec

These numbers suggest a lot of time spent in ReferenceManager: image

5517 describes an issue where we might not be reusing results of assembly binding between scripts that are chained. I'm not sure if that's still the case. The numbers @mattwar measured are not consistent with that hypothesis. In any case, it would be good to check and see how ReferenceManager is reused and close #5517 if it is actually reused properly.

Anteru commented 7 years ago

I'm also seeing 2 seconds startup time in Debug, and worse, 5 seconds startup time in release, which seems excessive. Subsequent compilation is fast, but anything that can be done to reduce the startup overhead? Should this be tracked in a separate issue? I'm fine with the compilation speed of individual scripts once it's warm, but initial startup time is a pain.

jnm2 commented 7 years ago

I wonder if this contributes to the ~5 second time till code runs in Cake? It has been getting on my nerves but I can't blame them for it if it's a Roslyn thing. It prevents you from being able to build sensible tab completion because who wants powershell to freeze for six seconds while compiling the script to obtain the tab completion options?

Anteru commented 7 years ago

Something is definitely up because I measure 2-6 seconds for the initial compile; but csi.exe starts in <1 sec for the first evaluation. Either the "system" Roslyn is nGen'd (the one in "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Roslyn\csi.exe") and that explains it, or they're doing something smarter with regards to script options, or performance regressed since then.

Anteru commented 7 years ago

Tried to ngen but it doesn't seem to have any impact. At best, 0.5 seconds, so 4.3 -> 3.8 seconds in Release mode here. Any ideas what else to try?

Anteru commented 7 years ago

For fun, I compiled Roslyn's csi.exe from scratch using this Git commit: https://github.com/dotnet/roslyn/commit/57c813192bd494bb15aaa841f415f28f50438e19 (this is the commit used for the 2.3.1 NuGet package.)

That one also takes 3-4 seconds to evaluate the first expression, unlike my "system" csi.exe which is fast as explained above. So either this degraded between whatever was used to create the bundled csi.exe and 2.3.1, or something was done to that csi.exe to make it fast. (I also tried master as of a couple of minutes ago, and it looks like that's a tad slower than the official 2.3.1 release.

mattwar commented 7 years ago

Most of the first time cost is the JIT compiler processing all the assemblies and methods used by calling the API's.

The reason csi.exe runs faster is that .net assembliies and exes installed by VS are typically pre-jitted by ngen.exe at install time. It is likely that if you ngen the csi.exe you compiled from source you'll get a faster start up too.

mattwar commented 7 years ago

@anteru did you ngen all the assemblies?

Anteru commented 7 years ago

@mattwar I tried to ngen both my executable (which is basically a call to load-script + execute), as well as using ngen on the scripting assembly. Didn't help nearly as much as I expected.

mattwar commented 7 years ago

@anteru I think you'd have to ngen all of roslyn, etc, all assemblies brought in by the package.

jnm2 commented 7 years ago

@mattwar With ngen could I expect the startup time of cold compiling a Cake script to be sub-100 ms?

mattwar commented 7 years ago

@jnm2 I have no idea.

The csc.exe command line compiler has a process that starts up to amortize costs loading referenced assembly metadata to speed up msbuild. That can make c# compilation appear to be faster when its not really a cold start. csi.exe does not use this. Cake could be engineered to do something similar, but not for the first compile.

Anteru commented 7 years ago

@mattwar Just to be clear, I need to compile everything Roslyn pulls in, and my executable as well, or should it be fast without using ngen on my executable on top of that? For development, it's certainly nicer if I can just ngen Roslyn once on every package update.

mattwar commented 7 years ago

@anteru I don't think the roslyn nuget package ngens when installed. Maybe @jaredpar knows otherwise.

jnm2 commented 7 years ago

@mattwar Rather than keeping a background process running, what if Roslyn could cache all the lookup information on disk so that subsequent calls would be able to do minimal validation and use as much of the cache as possible?

mattwar commented 7 years ago

@jnm2 There is already a cache on disk, its the reference assembly.

jnm2 commented 7 years ago

@jnm2 There is already a cache on disk, its the reference assembly.

So the startup cost is in loading the reference assemblies into memory, not in processing that information into an intermediate data structure?

mattwar commented 7 years ago

@jnm2 there is processing on top of the loading, by construction roslyn symbols from the metadata. Since the roslyn symbols are allocated on the heap, there is no flat binary representation of them other than the metadata format itself, which already exists on disk.

molinch commented 6 years ago

It's not exactly the same, but goes into performance too. When running in x86 mode, for each EvaluateAsync call I get a 100MB peak of unmanaged memory consumed. It's not a leak: as soon as GC kicks in that memory is reclaimed.

However, let's say I have many scripts to execute immediately and one after the other. Then I easily get an OutOfMemoryException. This issue mainly happens when running in x86.

Is there a way to avoid these memory peaks? I can understand 1 peak, but why does it cost so much all the time?

Sample to reproduce the issue, it happens after a couple of iterations: https://gist.github.com/molinch/7eb0005141eda546fbfffbfe2292e4ce