dadhi / DryIoc

DryIoc is fast, small, full-featured IoC Container for .NET
MIT License

DryIoc cold start performance #27

Closed Fruchuxs closed 5 years ago

Fruchuxs commented 6 years ago

After one week of investigation we found that DryIoc 3.0.2 is rather slow compared to 2.12.8 on the first execution (cold start). In most cases the first call in a C# application doesn't count, but we measured a time increase of about six seconds for larger object graphs in our application. For example, our ASP.NET Core 2.1 application starts, DryIoc resolves some small services, then the application tries to resolve a controller and... we can make a cup of coffee. For a web application this is merely uncomfortable (because it's fast after JIT compilation), but for one of our console applications it is really painful.

We looked at the expression trees that DryIoc generates in versions 3.0.2 and 2.12.8 for one of our object graphs. They are too large to post here or to analyze in depth, but we can say that the DryIoc 3.0.2 expression tree is roughly 90% bigger and seems even more complex.

We didn't change the service registrations much. We just replaced Reuse.InWebRequest with Reuse.Scoped and two RegisterDelegate calls with RegisterMany, as you mentioned here.

Maybe you can say more about this? In most cases we use simple service registrations like Register<IA, A>, or factories, or made: Parameters.Of.Type ... We also use the IEnumerable resolve feature.

ahydrax commented 6 years ago

Tried with 2 and 5. Nothing changed. Could that be a problem with ExpressionToCode itself?

dadhi commented 6 years ago

Maybe... try to do just Expression.ToString()

ahydrax commented 6 years ago

Tried with ToString(). Got a 21 MB file for 2, 5, 8, and without.

dadhi commented 6 years ago

Oh, sorry, container.GenerateResolutionExpressions(...) may not be the best tool, because it tries to remove the run-time optimizations for generation scenarios. Instead, try to use:

var expr = container.Resolve<LambdaExpression>(typeof(MyControllerWithLotOfDependencies));

You need to open a scope if MyControllerWithLotOfDependencies is registered with scoped reuse:

using (var scope = container.OpenScope(Reuse.WebRequestScopeName))
{
    // A declaration needs a block here; a bare declaration after using (...) does not compile.
    var expr = scope.Resolve<LambdaExpression>(typeof(MyControllerWithLotOfDependencies));
}

ahydrax commented 6 years ago

@dadhi Finally got different results: without - 14 MB, 8 - 3.7 MB, 5 - 937 KB, 2 - 6 KB.

Does that mean I should use 2? How is it connected with other aspects: lifetime control, performance, etc.?
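
As an alternative to comparing dumped file sizes, the resolved expressions can be compared by node count. The following is a minimal sketch using only System.Linq.Expressions; NodeCountingVisitor is a hypothetical helper, not part of DryIoc, and shared sub-expressions are counted once per occurrence.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper (not a DryIoc API): counts the nodes reachable in an expression tree.
public sealed class NodeCountingVisitor : ExpressionVisitor
{
    public int Count { get; private set; }

    // Visit is the single entry point for every child node, so counting here covers the whole tree.
    public override Expression Visit(Expression node)
    {
        if (node != null)
            Count++;
        return base.Visit(node);
    }

    public static int CountNodes(Expression expression)
    {
        var visitor = new NodeCountingVisitor();
        visitor.Visit(expression);
        return visitor.Count;
    }
}
```

It could be applied to the result of Resolve<LambdaExpression>(...) for each depth setting, for example NodeCountingVisitor.CountNodes(expr), to get a size figure that does not depend on string formatting.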

dadhi commented 6 years ago

Thanks for the results. The value greatly depends on context: how many and what types of reuse you have, whether you have lazy/Func dependencies, and how deep or wide the graph is. Try to pick some sensible value between 5 and 8, for instance. The previous DryIoc default was 8 (v3.0). I made other changes to decrease the resulting expression size (see the whole thread). I am planning new changes/options as well (#40).

Maybe I will return to the default of 8. So your and @Fruchuxs's input is very valuable here.
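
For experimenting with different depth values, the rule can be set when the container is created. This is a sketch that assumes the rule methods named in this thread (WithDependencyDepthToSplitObjectGraph and WithoutDependencyDepthToSplitObjectGraph) are available in the DryIoc version in use; ContainerFactory is a hypothetical wrapper for illustration.

```csharp
using DryIoc;

// Hypothetical wrapper; only the rule methods named in this thread are assumed to exist.
public static class ContainerFactory
{
    // depth is the value under test, e.g. 2, 5 or 8.
    public static IContainer CreateWithSplitDepth(int depth) =>
        new Container(rules => rules.WithDependencyDepthToSplitObjectGraph(depth));

    // Disables splitting of the object graph entirely (the "without" case).
    public static IContainer CreateWithoutSplit() =>
        new Container(rules => rules.WithoutDependencyDepthToSplitObjectGraph());
}
```

Registrations then need to be repeated on each container so that the same graph is measured under every setting.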

dadhi commented 6 years ago

I wanted to add a note on resolving as LambdaExpression. You may experiment further by compiling the expression and (optionally) caching it in whatever data structure is suitable. This way you exclude all the container "magic" and use the IoC container as an automated object creation tool.

var expr = scope.Resolve<LambdaExpression>(typeof(MyControllerWithLotOfDependencies));

var factory = (FactoryDelegate)expr.Compile(); 
var controller = (MyControllerWithLotOfDependencies)factory(container);

Replace container with scope for a scoped controller.

Btw, this also excludes FEC from the equation, see #39.
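
The optional caching step could look roughly like the sketch below. CompiledFactoryCache is a hypothetical helper built only on the BCL; the caller supplies the function that obtains the LambdaExpression, so nothing here depends on DryIoc internals.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical helper: compiles a resolution expression once per service type and reuses the delegate.
public sealed class CompiledFactoryCache
{
    private readonly ConcurrentDictionary<Type, Delegate> _factories =
        new ConcurrentDictionary<Type, Delegate>();

    // resolveExpression is supplied by the caller, e.g. t => container.Resolve<LambdaExpression>(t).
    // Under contention GetOrAdd may run the compile step more than once, but only one result is stored.
    public Delegate GetOrCompile(Type serviceType, Func<Type, LambdaExpression> resolveExpression)
    {
        return _factories.GetOrAdd(serviceType, type => resolveExpression(type).Compile());
    }
}
```

The returned delegate would then be cast to FactoryDelegate and invoked with the container or scope, as in the snippet above.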

dadhi commented 6 years ago

Just in case, I would also love to see a speed/memory comparison on your real graph between expr.Compile() and expr.CompileFast().
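
For a quick first impression before setting up a full benchmark, a single cold compile can be timed with a Stopwatch. This is a rough sketch, not a substitute for BenchmarkDotNet; ColdCompileTimer is a hypothetical helper, and the compile function is passed in so that either e => e.Compile() or, assuming the FastExpressionCompiler package is referenced, e => e.CompileFast() can be measured.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

// Hypothetical helper: times one compile call for a given expression.
public static class ColdCompileTimer
{
    // compile is supplied by the caller, e.g. e => e.Compile().
    public static TimeSpan Measure(LambdaExpression expression, Func<LambdaExpression, Delegate> compile)
    {
        var stopwatch = Stopwatch.StartNew();
        var compiled = compile(expression);
        stopwatch.Stop();

        // Keep the delegate alive until after timing so the work is not discarded early.
        GC.KeepAlive(compiled);
        return stopwatch.Elapsed;
    }
}
```

Only the first measurement in a fresh process reflects cold-start cost; repeated calls in the same process benefit from already-jitted compiler code.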

ahydrax commented 6 years ago

Hi @dadhi,

As you requested, here is a benchmark of expr.Compile() and expr.CompileFast(). The benchmark code is here.

Benchmark parameters:

Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0
  Job-MZJNWR : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0

InvocationCount=10001  IterationCount=2  LaunchCount=1
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=1

With .WithDependencyDepthToSplitObjectGraph(2)

        Method |      Mean | Error |    StdDev | Ratio | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |----------:|------:|----------:|------:|------------:|------------:|------------:|--------------------:|
 CompileNormal | 634.05 us |    NA | 1.1571 us |  1.00 |      6.5993 |      2.1998 |           - |            27.07 KB |
   CompileFast |  82.54 us |    NA | 0.0558 us |  0.13 |      6.6993 |      3.2997 |      0.2000 |            27.47 KB |

With .WithDependencyDepthToSplitObjectGraph(3)

        Method |     Mean | Error |   StdDev | Ratio | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |---------:|------:|---------:|------:|------------:|------------:|------------:|--------------------:|
 CompileNormal | 985.9 us |    NA | 8.852 us |  1.00 |     30.3970 |           - |           - |           124.76 KB |
   CompileFast | 714.4 us |    NA | 7.205 us |  0.72 |     90.5909 |     46.6953 |      5.7994 |           359.43 KB |

With .WithDependencyDepthToSplitObjectGraph(5)

        Method |      Mean | Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |----------:|------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
 CompileNormal |  4.312 ms |    NA | 0.0017 ms |  1.00 |    0.00 |    172.8272 |     85.9141 |           - |             1.04 MB |
   CompileFast | 17.192 ms |    NA | 0.0625 ms |  3.99 |    0.02 |   2928.0719 |    161.8382 |     52.9471 |             12.3 MB |

With .WithDependencyDepthToSplitObjectGraph(7)

        Method |     Mean | Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |---------:|------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
 CompileNormal | 11.17 ms |    NA | 0.1530 ms |  1.00 |    0.00 |    504.9900 |    221.5569 |     75.8483 |             2.52 MB |
   CompileFast | 87.91 ms |    NA | 0.3260 ms |  7.87 |    0.08 |  16988.0240 |   1221.5569 |    189.6208 |            71.72 MB |

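It seems that the CompileFast time grows exponentially with the depth of the graph: it is about 8 times faster than Compile at depth 2 but about 8 times slower at depth 7.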

dadhi commented 6 years ago

Thanks for the tests, very interesting. I will try the latest FEC v2 (after it is released) and then decide what to do.

Btw, it seems that the actual problems in your case were caused by FEC. Hopefully, I will fix this soon(ish).

dzmitry-lahoda commented 6 years ago

May be related to https://github.com/dadhi/FastExpressionCompiler/issues/89#issuecomment-414571680.

I once proposed adding real graphs to the benchmark and created an issue for voting, but it was closed without consideration: https://github.com/danielpalme/IocPerformance/issues/86.

dzmitry-lahoda commented 6 years ago

So the following will not work for some reason?

var expr = scope.Resolve<FastExpressionCompiler.LightExpression.LambdaExpression>(typeof(MyControllerWithLotOfDependencies));

dadhi commented 6 years ago

FEC.LightExpression is not supported because FEC 2.0 is not yet integrated into DryIoc. Originally, ExpressionInfo was not supported as a wrapper, as I did not think it was useful for the end client. After the integration with FEC 2.0 maybe I will reconsider... maybe.

dadhi commented 6 years ago

@ahydrax,

I have released v3.1-preview-06, which sets the default depth to 5 for now and, moreover, simplifies the expressions with a root scoped service (the controller in your case). The resulting expression should have one less nested lambda (minus one compile step).

ahydrax commented 6 years ago

Hi @dadhi, here are the benchmark results for preview-06.

BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17763.55 (1809/October2018Update/Redstone5)
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0
  Job-GOPLAQ : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0

InvocationCount=1001  IterationCount=2  LaunchCount=1
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=1

        Method |      Mean | Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |----------:|------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
 CompileNormal |  4.138 ms |    NA | 0.0266 ms |  1.00 |    0.00 |    172.8272 |     85.9141 |           - |             1.04 MB |
   CompileFast | 16.984 ms |    NA | 0.0015 ms |  4.10 |    0.03 |   2578.4216 |    162.8372 |     52.9471 |            12.29 MB |

My observations (container created with Rules.Default):

Fruchuxs commented 6 years ago

Yes, using .WithoutDependencyDepthToSplitObjectGraph() on 3.1.0-preview gives a bigger output than the stable version.

As I mentioned before, but our graph is not as big as @ahydrax's graph.

uses 14 dependencies

I don't want to criticize your code or your approach, but with 14 dependencies there may be other problems here. If your actions use all of these dependencies (high cohesion), it's okay, but if not, you should split your controller into several. The dependency injection pattern, like the inversion of control principle, assumes that your classes fulfill the single responsibility principle.

This doesn't mean, that DryIoc has problems here, but an IoC system has some constraints.

ahydrax commented 6 years ago

> I don't want to criticize your code or your approach, but with 14 dependencies there may be other problems here. If your actions use all of these dependencies (high cohesion), it's okay, but if not, you should split your controller into several. The dependency injection pattern, like the inversion of control principle, assumes that your classes fulfill the single responsibility principle.

I agree with you: having 14 dependencies might be an indicator that something is wrong with the code itself. Anyway, I didn't expect that changing the DI container would break overall performance and/or stability, especially since DryIoc is positioned as a more performant container.

dadhi commented 6 years ago

The performance of containers depends on the use case. Here, we likely hit a combination of a bug and a specific case. But I am glad for the feedback, because it pushed the boundaries for me.

dadhi commented 5 years ago

@ahydrax Hi, I have released a new preview, v4.0.0-preview-01, with all the improvements from #45. Could you try it out in your big-bang setup?

ahydrax commented 5 years ago

Hi @dadhi , sure, approximately on Monday I'll post the results.

CoskunSunali commented 5 years ago

@ahydrax Do you have any updates to share with us?

dadhi commented 5 years ago

Here are the latest results: https://github.com/dadhi/DryIoc/issues/26#issuecomment-466460255

I will consider this issue closed, please open a new one if you see the problem.