Implementation of WaitUntil.CoroutineEnds

SirePi commented 4 years ago

This is an implementation of WaitUntil.CoroutineEnds as described here.

Barsonax commented 4 years ago

Scanned through the changes a bit. Looks pretty good already but I think we can generalize it a bit more so we can use this for more than just a coroutineends feature.

Also I want to measure the performance of this again.

Barsonax commented 4 years ago

As expected some allocations are happening. Seems to be about 144 bytes per call which does impact performance quite a bit.

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT

|                Method |      N |             Mean |          Error |         StdDev |     Gen 0 |     Gen 1 |    Gen 2 |  Allocated |
|---------------------- |------- |-----------------:|---------------:|---------------:|----------:|----------:|---------:|-----------:|
| CallEndlessSubRoutine |      1 |         72.61 ns |       0.835 ns |       0.781 ns |    0.0869 |         - |        - |      144 B |
| CallEndlessSubRoutine |     10 |        672.92 ns |       3.946 ns |       3.498 ns |    0.8698 |         - |        - |     1444 B |
| CallEndlessSubRoutine |    100 |      6,785.61 ns |      32.611 ns |      27.232 ns |    8.6975 |         - |        - |    14443 B |
| CallEndlessSubRoutine |   1000 |     78,150.10 ns |     691.155 ns |     646.507 ns |   56.3965 |   14.0381 |        - |   144427 B |
| CallEndlessSubRoutine |  10000 |  1,027,735.81 ns |  11,053.970 ns |  10,339.890 ns |  230.4688 |  115.2344 |        - |  1444252 B |
| CallEndlessSubRoutine | 100000 | 29,019,037.50 ns | 248,125.998 ns | 232,097.212 ns | 2625.0000 | 1093.7500 | 343.7500 | 14442578 B |

Its important to note that this only happens when you use the delegate. There is also a performance hit without because the struct is now bigger but its much smaller.

Barsonax commented 4 years ago

Not yet satisfied with the performance as there is alot of allocations going on (closures, boxing) but I don't think there's a way around this unless C# adds something like a yield all or yield foreach keyword.

In that regard writing this does exactly the same:

foreach(var step in SubCoroutine()) {
  yield return step;
}

This should prevent most of the allocations (there is still a enumerator being allocated) and makes the WaitUntil struct smaller.

It is however more verbose and less readable.

To give some numbers to show how big this difference is:

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT

|                              Method |   N |        Mean |     Error |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------------ |---- |------------:|----------:|----------:|-------:|------:|------:|----------:|
| CallEndlessSubRoutine_CoroutineEnds |   1 |    70.04 ns |  0.698 ns |  0.653 ns | 0.0869 |     - |     - |     144 B |
|       CallEndlessSubRoutine_Foreach |   1 |    51.59 ns |  0.418 ns |  0.391 ns | 0.0241 |     - |     - |      40 B |
| CallEndlessSubRoutine_CoroutineEnds | 100 | 6,829.83 ns | 73.671 ns | 68.912 ns | 8.6975 |     - |     - |   14443 B |
|       CallEndlessSubRoutine_Foreach | 100 | 4,788.00 ns | 35.413 ns | 33.125 ns | 2.4109 |     - |     - |    4012 B |

Thats already a big difference, especially in allocations.

But thats not all because in both benchmarks the WaitUntil struct is bigger due to the added field for the delegate. If we remove it the numbers change into this:

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT

|                        Method |   N |        Mean |     Error |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |---- |------------:|----------:|----------:|-------:|------:|------:|----------:|
| CallEndlessSubRoutine_Foreach |   1 |    43.01 ns |  0.702 ns |  0.657 ns | 0.0193 |     - |     - |      32 B |
| CallEndlessSubRoutine_Foreach | 100 | 3,846.48 ns | 38.659 ns | 34.270 ns | 1.9302 |     - |     - |    3210 B |

So this is gonna be a question what we will value more. The readability of CoroutineEnds or the performance of just foreaching over the IEnumerable.

With deeper nesting the differences will only get bigger so really doubting if we should add this utility method.

SirePi commented 4 years ago

If this is how it is, then I believe it's better to just remove the method altogether.. there are many ways to deal with it in case. You can take the foreach route, or you can go with the full coroutine-declaring approach that even lets you pause or resume it.

I'll just close it and add a note in the docs that this there is the possibility to integrate a Coroutine inside another

AdamsLair / duality

Implementation of WaitUntil.CoroutineEnds #821