Open nietras opened 1 day ago
Try with DATAS disabled, e.g. <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
cc: @mangod9, @Maoni0
Command I ran from branch net9.0:
dotnet run -c Release -f net8.0 --project src/Sep.ComparisonBenchmarks/Sep.ComparisonBenchmarks.csproj -- -m --warmupCount 6 --minIterationCount 5 --maxIterationCount 15 --runtimes net80 net90 --iterationTime 350 --hide Type Quotes Reader RatioSD Gen0 Gen1 Gen2 Error Median StdDev --filter *GcServerLongAsset*Sep*
No change with <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>,
but I can't remember whether BDN actually forwards this to its sub-processes. Is there a flag to tell BDN to use this, like Server=True?
Server=True InvocationCount=Default IterationTime=350ms
MaxIterationCount=15 MinIterationCount=5 WarmupCount=6
Quotes=False Reader=String
| Method | Runtime | Scope | Rows | Mean | Ratio | MB | MB/s | ns/row | Allocated | Alloc Ratio |
|---------- |--------- |------ |-------- |---------:|------:|----:|-------:|-------:|----------:|------------:|
| Sep______ | .NET 8.0 | Asset | 1000000 | 431.7 ms | 1.00 | 583 | 1352.1 | 431.7 | 260.41 MB | 1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 111.1 ms | 0.26 | 583 | 5252.6 | 111.1 | 261.2 MB | 1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 500.7 ms | 1.16 | 583 | 1165.9 | 500.7 | 260.42 MB | 1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 178.4 ms | 0.41 | 583 | 3272.0 | 178.4 | 261.32 MB | 1.00 |
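Note that the Ratio column above is relative to the single-threaded .NET 8 baseline, so it does not directly show how much slower each method became. A quick Python sketch of the per-method .NET 9 / .NET 8 ratio, using the Mean values copied from the table (the dictionary layout and names are only for illustration):

```python
# Mean times in ms, copied from the table above (Server=True, default GC settings).
mean_ms = {
    "Sep______": {"net80": 431.7, "net90": 500.7},
    "Sep_MT___": {"net80": 111.1, "net90": 178.4},
}

# Slowdown of .NET 9 relative to .NET 8 for the same method.
slowdown = {method: t["net90"] / t["net80"] for method, t in mean_ms.items()}

for method, ratio in slowdown.items():
    print(f"{method}: {ratio:.2f}x")
```

With these numbers the single-threaded run comes out at roughly 1.16x and the multi-threaded run at roughly 1.61x.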
Yes, it's DATAS. I tried setting it with an environment variable, e.g. for BDN with --envVars DOTNET_GCDynamicAdaptationMode:0
and ran with both 0 and 1, as can be seen below. This means the "regression" is solely due to DATAS being the default in .NET 9; otherwise there is no difference.
NO DATAS
dotnet run -c Release -f net8.0 --project src/Sep.ComparisonBenchmarks/Sep.ComparisonBenchmarks.csproj -- -m --warmupCount 6 --minIterationCount 5 --maxIterationCount 15 --runtimes net80 net90 --iterationTime 350 --hide Type Quotes Reader RatioSD Gen0 Gen1 Gen2 Error Median StdDev --filter *GcServerLongAsset*Sep* --envVars DOTNET_GCDynamicAdaptationMode:0
BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
[Host] : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
Job-KKDGWQ : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
Job-HUTQEJ : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=0 Server=True InvocationCount=Default
IterationTime=350ms MaxIterationCount=15 MinIterationCount=5
WarmupCount=6 Quotes=False Reader=String
| Method | Runtime | Scope | Rows | Mean | Ratio | MB | MB/s | ns/row | Allocated | Alloc Ratio |
|---------- |--------- |------ |-------- |---------:|------:|----:|-------:|-------:|----------:|------------:|
| Sep______ | .NET 8.0 | Asset | 1000000 | 452.7 ms | 1.00 | 583 | 1289.6 | 452.7 | 260.41 MB | 1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 112.4 ms | 0.25 | 583 | 5195.4 | 112.4 | 261.51 MB | 1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 445.3 ms | 0.98 | 583 | 1310.9 | 445.3 | 260.41 MB | 1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 117.8 ms | 0.26 | 583 | 4954.0 | 117.8 | 261.38 MB | 1.00 |
DATAS
dotnet run -c Release -f net8.0 --project src/Sep.ComparisonBenchmarks/Sep.ComparisonBenchmarks.csproj -- -m --warmupCount 6 --minIterationCount 5 --maxIterationCount 15 --runtimes net80 net90 --iterationTime 350 --hide Type Quotes Reader RatioSD Gen0 Gen1 Gen2 Error Median StdDev --filter *GcServerLongAsset*Sep* --envVars DOTNET_GCDynamicAdaptationMode:1
BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
[Host] : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
Job-ZORNME : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
Job-BHTHZN : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=1 Server=True InvocationCount=Default
IterationTime=350ms MaxIterationCount=15 MinIterationCount=5
WarmupCount=6 Quotes=False Reader=String
| Method | Runtime | Scope | Rows | Mean | Ratio | MB | MB/s | ns/row | Allocated | Alloc Ratio |
|---------- |--------- |------ |-------- |---------:|------:|----:|-------:|-------:|----------:|------------:|
| Sep______ | .NET 8.0 | Asset | 1000000 | 527.5 ms | 1.00 | 583 | 1106.6 | 527.5 | 260.41 MB | 1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 170.0 ms | 0.32 | 583 | 3433.5 | 170.0 | 261.41 MB | 1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 528.2 ms | 1.00 | 583 | 1105.2 | 528.2 | 260.41 MB | 1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 182.9 ms | 0.35 | 583 | 3192.2 | 182.9 | 261.17 MB | 1.00 |
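To put a number on the DATAS cost in these two runs, here is a small Python sketch that divides the Mean with DOTNET_GCDynamicAdaptationMode=1 by the Mean with it set to 0, per method and runtime. The values are copied from the NO DATAS and DATAS tables; the variable names are just illustrative.

```python
# Mean times in ms from the two runs above, keyed by DOTNET_GCDynamicAdaptationMode.
runs = {
    0: {  # NO DATAS
        ("Sep______", "net80"): 452.7,
        ("Sep_MT___", "net80"): 112.4,
        ("Sep______", "net90"): 445.3,
        ("Sep_MT___", "net90"): 117.8,
    },
    1: {  # DATAS
        ("Sep______", "net80"): 527.5,
        ("Sep_MT___", "net80"): 170.0,
        ("Sep______", "net90"): 528.2,
        ("Sep_MT___", "net90"): 182.9,
    },
}

# Throughput cost of enabling DATAS: time with DATAS / time without.
datas_cost = {key: runs[1][key] / runs[0][key] for key in runs[0]}

for (method, runtime), ratio in datas_cost.items():
    print(f"{method} {runtime}: {ratio:.2f}x")
```

On this machine that works out to about 1.17x to 1.19x single-threaded and 1.51x to 1.55x multi-threaded, on both runtimes, which fits the conclusion that the setting rather than the runtime version accounts for the difference.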
Yeah, a throughput regression for certain microbenchmark scenarios is expected with DATAS. I assume the benchmark shows improved working set utilization?
It is expected in .NET 9.
In general, DATAS should benefit real-world applications a lot as it can largely reduce the working set and also improve GC latency, though it comes with a minor throughput penalty.
In another similar issue (#101006) I did a binary-tree allocation benchmark and got the following benchmark result on .NET 9 rc2:
Considering the large improvements to latency and working set, I would take the minor throughput perf regression.
In https://github.com/nietras/Sep (a fast, highly optimized CSV parser) I have been comparing performance (comparison-bench.ps1) between .NET 8 and .NET 9 RC2, and have observed what appears to be a consistent and significant performance regression when using ServerGarbageCollection (true). The benchmark in question is also discussed in https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers.
Benchmarks can be run by cloning the Sep repo, checking out branch net9.0 and running the command in comparison-bench.ps1, perhaps adding --filter *GcServer*Sep* or similar. Details for the benchmark and machine are given below via BenchmarkDotNet.
As can be seen, this shows a regression in a scenario of many medium-size object allocations, ranging from 500ms/429ms = 1.17x (single thread) to 174ms/102ms = 1.69x (multi-threaded).
I know there have been changes to the GC; my question is whether this regression is expected. I also wanted to flag it in case it is of interest.