Open alexcovington opened 3 years ago
Looks like complus_gccpugroup expects a boolean, so using complus_gccpugroup: true works but still requires complus_gcheapaffinitizeranges to be set.
However, when I try running with complus_gcheapaffinitizeranges: 0:0-63,1:64-127, the benchmark crashes with:
12:59:19 Running defgcperfsim__a__only_config__0gb__0
C:\Users\alex\source\repos\performance\src\benchmarks\gc\src\dependencies\PerfView.exe start -NoV2Rundown -NoNGENRundown -NoRundown -AcceptEULA -NoGUI -Merge:true -SessionName:CoreGCBench -zip:false -GCCollectOnly -LogFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__0.perfview-log.txt -CircularMB:1024 -BufferSizeMB:1024 -DataFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__0.etl
C:\Users\alex\source\repos\runtime-master\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\CoreRun.exe C:\Users\alex\source\repos\performance\artifacts\bin\GCPerfSim\release\netcoreapp5.0\GCPerfSim.dll -tc 254 -tagb 300.0 -tlgb 0.0 -lohar 0 -pohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohsr 100-4000 -lohsr 102400-204800 -pohsr 100-204800 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time (cwd C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__0)
BEGIN: coreclr_initialize failed - Error: 0x8013200b
END: coreclr_initialize failed - Error: 0x8013200b
C:\Users\alex\source\repos\performance\src\benchmarks\gc\src\dependencies\PerfView.exe stop -NoV2Rundown -NoNGENRundown -NoRundown -AcceptEULA -NoGUI -Merge:true -SessionName:CoreGCBench -zip:false -GCCollectOnly -LogFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__0.perfview-log.txt -CircularMB:1024 -BufferSizeMB:1024 -DataFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__0.etl
12:59:25 Running defgcperfsim__a__only_config__0gb__1
C:\Users\alex\source\repos\performance\src\benchmarks\gc\src\dependencies\PerfView.exe start -NoV2Rundown -NoNGENRundown -NoRundown -AcceptEULA -NoGUI -Merge:true -SessionName:CoreGCBench -zip:false -GCCollectOnly -LogFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__1.perfview-log.txt -CircularMB:1024 -BufferSizeMB:1024 -DataFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__1.etl
C:\Users\alex\source\repos\runtime-master\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\CoreRun.exe C:\Users\alex\source\repos\performance\artifacts\bin\GCPerfSim\release\netcoreapp5.0\GCPerfSim.dll -tc 254 -tagb 300.0 -tlgb 0.0 -lohar 0 -pohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohsr 100-4000 -lohsr 102400-204800 -pohsr 100-204800 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time (cwd C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__1)
BEGIN: coreclr_initialize failed - Error: 0x8013200b
END: coreclr_initialize failed - Error: 0x8013200b
C:\Users\alex\source\repos\performance\src\benchmarks\gc\src\dependencies\PerfView.exe stop -NoV2Rundown -NoNGENRundown -NoRundown -AcceptEULA -NoGUI -Merge:true -SessionName:CoreGCBench -zip:false -GCCollectOnly -LogFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__1.perfview-log.txt -CircularMB:1024 -BufferSizeMB:1024 -DataFile:C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml.out\defgcperfsim__a__only_config__0gb__1.etl
========= *WARNING*: Test 'C:\Users\alex\source\repos\performance\src\benchmarks\gc\bench\suite-test\normal_server.yaml' encountered errors. =========
*** Here is a summary of the problems found: ***
======= Executable 'defgcperfsim' =======
===== Core 'a' =====
=== Configuration 'only_config' ===
- Benchmark: '0gb' -
Iteration: 0
Error Message: Process failed with code 4294967295
Stack Trace:
File ".\src\exec\run_tests.py", line 182, in _run_all_benchmarks
t.out,
File ".\src\exec\run_single_test.py", line 110, in run_single_test
partial_test_status = _do_run_single_test(built, t, out)
File ".\src\exec\run_single_test.py", line 186, in _do_run_single_test
return _run_single_test_windows_perfview(built, t, out)
File ".\src\exec\run_single_test.py", line 417, in _run_single_test_windows_perfview
run_process, start_time_seconds=start_time_seconds, timeout_seconds=timeout_seconds
File ".\src\commonlib\util.py", line 481, in wait_on_process_with_timeout
assert killed or process.returncode == 0, f"Process failed with code {process.returncode}"
- Benchmark: '0gb' -
Iteration: 1
Error Message: Process failed with code 4294967295
Stack Trace:
File ".\src\exec\run_tests.py", line 182, in _run_all_benchmarks
t.out,
File ".\src\exec\run_single_test.py", line 110, in run_single_test
partial_test_status = _do_run_single_test(built, t, out)
File ".\src\exec\run_single_test.py", line 186, in _do_run_single_test
return _run_single_test_windows_perfview(built, t, out)
File ".\src\exec\run_single_test.py", line 417, in _run_single_test_windows_perfview
run_process, start_time_seconds=start_time_seconds, timeout_seconds=timeout_seconds
File ".\src\commonlib\util.py", line 481, in wait_on_process_with_timeout
assert killed or process.returncode == 0, f"Process failed with code {process.returncode}"
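A note on the exit code in the summary above: 4294967295 is 0xFFFFFFFF, which is what -1 looks like when a 32-bit process exit code is read as an unsigned integer. A minimal sketch of the conversion follows; the helper name `as_signed_32` is made up for illustration and is not part of the benchmark scripts.

```python
def as_signed_32(code: int) -> int:
    """Reinterpret an unsigned 32-bit exit code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    # Values with the top bit set represent negative numbers in two's complement.
    return code - 0x1_0000_0000 if code >= 0x8000_0000 else code


if __name__ == "__main__":
    # The exit code reported by wait_on_process_with_timeout in the log above.
    print(as_signed_32(4294967295))  # -1
```

Read this way, the process most likely returned -1 after coreclr_initialize failed, rather than some unrelated large status value.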
My understanding is that COMPlus_GCHeapAffinitizeRanges on Windows is in the format <cpu group number>:<logical cpu range>. The machine I am running on has 2 processors, each with 64 cores/128 threads, totaling 128 cores/256 threads. I'm unsure why complus_gcheapaffinitizeranges: 0:0-63,1:64-127 fails, since the logical CPU ranges are valid for the machine. Any help would be appreciated!
I think the processor numbers may be relative to the group, so perhaps complus_gcheapaffinitizeranges: 0:0-63,1:0-63 instead of complus_gcheapaffinitizeranges: 0:0-63,1:64-127?
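If the processor numbers are indeed relative to the group, as suggested above, a machine-wide logical CPU range has to be split per group and renumbered. The sketch below illustrates that conversion. It assumes every CPU group holds exactly 64 logical processors, which is not guaranteed on every machine, and the function names are hypothetical rather than anything from the runtime or the benchmark scripts.

```python
from typing import List, Tuple

GROUP_SIZE = 64  # assumption: each Windows CPU group has 64 logical processors


def to_group_relative(lo: int, hi: int, group_size: int = GROUP_SIZE) -> List[Tuple[int, int, int]]:
    """Split the inclusive machine-wide range [lo, hi] into (group, lo, hi)
    pieces whose processor numbers are relative to their group."""
    if lo < 0 or hi < lo:
        raise ValueError(f"invalid range {lo}-{hi}")
    pieces = []
    cur = lo
    while cur <= hi:
        group = cur // group_size
        base = group * group_size
        end = min(hi, base + group_size - 1)  # stop at the group boundary
        pieces.append((group, cur - base, end - base))
        cur = end + 1
    return pieces


def format_ranges(pieces: List[Tuple[int, int, int]]) -> str:
    """Render pieces as '<group>:<lo>-<hi>' entries separated by commas."""
    return ",".join(f"{group}:{lo}-{hi}" for group, lo, hi in pieces)


if __name__ == "__main__":
    print(format_ranges(to_group_relative(0, 127)))  # 0:0-63,1:0-63
```

Under that assumption, the machine-wide range 0-127 maps to 0:0-63,1:0-63, which matches the value proposed in the comment above.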
Yes, I think you're right. If I use complus_gcheapaffinitizeranges: 0:0-63,1:0-63 in just a small test C# program, I don't get any initialization error like the one above.
But if I use complus_gcheapaffinitizeranges: 0:0-63,1:0-63 in the GC benchmarks, bench_file.py hits an assertion claiming the range is invalid:
Error in E:\acovingt\performance\src\benchmarks\gc\bench\suite-default\normal_server.gccpugroup-gcheapaffinitizeranges.net8.0.yaml
Traceback (most recent call last):
File "C:\Program Files\Python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Program Files\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File ".\__main__.py", line 9, in <module>
run_command(ALL_COMMANDS)
File ".\src\commonlib\command.py", line 87, in run_command
run_command_worker(commands, cmd_and_args)
File ".\src\commonlib\command.py", line 243, in run_command_worker
cast(Callable[[Any], None], command.fn)(args)
File ".\src\exec\run_tests.py", line 106, in run
run_test(args)
File ".\src\exec\run_tests.py", line 118, in run_test
bench = parse_bench_file(bench_file_path)
File ".\src\commonlib\bench_file.py", line 1215, in parse_bench_file
return BenchFileAndPath(load_yaml(BenchFile, path), path)
File ".\src\commonlib\parse_and_serialize.py", line 84, in load_yaml
cls, safe_load(f), path, PropPath.root(f"load_yaml {cls}"), all_optional
File ".\src\commonlib\parse_and_serialize.py", line 248, in _yaml_to_typed
return cast(T, unwrap(_try_yaml_to_typed(cls, o, yaml_file_path, desc, all_optional)))
File ".\src\commonlib\parse_and_serialize.py", line 237, in _try_yaml_to_typed
handle_dataclass=handle_dataclass,
File ".\src\commonlib\type_utils.py", line 235, in match_type
return default_handler(t) if handle_dataclass is None else handle_dataclass(t)
File ".\src\commonlib\parse_and_serialize.py", line 218, in handle_dataclass
all_non_err(_get_field(fld) for fld in flds),
File ".\src\commonlib\result_utils.py", line 14, in all_non_err
for x in xs:
File ".\src\commonlib\parse_and_serialize.py", line 218, in <genexpr>
all_non_err(_get_field(fld) for fld in flds),
File ".\src\commonlib\parse_and_serialize.py", line 205, in _get_field
fld.type, d[fld.name], yaml_file_path, child(fld.name)
File ".\src\commonlib\parse_and_serialize.py", line 237, in _try_yaml_to_typed
handle_dataclass=handle_dataclass,
File ".\src\commonlib\type_utils.py", line 216, in match_type
return default_handler(t) if handle_union is None else handle_union(union)
File ".\src\commonlib\parse_and_serialize.py", line 229, in <lambda>
members, lambda t: _try_yaml_to_typed(t, o, yaml_file_path, desc, all_optional)
File ".\src\commonlib\parse_and_serialize.py", line 37, in try_for_each_union_member
res = try_get(member)
File ".\src\commonlib\parse_and_serialize.py", line 229, in <lambda>
members, lambda t: _try_yaml_to_typed(t, o, yaml_file_path, desc, all_optional)
File ".\src\commonlib\parse_and_serialize.py", line 237, in _try_yaml_to_typed
handle_dataclass=handle_dataclass,
File ".\src\commonlib\type_utils.py", line 235, in match_type
return default_handler(t) if handle_dataclass is None else handle_dataclass(t)
File ".\src\commonlib\parse_and_serialize.py", line 219, in handle_dataclass
lambda field_values: construct_class_from_fields(cls, field_values),
File ".\src\commonlib\result_utils.py", line 47, in map_ok
return flat_map_ok(r, lambda t: Ok(cb(t)))
File ".\src\commonlib\result_utils.py", line 62, in flat_map_ok
return match(r, cb, Err)
File ".\src\commonlib\result_utils.py", line 69, in match
return cb_ok(r.unwrap())
File ".\src\commonlib\result_utils.py", line 47, in <lambda>
return flat_map_ok(r, lambda t: Ok(cb(t)))
File ".\src\commonlib\parse_and_serialize.py", line 219, in <lambda>
lambda field_values: construct_class_from_fields(cls, field_values),
File ".\src\commonlib\type_utils.py", line 198, in construct_class_from_fields
return check_cast(cls, cast(Any, cls)(*field_values))
File "<string>", line 28, in __init__
File ".\src\commonlib\bench_file.py", line 217, in __post_init__
_parse_heap_affinitize_ranges(self.complus_gcheapaffinitizeranges)
File ".\src\commonlib\bench_file.py", line 280, in _parse_heap_affinitize_ranges
_assert_sorted_and_non_overlapping(ranges)
File ".\src\commonlib\bench_file.py", line 287, in _assert_sorted_and_non_overlapping
assert r.lo > prev
AssertionError
So it looks like bench_file.py might be assuming the range 0:0-63,1:0-63 is incorrect, even though it's valid.
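The traceback suggests the check compares each range against the previous one regardless of CPU group. A group-aware version of such a check might look like the sketch below. This is not the actual bench_file.py code: the function names, the treatment of entries without a group prefix as group 0, and the error messages are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def parse_affinitize_ranges(text: str) -> Dict[int, List[Tuple[int, int]]]:
    """Parse '<group>:<lo>-<hi>,...' into {group: [(lo, hi), ...]}.
    Entries without a '<group>:' prefix are filed under group 0 (assumption)."""
    by_group: Dict[int, List[Tuple[int, int]]] = {}
    for part in text.split(","):
        group_text, sep, range_text = part.strip().rpartition(":")
        group = int(group_text) if sep else 0
        lo_text, dash, hi_text = range_text.partition("-")
        lo = int(lo_text)
        hi = int(hi_text) if dash else lo  # a single number means a one-CPU range
        if hi < lo:
            raise ValueError(f"range {part!r} is reversed")
        by_group.setdefault(group, []).append((lo, hi))
    return by_group


def check_sorted_and_non_overlapping(by_group: Dict[int, List[Tuple[int, int]]]) -> None:
    """Require ranges to be sorted and disjoint within each group only."""
    for group, ranges in by_group.items():
        prev_hi = -1
        for lo, hi in ranges:
            if lo <= prev_hi:
                raise ValueError(f"group {group}: range {lo}-{hi} overlaps or is out of order")
            prev_hi = hi


if __name__ == "__main__":
    parsed = parse_affinitize_ranges("0:0-63,1:0-63")
    check_sorted_and_non_overlapping(parsed)  # passes: the ranges are in different groups
    print(parsed)
```

Resetting the previous upper bound for each group is what lets 0:0-63,1:0-63 pass while still rejecting something like 0:0-63,0:32-40.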
A little background: This came up again recently in https://github.com/dotnet/runtime/issues/90715, and my searches found this item. Copying a reply from over there:
As a short-term fix, you could just delete that assertion as the python script is really just passing the value through. Or, the branch for https://github.com/dotnet/performance/pull/3275 contains an untested fix.
However, we have replaced the Python infrastructure with C# infrastructure for running these scenarios, so we won't be merging that fix and plan on deleting the Python code. @mrsharm will provide more information.
Hi @alexcovington,
We have released a new version of the infrastructure in C# that we are encouraging users to try. Here are the additional steps you'll need to take to get this to work (now that you have a prebuilt GCPerfSim):
The corresponding yaml file based on your example above is as follows - please copy the contents and save it in a .yaml file to be used. As a heads up, you might be missing the COMPlus_Thread_UseAllCPUGroups config that needs to be set to get > 64 heaps:
runs:
  0gb:
    override_parameters:
      tlgb: 2
      sohsi: 50
      sohpi: 50
    environment_variables:
      complus_gcserver: true
      complus_gcconcurrent: false
      complus_gcheapcount: 254
      complus_gccpugroup: 1
      COMPlus_Thread_UseAllCPUGroups: 1
gcperfsim_configurations:
  parameters:
    tc: 254
    tagb: 300
    tlgb: 0
    lohar: 0
    pohar: 0
    sohsr: 100-4000
    lohsr: 102400-204800
    pohsr: 100-204800
    sohsi: 0
    lohsi: 0
    pohsi: 0
    sohpi: 0
    lohpi: 0
    sohfi: 0
    lohfi: 0
    pohfi: 0
    allocType: reference
    testKind: time
  gcperfsim_path: C:\Users\alex\source\repos\performance\artifacts\bin\GCPerfSim\release\netcoreapp5.0\GCPerfSim.dll
environment:
  environment_variables: {}
  default_max_seconds: 300
  iterations: 2
coreruns:
  a:
    path: C:\Users\alex\source\repos\runtime-master\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root
    environment_variables: {}
linux_coreruns:
output:
  path: C:\Users\musharm\source\repos\CustomerRepro
  columns:
  - Count
  - total allocated (mb)
  - total pause time (msec)
  - PctTimePausedInGC
  - FirstToLastGCSeconds
  - HeapSizeAfter_Mean
  - HeapSizeBeforeMB_Mean
  - PauseDurationMSec_95PWhereIsGen0
  - PauseDurationMSec_95PWhereIsGen1
  - PauseDurationMSec_95PWhereIsBackground
  - PauseDurationMSec_95PWhereIsBlockingGen2
  - CountIsBlockingGen2
  - HeapCount
  - TotalNumberGCs
  - TotalAllocatedMB
  - Speed
  - PauseDurationMSec_MeanWhereIsEphemeral
  - PauseDurationMSec_MeanWhereIsBackground
  - PauseDurationMSec_MeanWhereIsBlockingGen2
  - PauseDurationSeconds_SumWhereIsGen1
  - PauseDurationSeconds_Sum
  - CountIsGen1
  - ExecutionTimeMSec
  percentage_disk_remaining_to_stop_per_run: 0
  all_columns:
  - Count
  - total allocated (mb)
  - total pause time (msec)
  - PctTimePausedInGC
  - FirstToLastGCSeconds
  - HeapSizeAfter_Mean
  - HeapSizeBeforeMB_Mean
  - PauseDurationMSec_95PWhereIsGen0
  - PauseDurationMSec_95PWhereIsGen1
  - PauseDurationMSec_95PWhereIsBackground
  - PauseDurationMSec_95PWhereIsBlockingGen2
  - CountIsBlockingGen2
  - HeapCount
  - TotalNumberGCs
  - TotalAllocatedMB
  - Speed
  - PauseDurationMSec_MeanWhereIsEphemeral
  - PauseDurationMSec_MeanWhereIsBackground
  - PauseDurationMSec_MeanWhereIsBlockingGen2
  - PauseDurationSeconds_SumWhereIsGen1
  - PauseDurationSeconds_Sum
  - CountIsGen1
  - ExecutionTimeMSec
  - Count
  - PctTimePausedInGC
  - FirstToLastGCSeconds
  - HeapSizeAfter_Mean
  - HeapSizeBeforeMB_Mean
  - PauseDurationMSec_95PWhereIsGen0
  - PauseDurationMSec_95PWhereIsGen1
  - PauseDurationMSec_95PWhereIsBackground
  - PauseDurationMSec_95PWhereIsBlockingGen2
  - CountIsBlockingGen2
  - HeapCount
  - TotalNumberGCs
  - TotalAllocatedMB
  - Speed
  - PauseDurationMSec_MeanWhereIsEphemeral
  - PauseDurationSeconds_SumWhereIsGen1
  - PauseDurationSeconds_Sum
  - CountIsGen1
  - ExecutionTimeMSec
  formats:
  - markdown
  - json
name: Normal_Server
trace_configurations:
  type: gc
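The heads-up above about COMPlus_Thread_UseAllCPUGroups can be turned into a quick sanity check on the environment_variables block before a run. The sketch below is only an illustration of that advice: the function name is hypothetical, it reads the heap count as the decimal number written in the YAML, and the threshold of 64 reflects the "> 64 heaps" remark rather than anything verified against the infrastructure.

```python
from typing import Dict, List


def missing_cpu_group_settings(env: Dict[str, object], group_limit: int = 64) -> List[str]:
    """Return the CPU-group settings that look missing for a > 64 heap run.
    Keys are compared case-insensitively because the YAML mixes casings."""
    lowered = {key.lower(): str(value).lower() for key, value in env.items()}
    # Assumption: the heap count is the decimal number as written in the YAML.
    heap_count = int(lowered.get("complus_gcheapcount", "0"))
    if heap_count <= group_limit:
        return []
    required = ["complus_gccpugroup", "complus_thread_useallcpugroups"]
    return [name for name in required if lowered.get(name) not in ("1", "true")]


if __name__ == "__main__":
    env = {
        "complus_gcserver": True,
        "complus_gcconcurrent": False,
        "complus_gcheapcount": 254,
        "complus_gccpugroup": 1,
        "COMPlus_Thread_UseAllCPUGroups": 1,
    }
    print(missing_cpu_group_settings(env))  # []
```

With the values from the YAML above, nothing is reported as missing; dropping COMPlus_Thread_UseAllCPUGroups from the dictionary makes the check flag it.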
Invoke the infrastructure to run GCPerfSim: \artifacts\bin\GC.Infrastructure\Release\net7.0\GC.Infrastructure.exe gcperfsim --configuration PathToYamlFile.yaml. Feel free to adjust any of the parameters - these should be 1 to 1 with the old infrastructure.
Full documentation to build this can be found here.
Do let us know if you run into any issues.
I'm trying to run the GC benchmarks and enable CPU Groups since the machine I'm benchmarking has >64 processors (reference). When I try to add the COMPlus_GCCpuGroup environment variable to the run file, the Python script throws an error.
Run file:
Steps to reproduce:
The benchmarks run fine if I remove complus_gccpugroup: 1 from the run file, but I want to run the benchmarks with CPU groups enabled.
If I try to set the environment variable before running the Python script, I get the following error:
Extra info for troubleshooting: