Open Dal78 opened 2 years ago
I can see this being a good idea if the kernel doesn't do its job properly (or it's decided user space programs should figure things out themselves). That's kind of the same issue that triggered me to make this thing in the first place!
Looking at the P-State governor Phoronix benchmark on an i9-12900K, using the performance governor and a recent enough kernel should do the trick already. There may be slight gains obtainable with manual core parking, but it'll likely require per-game tweaking to be effective.
This doesn't work for me on a 13600K. This is what I get:
lut 27 19:44:07 stacjonarny gamemoded[68856]: cpu L3 cache was uniform, this is not a x3D with multiple chiplets
lut 27 19:44:07 stacjonarny gamemoded[68856]: cpu frequency was uniform, this is not a big.LITTLE type of system
lut 27 19:44:07 stacjonarny gamemoded[68856]: I can find no reason to perform core pinning on this system!
~ >>> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i5-13600K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 59%
CPU max MHz: 5100,0000
CPU min MHz: 800,0000
BogoMIPS: 6991,00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 544 KiB (14 instances)
L1i: 704 KiB (14 instances)
L2: 20 MiB (8 instances)
L3: 24 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
~ >>> lscpu --all --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 tak 5100,0000 800,0000 1086,9570
1 0 0 0 0:0:0:0 tak 5100,0000 800,0000 1097,8669
2 0 0 1 4:4:1:0 tak 5100,0000 800,0000 947,1510
3 0 0 1 4:4:1:0 tak 5100,0000 800,0000 800,0000
4 0 0 2 8:8:2:0 tak 5100,0000 800,0000 800,0000
5 0 0 2 8:8:2:0 tak 5100,0000 800,0000 800,0000
6 0 0 3 12:12:3:0 tak 5100,0000 800,0000 940,7350
7 0 0 3 12:12:3:0 tak 5100,0000 800,0000 800,1910
8 0 0 4 16:16:4:0 tak 5100,0000 800,0000 1100,0031
9 0 0 4 16:16:4:0 tak 5100,0000 800,0000 1099,2960
10 0 0 5 20:20:5:0 tak 5100,0000 800,0000 800,0000
11 0 0 5 20:20:5:0 tak 5100,0000 800,0000 800,6120
12 0 0 6 24:24:6:0 tak 3900,0000 800,0000 799,2260
13 0 0 7 25:25:6:0 tak 3900,0000 800,0000 799,9380
14 0 0 8 26:26:6:0 tak 3900,0000 800,0000 800,1120
15 0 0 9 27:27:6:0 tak 3900,0000 800,0000 800,0000
16 0 0 10 28:28:7:0 tak 3900,0000 800,0000 800,0000
17 0 0 11 29:29:7:0 tak 3900,0000 800,0000 799,9980
18 0 0 12 30:30:7:0 tak 3900,0000 800,0000 799,8760
19 0 0 13 31:31:7:0 tak 3900,0000 800,0000 800,0000
~ >>> cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
5100000
5100000
5100000
5100000
5100000
5100000
5100000
5100000
5100000
5100000
5100000
5100000
3900000
3900000
3900000
3900000
3900000
3900000
3900000
3900000
~ >>> gamemoded --version
gamemode version: v1.8.1
The OS is an up-to-date Manjaro. Let me know what else to add here; I'm not sure what is happening.
Sorry for the edit spam. In my case, setting pin_cores manually works fine. I've set it to:
[cpu]
pin_cores=0-11
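For anyone who wants to derive such a value instead of typing it by hand, here is a minimal Python sketch. It assumes the faster cores are exactly the CPUs with the highest `cpuinfo_max_freq`, which matches the output above but is only a heuristic. The helper names (`read_max_freqs`, `fastest_cpus`, `to_ranges`) are made up for illustration and are not part of GameMode.

```python
from pathlib import Path


def read_max_freqs(base="/sys/devices/system/cpu"):
    """Return {cpu_index: max_freq_khz} read from the cpufreq sysfs files."""
    freqs = {}
    for path in Path(base).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        cpu = int(path.parent.parent.name[3:])  # directory "cpu12" -> 12
        freqs[cpu] = int(path.read_text().strip())
    return freqs


def fastest_cpus(freqs):
    """Return the sorted CPU indices that share the highest max frequency."""
    if not freqs:
        return []
    top = max(freqs.values())
    return sorted(cpu for cpu, freq in freqs.items() if freq == top)


def to_ranges(cpus):
    """Format a sorted list of CPU indices, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    parts = []
    start = prev = None
    for cpu in cpus:
        if start is not None and cpu == prev + 1:
            prev = cpu  # still inside the current run
            continue
        if start is not None:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


if __name__ == "__main__":
    print("pin_cores=" + to_ranges(fastest_cpus(read_max_freqs())))
```

With the frequencies listed above (twelve CPUs at 5100000 kHz, eight at 3900000 kHz), this yields the same `0-11` range that was set manually. On a system where the fast CPUs are not contiguous it would emit a comma-separated list; whether `pin_cores` accepts that form is worth checking against the GameMode documentation, since only the plain range form is shown here.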
Is there any chance of big.LITTLE-style core steering for Alder Lake?
GameMode processes could get the P-cores, along with some designated kernel elements that are key to performance, and the rest could be shunted to the E-cores.
This may help to optimise per-core boost, making it more likely that we see higher clocks in lightly threaded games.
Currently, nearly every game defaults to using all cores.
Thoughts?
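To make the idea concrete, here is a rough Python sketch of user-space steering. It is not how GameMode implements anything; it assumes a Linux system, assumes P-cores can be told apart from E-cores by their higher maximum frequency, and uses `os.sched_setaffinity` from the standard library. The names `split_by_max_freq` and `pin` are hypothetical. Kernel threads are not covered by this approach.

```python
import os


def split_by_max_freq(max_freqs):
    """Partition CPU indices into (fast, slow) sets by maximum frequency.

    max_freqs maps a CPU index to its maximum frequency in kHz.
    On a non-hybrid system every CPU lands in the fast set.
    """
    top = max(max_freqs.values())
    fast = {cpu for cpu, freq in max_freqs.items() if freq == top}
    slow = set(max_freqs) - fast
    return fast, slow


def pin(pid, cpus):
    """Restrict a process to the given CPUs (Linux only; pid 0 means self)."""
    os.sched_setaffinity(pid, cpus)
```

A caller would pass the game's PID together with the fast set, and the PIDs of background processes together with the slow set. Note that `os.sched_setaffinity` only affects the thread whose ID is given, so a multi-threaded game would need every thread ID handled, for example by iterating over `/proc/<pid>/task`.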