cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Crash after calling corefreq-cli #276

Closed ich777 closed 2 years ago

ich777 commented 3 years ago

Hi, I got a report about a crash after issuing the command corefreq-cli. The system is an Intel Xeon E5-2630L running on a Supermicro X9DRL-3F/iF; maybe you can tell what caused this. More information is in the linked forum thread.

Let me know if you need more information!

I also found this in the syslog, logged while the CoreFreq plugin was being installed on unRAID:

Sep 13 15:34:03 ts200v root: ----------------------CoreFreq v1.86.7 found locally!-----------------------
Sep 13 15:34:03 ts200v root: 
Sep 13 15:34:03 ts200v root: ------------------Installing CoreFreq v1.86.7, please wait...!----------------
Sep 13 15:34:07 ts200v kernel: corefreqk: loading out-of-tree module taints kernel.
Sep 13 15:34:07 ts200v kernel: CoreFreq(4:16): Processor [ 06_2D] Architecture [SandyBridge/EP/Romley] SMT [24/24]
Sep 13 15:34:07 ts200v kernel: general protection fault, maybe for address 0x400000: 0000 [#1] SMP PTI
Sep 13 15:34:07 ts200v kernel: CPU: 4 PID: 3818 Comm: modprobe Tainted: G           O      5.13.8-Unraid #1
Sep 13 15:34:07 ts200v kernel: Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.3 07/12/2018
Sep 13 15:34:07 ts200v kernel: RIP: 0010:Start_Uncore_SandyBridge_EP+0x3a/0x5f [corefreqk]
Sep 13 15:34:07 ts200v kernel: Code: 45 03 00 48 c1 e2 20 89 c0 48 09 c2 48 8b 06 48 89 90 60 01 00 00 48 89 d0 48 c1 ea 20 48 0d 00 00 40 00 0f 30 b9 91 03 00 00 <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 8b 06 48 89 90 58 01 00 00 48
Sep 13 15:34:07 ts200v kernel: RSP: 0018:ffffc90009807be8 EFLAGS: 00010006
Sep 13 15:34:07 ts200v kernel: RAX: 0000000000400000 RBX: 0000000000000004 RCX: 0000000000000391
Sep 13 15:34:07 ts200v kernel: RDX: 0000000000000000 RSI: ffff888107775800 RDI: 0000000000000000
Sep 13 15:34:07 ts200v kernel: RBP: ffff88810accc000 R08: ffffff00ffffffff R09: 000000000000004e
Sep 13 15:34:07 ts200v kernel: R10: 0000015270f8e618 R11: 00000011d62d2c9d R12: 000000000000030c
Sep 13 15:34:07 ts200v kernel: R13: 0000000008c3c9c4 R14: 000000133306b24c R15: 0000000000000000
Sep 13 15:34:07 ts200v kernel: FS:  000015152ef12740(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000
Sep 13 15:34:07 ts200v kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 13 15:34:07 ts200v kernel: CR2: 000014abdf6060a0 CR3: 0000000107704003 CR4: 00000000000606e0
Sep 13 15:34:07 ts200v kernel: Call Trace:
Sep 13 15:34:07 ts200v kernel: Entry_Intel_Xeon_EP.constprop.0+0x165/0x242 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? ClockMod_HWP_PerCore+0xdc/0xdc [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? Entry_Intel_Xeon_EP.constprop.0+0x242/0x242 [corefreqk]
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 13 15:34:07 ts200v kernel: generic_exec_single+0x3f/0x9c
Sep 13 15:34:07 ts200v kernel: smp_call_function_single+0xc2/0xf7
Sep 13 15:34:07 ts200v kernel: Controller_Start+0xc3/0xe6 [corefreqk]
Sep 13 15:34:07 ts200v kernel: CoreFreqK_Ignition_Level_Up+0x401/0x493 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? 0xffffffffa0172000
Sep 13 15:34:07 ts200v kernel: CoreFreqK_StartUp+0x51/0xdc [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Alloc_Processor_RO_Level_Up+0x4f/0x4f [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? Query_Features+0x5c5/0x5c5 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Make_Device_Level_Up+0x3a/0x3a [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Create_Device_Level_Up+0x59/0x59 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Register_NMI+0x1c2/0x1c2 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Alloc_Private_Level_Up+0x42/0x42 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Alloc_Features_Level_Up+0x6f/0x6f [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Alloc_Processor_RW_Level_Up+0x51/0x51 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Alloc_Public_Level_Down+0x17/0x17 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? Compute_Interval+0x98/0x98 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_Alloc_Private_Cache_Level_Up+0x37/0x37 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? CoreFreqK_ProbePCI+0x93/0x93 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? Define_CPUID+0x29/0x29 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? SMBIOS_Collect+0x1c7/0x1c7 [corefreqk]
Sep 13 15:34:07 ts200v kernel: ? Controller_Stop+0xe6/0xe6 [corefreqk]
Sep 13 15:34:07 ts200v kernel: CoreFreqK_Init+0xb/0x1000 [corefreqk]
Sep 13 15:34:07 ts200v kernel: do_one_initcall+0x7b/0x17e
Sep 13 15:34:07 ts200v kernel: ? do_init_module+0x23/0x218
Sep 13 15:34:07 ts200v kernel: ? kmem_cache_alloc_trace+0x120/0x147
Sep 13 15:34:07 ts200v kernel: do_init_module+0x5b/0x218
Sep 13 15:34:07 ts200v kernel: __do_sys_init_module+0xc4/0x105
Sep 13 15:34:07 ts200v kernel: do_syscall_64+0x63/0x76
Sep 13 15:34:07 ts200v kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Sep 13 15:34:07 ts200v kernel: RIP: 0033:0x15152f04980a
Sep 13 15:34:07 ts200v kernel: Code: 48 8b 0d 61 76 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 76 0c 00 f7 d8 64 89 01 48
Sep 13 15:34:07 ts200v kernel: RSP: 002b:00007ffd041a3828 EFLAGS: 00000206 ORIG_RAX: 00000000000000af
Sep 13 15:34:07 ts200v kernel: RAX: ffffffffffffffda RBX: 00000000004290c0 RCX: 000015152f04980a
Sep 13 15:34:07 ts200v kernel: RDX: 000000000041d268 RSI: 0000000000074eb0 RDI: 000015152e58b010
Sep 13 15:34:07 ts200v kernel: RBP: 000015152e58b010 R08: 0000000000000007 R09: 0000000000429050
Sep 13 15:34:07 ts200v kernel: R10: 0000000000000002 R11: 0000000000000206 R12: 000000000041d268
Sep 13 15:34:07 ts200v kernel: R13: 0000000000000000 R14: 0000000000430560 R15: 00000000004290c0
Sep 13 15:34:07 ts200v kernel: Modules linked in: corefreqk(O+) iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables e1000e x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif mxm_wmi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate i2c_i801 isci intel_uncore input_leds i2c_smbus libsas acpi_ipmi i2c_core ahci led_class scsi_transport_sas wmi ipmi_si libahci button [last unloaded: e1000e]
Sep 13 15:34:07 ts200v kernel: ---[ end trace 891d5964520064ac ]---
Sep 13 15:34:07 ts200v kernel: RIP: 0010:Start_Uncore_SandyBridge_EP+0x3a/0x5f [corefreqk]
Sep 13 15:34:07 ts200v kernel: Code: 45 03 00 48 c1 e2 20 89 c0 48 09 c2 48 8b 06 48 89 90 60 01 00 00 48 89 d0 48 c1 ea 20 48 0d 00 00 40 00 0f 30 b9 91 03 00 00 <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 8b 06 48 89 90 58 01 00 00 48
Sep 13 15:34:07 ts200v kernel: RSP: 0018:ffffc90009807be8 EFLAGS: 00010006
Sep 13 15:34:07 ts200v kernel: RAX: 0000000000400000 RBX: 0000000000000004 RCX: 0000000000000391
Sep 13 15:34:07 ts200v kernel: RDX: 0000000000000000 RSI: ffff888107775800 RDI: 0000000000000000
Sep 13 15:34:07 ts200v kernel: RBP: ffff88810accc000 R08: ffffff00ffffffff R09: 000000000000004e
Sep 13 15:34:07 ts200v kernel: R10: 0000015270f8e618 R11: 00000011d62d2c9d R12: 000000000000030c
Sep 13 15:34:07 ts200v kernel: R13: 0000000008c3c9c4 R14: 000000133306b24c R15: 0000000000000000
Sep 13 15:34:07 ts200v kernel: FS:  000015152ef12740(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000
Sep 13 15:34:07 ts200v kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 13 15:34:07 ts200v kernel: CR2: 000014abdf6060a0 CR3: 0000000107704003 CR4: 00000000000606e0
Sep 13 15:34:07 ts200v root: 
cyring commented 3 years ago

If the PMU version is greater than 3, then the Uncore is programmable.

For testing, a fix is available in the develop branch of CoreFreq (v1.87.3).
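The version check described above can be illustrated with a minimal user-space sketch. It is not the CoreFreq driver code: the helper names `pmu_version_from_eax`, `uncore_is_programmable` and `read_pmu_version` are hypothetical. It only assumes that the architectural PMU version ID is reported in bits 7:0 of EAX for CPUID leaf 0x0A, and that the Uncore registers are left alone unless that version is greater than 3.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper: __get_cpuid() */
#endif

/* CPUID leaf 0x0A: EAX bits 7:0 hold the architectural PMU version ID. */
static inline unsigned int pmu_version_from_eax(uint32_t eax)
{
    return eax & 0xFFu;
}

/* Hypothetical gate mirroring the rule in this thread:
 * only treat the Uncore as programmable when the PMU version is > 3. */
static inline bool uncore_is_programmable(unsigned int pmu_version)
{
    return pmu_version > 3;
}

/* Read the PMU version of the current CPU.
 * Returns 0 when leaf 0x0A is not available or on non-x86 builds. */
static inline unsigned int read_pmu_version(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (__get_cpuid(0x0A, &eax, &ebx, &ecx, &edx))
        return pmu_version_from_eax(eax);
#endif
    return 0;
}
```

With such a gate, a caller would test `uncore_is_programmable(read_pmu_version())` before touching any Uncore MSR. In the trace above, the faulting instruction is the `rdmsr` (`0f 32`) with RCX = 0x391, so skipping that access on a processor reporting version 3 avoids the general protection fault, at the cost of the Uncore counter.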

cyring commented 3 years ago

Please let me know whether the fix solves the crash.

Note that, as a result, SNB/EP Xeons lose the Uncore counter.

ich777 commented 3 years ago

The user hasn't responded on the forums yet; I'll keep you updated...

cyring commented 2 years ago

OK, the fix is waiting in the development branch.