cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

kernel 5.14 build fails #272

Closed thor2002ro closed 3 years ago

thor2002ro commented 3 years ago
corefreqk.c:9947:57: fatal error: no member named 'state' in 'struct task_struct'
                SysGate->taskList[cnt].state    = (short int) thread->state;
                                                              ~~~~~~  ^
cyring commented 3 years ago

There is probably a reason for this change in the kernel; see: https://elixir.bootlin.com/linux/v5.14-rc6/source/include/linux/sched.h#L669

The fix is to replace thread->state with:

(short int) thread->__state;

It sounds like 5.14 is preparing to remove that field!
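
A minimal sketch of how the rename could be guarded so that one source file builds on both sides of 5.14. The names TASK_STATE and task_state_of are hypothetical and are not taken from CoreFreq. The fallback KERNEL_VERSION / LINUX_VERSION_CODE definitions and the mock struct task_struct are assumptions that exist only so the sketch compiles outside a kernel tree; in a real module they come from <linux/version.h> and <linux/sched.h>.

```c
/* Stand-ins (assumptions) so the sketch compiles in userspace.
 * A real module gets these from <linux/version.h>. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(5, 14, 0)
#endif

/* Mock of the one field of interest; the real struct lives in
 * <linux/sched.h> and is far larger. */
struct task_struct {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 14, 0)
	unsigned int __state;	/* name and type used from 5.14 on */
#else
	volatile long state;	/* name and type used before 5.14 */
#endif
};

/* Hypothetical accessor: selects the field name at compile time. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 14, 0)
#define TASK_STATE(t) ((t)->__state)
#else
#define TASK_STATE(t) ((t)->state)
#endif

/* Hypothetical helper mirroring the cast used in the fix above. */
static short int task_state_of(const struct task_struct *thread)
{
	return (short int) TASK_STATE(thread);
}
```

With something like this in place, the failing assignment would read `(short int) TASK_STATE(thread)` regardless of kernel version. In-kernel code may prefer to read the field through READ_ONCE(), which is the stated motivation of the rename.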

cyring commented 3 years ago

From the kernel commit that made this change:

Rename it in order to find all uses ...

Really!

cyring commented 3 years ago

@thor2002ro: please switch to the develop branch to get the fix.

thor2002ro commented 3 years ago

I don't think it's enough. :)


Aug 24 20:42:28 Tower kernel: BUG: unable to handle page fault for address: ffffffffa0d41728
Aug 24 20:42:28 Tower kernel: #PF: supervisor write access in kernel mode
Aug 24 20:42:28 Tower kernel: #PF: error_code(0x0003) - permissions violation
Aug 24 20:42:28 Tower kernel: PGD 280e067 P4D 280e067 PUD 280f063 PMD 13dcca067 PTE 144cf5161
Aug 24 20:42:28 Tower kernel: Oops: 0003 [#1] SMP NOPTI
Aug 24 20:42:28 Tower kernel: CPU: 1 PID: 32589 Comm: modprobe Tainted: G        W  O      5.14.0-rc6-thor-Unraid #10
Aug 24 20:42:28 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS GAMING 7 WIFI/X470 AORUS GAMING 7 WIFI-CF, BIOS F61b 04/01/2021
Aug 24 20:42:28 Tower kernel: RIP: 0010:Compute_AMD_Zen_Boost+0x33/0x8d0 [corefreqk]
Aug 24 20:42:28 Tower kernel: Code: 56 41 55 41 54 53 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 41 89 ff 48 8b 05 cd 62 02 00 48 8b 40 20 4a 8b 04 f8  80 f8 08 00 00 00 00 00 00 48 8b 05 b4 62 02 00 48 8b 40 20 4a
Aug 24 20:42:28 Tower kernel: RSP: 0018:ffffc90000bcb9c0 EFLAGS: 00010286
Aug 24 20:42:28 Tower kernel: RAX: ffffffffa0d40e30 RBX: 0000001000000000 RCX: 00000000436f7265
Aug 24 20:42:28 Tower kernel: RDX: 0000000000000000 RSI: ffff8885e7d27100 RDI: 0000000000000000
Aug 24 20:42:28 Tower kernel: RBP: ffffc90000bcba80 R08: 0000000000100002 R09: ffff888102f42000
Aug 24 20:42:28 Tower kernel: R10: ffffffffa0d40e30 R11: ffffffffa0d42a60 R12: ffff888102f4219c
Aug 24 20:42:28 Tower kernel: R13: ffff8881cbc53c00 R14: 0000000000000000 R15: 0000000000000000
Aug 24 20:42:28 Tower kernel: FS:  0000153549d66740(0000) GS:ffff88882e840000(0000) knlGS:0000000000000000
Aug 24 20:42:28 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 24 20:42:28 Tower kernel: CR2: ffffffffa0d41728 CR3: 00000007455ec000 CR4: 0000000000750ee0
Aug 24 20:42:28 Tower kernel: PKRU: 55555554
Aug 24 20:42:28 Tower kernel: Call Trace:
Aug 24 20:42:28 Tower kernel: Query_AMD_Family_17h+0x2c9/0x850 [corefreqk]
Aug 24 20:42:28 Tower kernel: Controller_Init+0xb6/0x620 [corefreqk]
Aug 24 20:42:28 Tower kernel: CoreFreqK_Ignition_Level_Up+0x3b5/0xb20 [corefreqk]
Aug 24 20:42:28 Tower kernel: init_module+0x44/0x1000 [corefreqk]
Aug 24 20:42:28 Tower kernel: ? 0xffffffffa0cc8000
Aug 24 20:42:28 Tower kernel: do_one_initcall+0xbe/0x230
Aug 24 20:42:28 Tower kernel: ? idr_alloc_cyclic+0x17b/0x220
Aug 24 20:42:28 Tower kernel: ? __vunmap+0x208/0x4d0
Aug 24 20:42:28 Tower kernel: ? kfree+0x1aa/0x310
Aug 24 20:42:28 Tower kernel: ? free_unref_page+0xca/0x1f0
Aug 24 20:42:28 Tower kernel: ? do_init_module+0x25/0x370
Aug 24 20:42:28 Tower kernel: ? kmem_cache_alloc_trace+0x19e/0x430
Aug 24 20:42:28 Tower kernel: do_init_module+0x5b/0x370
Aug 24 20:42:28 Tower kernel: load_module+0x4599/0x50e0
Aug 24 20:42:28 Tower kernel: __se_sys_init_module+0x216/0x260
Aug 24 20:42:28 Tower kernel: do_syscall_64+0x41/0xc0
Aug 24 20:42:28 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 24 20:42:28 Tower kernel: RIP: 0033:0x153549ea909a
Aug 24 20:42:28 Tower kernel: Code: 48 8b 0d f9 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 7d 0c 00 f7 d8 64 89 01 48
Aug 24 20:42:28 Tower kernel: RSP: 002b:00007fff0c036838 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
Aug 24 20:42:28 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000428080 RCX: 0000153549ea909a
Aug 24 20:42:28 Tower kernel: RDX: 000000000041c368 RSI: 00000000000b1ff8 RDI: 00001535493c9010
Aug 24 20:42:28 Tower kernel: RBP: 00001535493c9010 R08: 000000000042701a R09: 0000000000000001
Aug 24 20:42:28 Tower kernel: R10: 0000000000427010 R11: 0000000000000202 R12: 000000000041c368
Aug 24 20:42:28 Tower kernel: R13: 0000000000000000 R14: 00000000004281b0 R15: 0000000000428080
Aug 24 20:42:28 Tower kernel: Modules linked in: corefreqk(O+) xt_nat xt_tcpudp nf_tables macvlan vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter md_mod vendor_reset(O) amdgpu mfd_core drm_ttm_helper ttm gpu_sched drm_kms_helper drm backlight agpgart sysimgblt syscopyarea sysfillrect fb_sys_fops it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables edac_mce_amd kvm_amd kvm wmi_bmof gigabyte_wmi mxm_wmi btusb btrtl btbcm btintel ghash_clmulni_intel aesni_intel bluetooth igb crypto_simd cryptd corsair_psu input_leds ahci ecdh_generic i2c_piix4 i2c_algo_bit i2c_core led_class rapl ecc wmi ccp libahci button zenpower(O) acpi_cpufreq ryzen_smu(O)
Aug 24 20:42:28 Tower kernel: CR2: ffffffffa0d41728
Aug 24 20:42:28 Tower kernel: ---[ end trace 9ec0ff72c42472fe ]---
Aug 24 20:42:28 Tower kernel: RIP: 0010:Compute_AMD_Zen_Boost+0x33/0x8d0 [corefreqk]
Aug 24 20:42:28 Tower kernel: Code: 56 41 55 41 54 53 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 41 89 ff 48 8b 05 cd 62 02 00 48 8b 40 20 4a 8b 04 f8  80 f8 08 00 00 00 00 00 00 48 8b 05 b4 62 02 00 48 8b 40 20 4a
Aug 24 20:42:28 Tower kernel: RSP: 0018:ffffc90000bcb9c0 EFLAGS: 00010286
Aug 24 20:42:28 Tower kernel: RAX: ffffffffa0d40e30 RBX: 0000001000000000 RCX: 00000000436f7265
Aug 24 20:42:28 Tower kernel: RDX: 0000000000000000 RSI: ffff8885e7d27100 RDI: 0000000000000000
Aug 24 20:42:28 Tower kernel: RBP: ffffc90000bcba80 R08: 0000000000100002 R09: ffff888102f42000
Aug 24 20:42:28 Tower kernel: R10: ffffffffa0d40e30 R11: ffffffffa0d42a60 R12: ffff888102f4219c
Aug 24 20:42:28 Tower kernel: R13: ffff8881cbc53c00 R14: 0000000000000000 R15: 0000000000000000
Aug 24 20:42:28 Tower kernel: FS:  0000153549d66740(0000) GS:ffff88882e840000(0000) knlGS:0000000000000000
Aug 24 20:42:28 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 24 20:42:28 Tower kernel: CR2: ffffffffa0d41728 CR3: 00000007455ec000 CR4: 0000000000750ee0
Aug 24 20:42:28 Tower kernel: PKRU: 55555554
cyring commented 3 years ago

@thor2002ro: you haven't told me enough. Which processor? Which Linux distribution?

Was CoreFreq running OK with kernel 5.13? On the master and/or develop branch?

EDIT: The issue could be linked to PTI or other kernel mitigation mechanisms. Can you disable all of them at boot?

thor2002ro commented 3 years ago

The CPU is a 5900X. Distro? Does it matter? The module fails to load, so it's a pure kernel thing (I think I cut the dmesg too early; the error occurs when loading the module, without running the app). But OK, it's Unraid. I build my own kernels, so there should be no interference from the distro. It runs fine on 5.13, and the branch is develop. Mitigations are disabled by default for me.

I think 5.14 final will drop this week; we'll see if it still breaks.

cyring commented 3 years ago

Just built 5.14.0-rc7. Except for a few C fall-through warnings to fix up, CoreFreq is apparently running fine. FYI: ArchLinux.


EDIT: A KFENCE report shows up when unloading the driver, with 5.14 only.

corefreqk: loading out-of-tree module taints kernel.
corefreqk: module verification failed: signature and/or required key missing - tainting kernel
CoreFreq(14:30): Processor [ 8F_71] Architecture [Zen2/Matisse] SMT [32/32]
==================================================================
BUG: KFENCE: use-after-free read in __static_call_text_end+0x466/0x4af

Use-after-free read at 0x0000000031c62872 (in kfence-#151):
 __static_call_text_end+0x466/0x4af
 d_lookup+0x29/0x40
 lookup_dcache+0x18/0x60
 __lookup_hash+0x20/0xa0
 kern_path_locked+0x9b/0x110
 handle_remove+0x76/0x2e0
 devtmpfs_work_loop.cold+0xc/0x13
 devtmpfsd+0x25/0x34
 kthread+0x12f/0x160
 ret_from_fork+0x1f/0x30

kfence-#151: 0x00000000353de047-0x0000000026a95dc9, size=4096, cache=names_cache

allocated by task 239 on cpu 24 at 398.583578s:
 getname_kernel+0x25/0x110
 kern_path_locked+0x3f/0x110
 handle_remove+0x76/0x2e0
 devtmpfs_work_loop.cold+0xc/0x13
 devtmpfsd+0x25/0x34
 kthread+0x12f/0x160
 ret_from_fork+0x1f/0x30

freed by task 239 on cpu 24 at 398.583584s:
 kern_path_locked+0x68/0x110
 handle_remove+0x76/0x2e0
 devtmpfs_work_loop.cold+0xc/0x13
 devtmpfsd+0x25/0x34
 kthread+0x12f/0x160
 ret_from_fork+0x1f/0x30

CPU: 24 PID: 239 Comm: kdevtmpfs Tainted: G           OE     5.14.0-rc7-next-20210827-1-next-git #1
Hardware name: ASUS System Product Name/ROG CROSSHAIR VIII HERO (WI-FI), BIOS 3801 07/30/2021
==================================================================
CoreFreq: Unload
thor2002ro commented 3 years ago

Interesting, maybe it was an issue in rc6. I'll wait for the final 5.14, rebuild it, and report back here. It should be any day now.

cyring commented 3 years ago

> Interesting, maybe it was an issue in rc6. I'll wait for the final 5.14, rebuild it, and report back here. It should be any day now.

Yes, let's wait and see, but that thread->__state should be renamed by its author. Kernel source browsers, such as Bootlin, exist for good reasons ...

Marking the original issue as bugfix.

ppascher commented 3 years ago

Develop branch appears to work fine with 5.14

cyring commented 3 years ago

> Develop branch appears to work fine with 5.14

Thank you for your confirmation.

thor2002ro commented 3 years ago

OK, I got some testing time today with 5.14 and figured out what the problem was. I was building my kernels with clang and full LTO. It seems the corefreqk module breaks when built with full LTO (main binary and modules get optimized). Building the kernel with clang and thin LTO (only the main binary gets LTO) works with no problem.

[Screenshot from 2021-08-31 22-39-44]

Edit: I do seem to get some messages, though, and it keeps working. [Screenshot from 2021-08-31 22-48-45]

cyring commented 3 years ago

I'm interested in your build! Can you show me how you managed to compile the kernel module with clang?

About the error: if it is not due to an incompatible request, be aware that some keystrokes can be sent and misinterpreted by the UI. For example:

thor2002ro commented 3 years ago

I build it like this....

make CC=clang LLVM=1 LLVM_IAS=1 DELAY_TSC=1 OPTIM_LVL=3 WARNING="-Wall -Wfatal-errors -static -pthread" KERNELDIR=$KERNEL_LOCATION all

cyring commented 3 years ago

> make CC=clang LLVM=1 LLVM_IAS=1 DELAY_TSC=1 OPTIM_LVL=3 WARNING="-Wall -Wfatal-errors -static -pthread" KERNELDIR=$KERNEL_LOCATION all

Thanks, but I still get this:

clang-12: fatal error: unknown argument: '-fplugin-arg-structleak_plugin-byref-all'

and these without -Wfatal-errors:

clang-12: error: unknown argument: '-fplugin-arg-structleak_plugin-byref-all'
clang-12: error: unsupported option '-mrecord-mcount' for target 'x86_64-pc-linux-gnu'
thor2002ro commented 3 years ago

Now it depends on the distro. I remember you saying you're using Arch: remove "-static", since Arch doesn't have static libs. I think you can remove the whole WARNING section, which replaces the default one. On that front, I always use Ubuntu build VMs to build stuff; I've found over the years that they are less of a headache when building for any distro, and they have static libs :)

Also make sure ld.bfd is supplying ld; ld.gold doesn't play well with kernel stuff.

Edit: you also need to rebuild the kernel with clang. Building the module with clang against a kernel built with gcc won't work. I think the error is related to this.

cyring commented 3 years ago

Using the 5.14.1-arch1-1 release of ArchLinux, the use-after-free read error has disappeared.

Later I will try to find time to build a fully clang-built kernel.

So far, CoreFreq's behavior is nominal.

cyring commented 3 years ago

Probably part of the issue: https://www.theregister.com/2021/09/08/compromise_linux_kernel_compiler_warnings/

thor2002ro commented 3 years ago

That's 5.15, or it can be tested in linux-next. Should be fun: they forced -Werror on everything 🤣