Closed w-kudla closed 5 months ago
The "It works with CONFIG_KALLSYMS_ALL=y only." comment is misleading. All the Onload kallsyms machinery works for linux <= 5.6, where the `kallsyms_on_each_symbol()` symbol is exported.
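As a rough illustration of the version boundary described above, here is a minimal user-space sketch that classifies a kernel release string. The function names and the `LAST_KALLSYMS_KERNEL` constant are hypothetical and not part of Onload; the 5.6 cut-off is taken from the comment above, and distribution kernels with backported changes may not follow it.

```python
import re

# Assumption taken from the comment above: the kallsyms-based lookup is
# usable on kernels up to and including 5.6.
LAST_KALLSYMS_KERNEL = (5, 6)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kallsyms_path_expected(release: str) -> bool:
    """True if the release is at or below the last kernel said to export the symbol."""
    return parse_kernel_release(release) <= LAST_KALLSYMS_KERNEL


# Example use on a live system (not executed here):
#   import platform
#   print(kallsyms_path_expected(platform.release()))
```

This only compares version numbers; it does not inspect what a given kernel actually exports.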
@ol-alexandra Not sure if I understand. Do you mean that kernels 6.x are not supported at all?
I mean that linux-6.x are supported in a more complicated way, not via kallsyms. `EFRM_HAVE_NEW_KALLSYMS` is undefined for contemporary kernels.
And the "more complicated way" is very fragile. Some variant of linux-6.8 probably works, see #211 from @okt-sergeyn. Your kernel is probably not supported, or your kernel has Onload issues when running on your hardware (yes, such things happen). The `model name` and `flags` lines from `/proc/cpuinfo` may help to understand the issue.
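A minimal sketch of pulling just those two lines out of `/proc/cpuinfo`, for anyone who wants to attach them to a report. The helper name `cpu_model_and_flags` is hypothetical; it assumes the usual `key : value` layout and returns the entries for the first CPU listed.

```python
def cpu_model_and_flags(cpuinfo_text: str) -> dict[str, str]:
    """Return the first `model name` and `flags` entries from /proc/cpuinfo text."""
    wanted = {"model name", "flags"}
    found: dict[str, str] = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence of each wanted key (the first CPU).
        if sep and key in wanted and key not in found:
            found[key] = value.strip()
    return found


# Example use on a live system (not executed here):
#   with open("/proc/cpuinfo") as f:
#       for key, value in cpu_model_and_flags(f.read()).items():
#           print(f"{key} : {value}")
```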
I think @tcrawley-xilinx has a plan to fix it. A similar issue was reported as the latest comment in https://github.com/Xilinx-CNS/onload/issues/164.
JFYI: it works well with Fedora 39 6.8.4-200.fc39.x86_64
> `model name` & `flags` lines from `/proc/cpuinfo` may help to understand the issue.
model name : Intel(R) Xeon(R) CPU Max 9468
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni wbnoinvd dtherm ida arat pln pts hfi umip waitpkg tme la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk pconfig arch_lbr amx_bf16 amx_int8 flush_l1d arch_capabilities
@ol-alexandra We don't have any issues with this HW on RHEL 8.x and RHEL 9.x but those run kernels 3.10 and 4.18 respectively
> JFYI: it works well with Fedora 39 6.8.4-200.fc39.x86_64
Thanks, downgrading to 6.5.6-300.fc39.x86_64 also helped. Can you please explain why the elegant kallsyms solution stopped working for the newer kernels? I saw some convoluted way of determining syscall table offset in the sources that works by examining instructions/text directly which I agree is dodgy.
The Linux kernel stopped exporting `kallsyms_on_each_symbol()` and similar functions. These functions were used to pull GPL-only symbols into proprietary modules, so the Linux authors had a strong reason.
You can see some part of this story in Onload history under ON-12093 starting from 62e19f02628957be523229b0d454d440849c2c29
> These functions were used to pull GPL-only symbols to proprietary modules, so Linux authors had their strong reason.
I don't think this is a good enough reason and as much as it is disappointing, sadly it's not unexpected from Linux authors. There has been a widespread trend of pulling the rug from under out-of-tree stuff or userspace without offering any workarounds.
The same is happening with wrmsr and the disabling of cli/sti instructions.
What is the plan for onload? Rely on the nasty heuristics in `find_syscall_table`?
Hi @w-kudla, we've done some work to improve our ability to find the syscall table for v6.9 kernels in the following commits:
* [6abf274](https://github.com/Xilinx-CNS/onload/commit/6abf27413b23a8f6e6f3b985271eb1712f5cc562) ("ON-15692: Use x64_sys_call function when syscall_table isn't available")
* [d04f55b](https://github.com/Xilinx-CNS/onload/commit/d04f55ba3460130b00d08e41902656c33603cf7c) ("ON-15742: match the new CONFIG_RETPOLINE option name for v6.9+ kernels")
* [a9d697b](https://github.com/Xilinx-CNS/onload/commit/a9d697b42d3563ce53787159f420c0ae1138a062) ("ON-15692: Catch more calls to x64_sys_call")
These have been tested mainly on Debian (12) and Ubuntu (22.04/23.10). We are yet to test it on Fedora 39, but these changes are available on the master branch so you can try them out early if you'd like!
Thank you guys for the fix.
I've tried the latest master with the commits above on Centos9 `6.8.7-1.el9.elrepo.x86_64` and it still fails with the error: `[sfc efrm] init_sfc_resource: ERROR: failed to find syscall table`
Same problem has cropped up with the 6.1.85 and newer 6.1 series kernel.org kernels - 6.1.84 was fine.
@ech68 you should find adae58dd79cd5c47bed0958a664f6cf9f333e77e fixes additional cases, hopefully yours included.
That did indeed work - applying this change along with the other 3 patches mentioned earlier in this thread against the latest 8.1.2.26 openonload package and building for the 6.1.85+ kernels.
It fails to compile when built against the current stock EL8 kernel-devel, however, breaking at line 271 in 6abf274
@ech68 Did you try using the tip of the `master` branch? If that works then you may be missing other compatibility changes required for the given kernel.
If I apply the first part of commit deaaa8d, then it compiles. (using the tip of master also works)
Closing as I believe there are no known issues with this now.
The whole build passes (master, 88527b4743a01eae14be721da6b252a53c304de2) but then the onload module fails to load because sfc_resource cannot initialise:
[152798.664550] [sfc efrm] init_sfc_resource: ERROR: failed to find syscall table
I checked how the code gets the syscall table and we should be in the simplest case:
because this kernel (6.8.6-200.fc39.x86_64) is built with all symbols exported to kallsyms:
The symbol is present in kallsyms:
What am I missing?
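For reference, a user-space check of the kind described above can be sketched as follows. The helper name `find_symbol` is hypothetical, and the parsing assumes the usual `<address> <type> <symbol> [module]` line layout of `/proc/kallsyms`. Note that this only shows what user space can see: as discussed earlier in this thread, a module can only use kallsyms when the kernel exports the lookup functions, so a symbol being listed here does not by itself mean Onload can resolve it. Addresses may also read as zero for unprivileged readers, depending on the `kptr_restrict` setting.

```python
def find_symbol(kallsyms_text: str, name: str = "sys_call_table"):
    """Return (address, type) for `name` in /proc/kallsyms text, or None if absent."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Expected layout: "<hex address> <type> <symbol> [module]".
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16), fields[1]
    return None


# Example use on a live system (not executed here):
#   with open("/proc/kallsyms") as f:
#       print(find_symbol(f.read()))
```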