Closed KernelKyupCom closed 1 year ago
From what I've tried it seems that the only way to compile and load the livepatch is to forcibly set the "patch->immediate". What the side effects could be ? This way we are skipping the consistency model, but would it work in the old kpatch manner ?
Yes, unfortunately ORC still doesn't support the consistency model. That support will be coming soon. In the meantime, frame pointers are recommended.
If you have to use ORC, then setting patch->immediate is the only way to do it. Then you have to be more careful with the patch analysis, since old functions can coexist with new functions, even on the same task. (But for most patches, that's not a problem.)
I've changed my config, but currently I have a bunch of servers running the already compiled with ORC kernel and it is desirable to be able to livepatch them somehow.
Isn't it good idea to have this defined by kpatch-build for x86 in case there is no CONFIG_HAVE_RELIABLESTACKTRACE similarly to the #ifdef (__powerpc_\) ?
Last days I had another urgent tasks, so I've been away for a while from the livepatching issues, but I am turning back to it. I'am still fighting the other problem - patching functions containing static variables.
It turns out that from build to build the symbol names (specifically the static vars suffixes) will change.
This causes missmatch of the symbols when looking for locals. I am still trying to understand how we are never getting such a differences between the two sequential builds of the original and patches kernels, but there are differences between the production and original/patched symbols.
I've double checked the toolchain version, there is no difference, the environment is set the same way building the production kernel and livepatch. Even building the kernel and immediately building livepatch for it can result in such a difference. In this case the environment is definitely the same...
I've tried to hack it using objcopy --redefine-syms for vmlinux and patched objects, before calling the creare-diff-object then I am failing for the .bss.
Isn't it good idea to have this defined by kpatch-build for x86 in case there is no CONFIG_HAVE_RELIABLE_STACKTRACE similarly to the #ifdef (powerpc) ?
Yeah, that's probably a good idea.
I've double checked the toolchain version, there is no difference, the environment is set the same way building the production kernel and livepatch. Even building the kernel and immediately building livepatch for it can result in such a difference. In this case the environment is definitely the same... I've tried to hack it using objcopy --redefine-syms for vmlinux and patched objects, before calling the creare-diff-object then I am failing for the .bss.
, which I wasn't able to look at so far. Do you think if we correlate the production - original before we are finding the local symbols that would resolve it ?
No. If there's a mismatch in the symbols between the production compiled kernel and the kpatch original compiled kernel, this violates some basic kpatch assumptions. So it wouldn't be wise to hack a fix together without first understanding why there's a mismatch.
As I suggested in my email, save the "production" ip_tables.o file somewhere, and then after you do kpatch-build with -d, save the ~/.kpatch/tmp/orig/.../ip_tables.o file. Copy them somewhere or attach them to this issue so I can compare them.
Here they are. I am attaching the diff with all the changes on the source code I have + I have added "-x" to the kpatch-build build script, so the build log contains the script commands executed too. The Archive also contains the production,original and patched ip_tables.o files. kpatch-debug.zip
There seems to be some mismatch, either in your compiler, config, or kernel source.
cleanup_match() in production ip_tables.o:
0000000000000020 <cleanup_match>:
20: e8 00 00 00 00 callq 25 <cleanup_match+0x5>
21: R_X86_64_PC32 __fentry__-0x4
25: 55 push %rbp
26: 48 83 c7 20 add $0x20,%rdi
2a: 48 89 e5 mov %rsp,%rbp
2d: 48 83 ec 20 sub $0x20,%rsp
31: 48 8b 47 e8 mov -0x18(%rdi),%rax
35: 48 89 75 e0 mov %rsi,-0x20(%rbp)
39: 48 89 7d f0 mov %rdi,-0x10(%rbp)
3d: c6 45 f8 02 movb $0x2,-0x8(%rbp)
41: 48 8b 50 40 mov 0x40(%rax),%rdx
45: 48 89 45 e8 mov %rax,-0x18(%rbp)
49: 48 85 d2 test %rdx,%rdx
4c: 74 0a je 58 <cleanup_match+0x38>
4e: 48 8d 7d e0 lea -0x20(%rbp),%rdi
52: ff d2 callq *%rdx
54: 48 8b 45 e8 mov -0x18(%rbp),%rax
58: 48 8b 78 58 mov 0x58(%rax),%rdi
5c: e8 00 00 00 00 callq 61 <cleanup_match+0x41>
5d: R_X86_64_PC32 module_put-0x4
61: c9 leaveq
62: c3 retq
63: 0f 1f 00 nopl (%rax)
66: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
6d: 00 00 00
cleanup_match() in orig ip_tables.o:
0000000000000000 <cleanup_match>:
0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1: R_X86_64_NONE __fentry__-0x4
5: 55 push %rbp
6: 48 83 c7 20 add $0x20,%rdi
a: 48 89 e5 mov %rsp,%rbp
d: 48 83 ec 20 sub $0x20,%rsp
11: 48 8b 47 e8 mov -0x18(%rdi),%rax
15: 48 89 75 e0 mov %rsi,-0x20(%rbp)
19: 48 89 7d f0 mov %rdi,-0x10(%rbp)
1d: c6 45 f8 02 movb $0x2,-0x8(%rbp)
21: 48 8b 50 40 mov 0x40(%rax),%rdx
25: 48 89 45 e8 mov %rax,-0x18(%rbp)
29: 48 85 d2 test %rdx,%rdx
2c: 74 0d je 3b <cleanup_match+0x3b>
2e: 48 8d 7d e0 lea -0x20(%rbp),%rdi
32: e8 00 00 00 00 callq 37 <cleanup_match+0x37>
33: R_X86_64_PC32 __x86_indirect_thunk_rdx-0x4
37: 48 8b 45 e8 mov -0x18(%rbp),%rax
3b: 48 8b 78 58 mov 0x58(%rax),%rdi
3f: e8 00 00 00 00 callq 44 <cleanup_match+0x44>
40: R_X86_64_PC32 module_put-0x4
44: c9 leaveq
45: c3 retq
Notice that the production .o has an indirect call (without a retpoline):
52: ff d2 callq *%rdx
While the orig .o has a retpoline call instead:
32: e8 00 00 00 00 callq 37 <cleanup_match+0x37>
33: R_X86_64_PC32 __x86_indirect_thunk_rdx-0x4
So the production kernel was either built without CONFIG_RETPOLINE or without a version of GCC which supports retpolines, whereas the kernel built by kpatch-build has both.
Hi, Josh. That's a good catch and I was hoping it's over, but both has been built with the same gcc and the .config has the CONFIG_RETPOLINE set. Unfortunately it seems something worse than the config or build environment is wrong:
root@kernighan:/home/*****/releases/linux-4.15.9-test_test1 # uname -a
Linux kernighan 4.15.9-test_test1 #36 SMP Wed Mar 14 09:37:18 EET 2018 x86_64 x86_64 x86_64 GNU/Linux
root@kernighan:/home/*****/releases/linux-4.15.9-test_test1 # zcat /proc/config.gz | grep RETPOL
CONFIG_RETPOLINE=y
root@kernighan:/home/*****/releases/linux-4.15.9-test_test1 # cat .config | grep RETPOL
CONFIG_RETPOLINE=y
root@kernighan:/home/*****/releases # readelf -p .comment /home/*****/releases/vmlinux-4.15.9-test_test1 | grep -o 'GCC:.*'
GCC: (GNU) 7.3.0
root@kernighan:~/ssd/linux-4.15.y # readelf -p .comment net/ipv4/netfilter/ip_tables.o | grep -o 'GCC:.*'
GCC: (GNU) 7.3.0
root@kernighan:~/.kpatch/tmp # find . -name ip_tables.o
./patched/net/ipv4/netfilter/ip_tables.o
./orig/net/ipv4/netfilter/ip_tables.o
root@kernighan:~/.kpatch/tmp # readelf -p .comment ./patched/net/ipv4/netfilter/ip_tables.o | grep -o 'GCC:.*'
GCC: (GNU) 7.3.0
root@kernighan:~/.kpatch/tmp # readelf -p .comment ./orig/net/ipv4/netfilter/ip_tables.o | grep -o 'GCC:.*'
GCC: (GNU) 7.3.0
root@kernighan:~/.kpatch/tmp #
Today I am about to debug the compiler options when building a production kernel and livepatch. Any ideas are welcome.
BR, Angel
Just an idea, but your GCC version is unusually vague. I guess it was built directly from source? You might want to double check that there isn't a mismatch there.
Most distro versions of GCC are much more specific, like gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)
.
Yup, I've built it on my own a few days after it has been released, since it wasn't available in the package repository. I did it because of the retpolines support. I have no other 7.3, and I've built it once only, so no chance for mismatched toolchain versions.
Ah, ok. I'm out of ideas. It still seems like something was weird with your production build, since the function doesn't use a retpoline. You might want to double check, but it looks like your production kernels are still vulnerable to Spectre v2.
@KernelKyupCom : there has been no activity on this for a long time, so I'm closing this issue. If still have problems with recent releases, feel free to reopen.
Hi there, It seems that the only way to get usable live patching x86 with 4.15.x is to have the UNWINDER_FRAME_POINTER, since HAVE_RELIABLE_STACKTRACE is selected by: X86 [=y] && X86_64 [=y] && UNWINDER_FRAME_POINTER [=y] && STACK_VALIDATION [=y]
Within my current production config i have the following setup: CONFIG_X86_64=y CONFIG_STACK_VALIDATION=y CONFIG_UNWINDER_ORC=y # CONFIG_UNWINDER_FRAME_POINTER is not set For this reason there is no CONFIG_HAVE_RELIABLE_STACKTRACE. Building a livepatch for that kernel generated livepatch (not kpatch). Trying to load it results in refusal to load it with the following reason: livepatch: This architecture doesn't have support for the livepatch consistency model. Which comes from kernel/livepatch/core.c:839 ` /*
with the consistency model. */ if (!klp_have_reliable_stack() && !patch->immediate) { pr_err("This architecture doesn't have support for the livepatch consistency model.\n"); return -ENOSYS; }
` Initially I've tried to force kpatch to be built, but it again fails with the same reason, since immediate is not set when building kpatch. Plase advise how can I have a livepatch /kpatch loaded with a kernel built with CONFIG_UNWINDER_ORC.