dynup / kpatch

kpatch - live kernel patching
GNU General Public License v2.0
1.49k stars 305 forks source link

Questions about the current solution of correlating static local variables #545

Closed zhouchengming1 closed 1 year ago

zhouchengming1 commented 8 years ago
  1. Why undo the previous correlation in the iteration of base->symbols? If xxx.123 ---> xxx.123 (previous correlation by coincidence) xxx.256 ---> xxx.256 (previous correlation by coincidence) But real xxx.123 ---> xxx.256
    In this case, the code doesn't work. Because when find patched_sym for xxx.123, the xxx.256 in patched_object hasn't been de-correlated. So I think should undo all previous correlation before the new correlation.
  2. Is the iteration of base-sections wrong? old-object | new-object func1 | func1 xxx.123 | xxx.123 (inline) func2 | func2 xxx.256 | xxx.256 xxx.123 | xxx.123 (inline)

    When find patched_sym for xxx.123, first find xxx.123 in func1 of new-object, But then find xxx.256 in func2 of new-object. So I think should not iterate the base-sections, when find one, just go out to next symbol.

jpoimboe commented 8 years ago

Why undo the previous correlation in the iteration of base->symbols? If xxx.123 ---> xxx.123 (previous correlation by coincidence) xxx.256 ---> xxx.256 (previous correlation by coincidence) But real xxx.123 ---> xxx.256

In this case, the code doesn't work. Because when find patched_sym for xxx.123, the xxx.256 in patched_object hasn't been de-correlated. So I think should undo all previous correlation before the new correlation.

Ah, right. Instead of uncorrelating and correlating in the same loop, they should be done in two separate loops, so that all of the uncorrelations are done before starting the correlations.

Is the iteration of base-sections wrong? old-object | new-object func1 | func1 xxx.123 | xxx.123 (inline) func2 | func2 xxx.256 | xxx.256 xxx.123 | xxx.123 (inline)

When find patched_sym for xxx.123, first find xxx.123 in func1 of new-object, But then find xxx.256 in func2 of new-object. So I think should not iterate the base-sections, when find one, just go out to next symbol.

Ah, right, this is another problem. I need to think about it a little bit. Can you clarify your suggested solution?

This static local correlation has been really hard to get right :-/

zhouchengming1 commented 8 years ago

@jpoimboe , there is a solution that works fine in the same restrictions. And I just have implemented it simply, also did some test on it. I think I'd better send a patch to you later, maybe there are little problems in it.

jpoimboe commented 8 years ago

@zhouchengming1 do you have a patch which reproduces these errors? I would like to test for this in the integration test suite.

zhouchengming1 commented 8 years ago

I have a patch, but it's for my module, not for the kernel.

zhouchengming1 commented 8 years ago

I'd like to test for you, when this git repository has merged your patch.

jpoimboe commented 8 years ago

@zhouchengming1 It has been merged, please test and let me know if it works for you. Thanks!

zhouchengming1 commented 8 years ago

I just have tested the new kpatch. Q1: gcc/kernel version mismatch ? gccver="(GCC) 4.9.0" kgccver="(GNU) 4.9.0"

   so I think should just compare the last number "4.9.0".
   Change (cut -d' ' -f2-) to (cut -d' ' -f3-);
   And (cut -d ' ' -f5-) to (cut -d ' ' -f6-).
   Is it a bug or a coding error...

Q2: works for me on correlating static local variables ? ERROR: cmdline.o: symbol changed sections: .data.var.16226

   Because it's a section symbol.
   When undo the correlations for all the static locals, we forget it , should undo the correlation
   for it too.
   * Should also change name and correlate section-symbols later when found a patched_sym ?

   Additionly, I am wondering when var.17000 changed name to var.16226, then the patched 
   object had two symbols both called var.16226, is this a problem ?

Q3: print any warnings about new variables? No need "return", or just can print one new variable.

Last: I just also made a test-patch for the kernel. I have sent it to you by email.

Thanks.

jpoimboe commented 8 years ago

I just have tested the new kpatch. Q1: gcc/kernel version mismatch ? gccver="(GCC) 4.9.0" kgccver="(GNU) 4.9.0"

so I think should just compare the last number "4.9.0". Change (cut -d' ' -f2-) to (cut -d' ' -f3-); And (cut -d ' ' -f5-) to (cut -d ' ' -f6-). Is it a bug or a coding error...

This was broken with #548 and will be fixed with #561.

Q2: works for me on correlating static local variables ? ERROR: cmdline.o: symbol changed sections: .data.var.16226

Because it's a section symbol. When undo the correlations for all the static locals, we forget it , should undo the correlation for it too.

The symbol's section should already be uncorrelated here and then re-correlated here.

Is it not working for you?

  • Should also change name and correlate section-symbols later when found a patched_sym ?

    Additionly, I am wondering when var.17000 changed name to var.16226, then the patched object had two symbols both called var.16226, is this a problem ?

Yeah, I think perhaps renaming the symbol isn't the best choice. Instead maybe we should store the twin's name in a separate variable which can be referenced in kpatch_create_dynamic_rela_sections().

Q3: print any warnings about new variables? No need "return", or just can print one new variable.

Why?

I just also made a test-patch for the kernel. I have sent it to you by email.

Thanks, I'll try it out.

jpoimboe commented 8 years ago

Ok, so two issues here.

First, I was able to recreate the error:

ERROR: cmdline.o: symbol changed sections: .data.var.18451

with this patch:

Subject: [PATCH] test patch for correlating static locals

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 fs/proc/cmdline.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index bd038ef..2a092c8 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,6 +3,14 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>

+
+static int inline new_func(int a)
+{
+   static int var=3;
+   var++;
+   return var+a+3;
+}
+
 static int inline static_func(int a);

 static int cmdline_proc_show(struct seq_file *m, void *v)
@@ -15,7 +23,7 @@ static int cmdline_proc_show(struct seq_file *m, void *v)
            seq_printf(m, "%s\n", "hello");
        }
    }
-   return 0;
+   return new_func(3);
 }

 static int inline static_func(int a)
-- 
1.7.7

There's definitely a bug there, though I'm not sure exactly what the problem is yet.

The second issue:

Before:

Relocation section [12] '.rela.text.cmdline_proc_show' for section [11] '.text.cmdline_proc_show' at offset 0xbea8 contains 11 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000001  X86_64_NONE     000000000000000000      -4 __fentry__
  0x0000000000000009  X86_64_PC32     000000000000000000      -4 saved_command_line
  0x0000000000000019  X86_64_32S      000000000000000000      +8 .rodata.str1.1
  0x0000000000000021  X86_64_PC32     000000000000000000      -4 seq_printf
  0x0000000000000033  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000039  X86_64_PC32     000000000000000000      -4 .data.var.18447
  0x0000000000000042  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000053  X86_64_32S      000000000000000000      +8 .rodata.str1.1
  0x000000000000005c  X86_64_PC32     000000000000000000      -4 .data.var.18447
  0x0000000000000063  X86_64_32S      000000000000000000     +12 .rodata.str1.1
  0x0000000000000068  X86_64_PC32     000000000000000000      -4 seq_printf

After:

Relocation section [ 7] '.rela.text.cmdline_proc_show' for section [ 6] '.text.cmdline_proc_show' at offset 0xbec0 contains 13 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000001  X86_64_NONE     000000000000000000      -4 __fentry__
  0x0000000000000009  X86_64_PC32     000000000000000000      -4 saved_command_line
  0x0000000000000019  X86_64_32S      000000000000000000      +0 .rodata.str1.1
  0x0000000000000021  X86_64_PC32     000000000000000000      -4 seq_printf
  0x000000000000002c  X86_64_PC32     000000000000000000      -4 .data.var.18444
  0x000000000000003c  X86_64_PC32     000000000000000000      -4 .data.var.18444
  0x0000000000000043  X86_64_PC32     000000000000000000      -4 .data.var.18455
  0x0000000000000049  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000052  X86_64_PC32     000000000000000000      -4 .data.var.18455
  0x0000000000000063  X86_64_32S      000000000000000000      +0 .rodata.str1.1
  0x000000000000006c  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000073  X86_64_32S      000000000000000000      +4 .rodata.str1.1
  0x0000000000000078  X86_64_PC32     000000000000000000      -4 seq_printf

In theory the correlation mapping should be:

var.18451 -> var.18455
var.18447 -> var.18451
  (new)   -> var.18444

But the code tried to map it as:

var.18451 -> var.18444
var.18447 -> var.18455
  (new)   -> var.18451

There is no possible algorithm for coming up with this mapping which would always be correct in all scenarios. Therefore I think we will need to add some more restrictions.

zhouchengming1 commented 8 years ago

Ok, so two issues here.

First, I was able to recreate the error:

ERROR: cmdline.o: symbol changed sections: .data.var.18451 with this patch:

Subject: [PATCH] test patch for correlating static locals

Signed-off-by: Zhou Chengming zhouchengming1@huawei.com

fs/proc/cmdline.c | 10 +++++++++- 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c index bd038ef..2a092c8 100644 --- a/fs/proc/cmdline.c +++ b/fs/proc/cmdline.c @@ -3,6 +3,14 @@

include <linux/proc_fs.h>

include <linux/seq_file.h>

+ +static int inline new_func(int a) +{

  • static int var=3;
  • var++;
  • return var+a+3; +} + static int inline static_func(int a);

    static int cmdline_proc_show(struct seq_file m, void v) @@ -15,7 +23,7 @@ static int cmdline_proc_show(struct seq_file m, void v) seq_printf(m, "%s\n", "hello"); } }

  • return 0;
  • return new_func(3); }

    static int inline static_func(int a)

    1.7.7 There's definitely a bug there, though I'm not sure exactly what the problem is yet.

\ .data.var.18451 is the symbol of the section, we undo the correlation of the symbol var.18451 and its section indeed, but no the symbol of the section . ** So the function _kpatch_compare_correlatedsymbol find that the symbol .data.var.18451 has a twin, but its section don't have, and report this error. In the process of undo correlation, we should include the symol of the section (.data.var.18451) too.

The second issue:

Before:

Relocation section [12] '.rela.text.cmdline_proc_show' for section [11] '.text.cmdline_proc_show' at offset 0xbea8 contains 11 entries: Offset Type Value Addend Name 0x0000000000000001 X86_64_NONE 000000000000000000 -4 fentry 0x0000000000000009 X86_64_PC32 000000000000000000 -4 saved_command_line 0x0000000000000019 X86_64_32S 000000000000000000 +8 .rodata.str1.1 0x0000000000000021 X86_64_PC32 000000000000000000 -4 seq_printf 0x0000000000000033 X86_64_PC32 000000000000000000 -4 .data.var.18451 0x0000000000000039 X86_64_PC32 000000000000000000 -4 .data.var.18447 0x0000000000000042 X86_64_PC32 000000000000000000 -4 .data.var.18451 0x0000000000000053 X86_64_32S 000000000000000000 +8 .rodata.str1.1 0x000000000000005c X86_64_PC32 000000000000000000 -4 .data.var.18447 0x0000000000000063 X86_64_32S 000000000000000000 +12 .rodata.str1.1 0x0000000000000068 X86_64_PC32 000000000000000000 -4 seq_printf After:

Relocation section [ 7] '.rela.text.cmdline_proc_show' for section [ 6] '.text.cmdline_proc_show' at offset 0xbec0 contains 13 entries: Offset Type Value Addend Name 0x0000000000000001 X86_64_NONE 000000000000000000 -4 fentry 0x0000000000000009 X86_64_PC32 000000000000000000 -4 saved_command_line 0x0000000000000019 X86_64_32S 000000000000000000 +0 .rodata.str1.1 0x0000000000000021 X86_64_PC32 000000000000000000 -4 seq_printf 0x000000000000002c X86_64_PC32 000000000000000000 -4 .data.var.18444 0x000000000000003c X86_64_PC32 000000000000000000 -4 .data.var.18444 0x0000000000000043 X86_64_PC32 000000000000000000 -4 .data.var.18455 0x0000000000000049 X86_64_PC32 000000000000000000 -4 .data.var.18451 0x0000000000000052 X86_64_PC32 000000000000000000 -4 .data.var.18455 0x0000000000000063 X86_64_32S 000000000000000000 +0 .rodata.str1.1 0x000000000000006c X86_64_PC32 000000000000000000 -4 .data.var.18451 0x0000000000000073 X86_64_32S 000000000000000000 +4 .rodata.str1.1 0x0000000000000078 X86_64_PC32 000000000000000000 -4 seq_printf In theory the correlation mapping should be:

var.18451 -> var.18455 var.18447 -> var.18451 (new) -> var.18444 But the code tried to map it as:

var.18451 -> var.18444 var.18447 -> var.18455 (new) -> var.18451 There is no possible algorithm for coming up with this mapping which would always be correct in all scenarios. Therefore I think we will need to add some more restrictions.

You are right, when I solved the first issue, this second issue caused another error. cmdline.o: data section .data.var.16219 selected for inclusion

Because the var.16219 was correlated to a wrong static local , so the kpatch-build thought its section data ( initial value) was changed, and reported this error.

Thanks

jpoimboe commented 8 years ago

Thanks, this was an excellent test case BTW. I'll have to figure out a way to incorporate it into the integration tests. For reference:

From 54d238d4942ffe53f67884a7f0b2fc9dc57236ac Mon Sep 17 00:00:00 2001
From: Zhou Chengming <zhouchengming1@huawei.com>
Date: Tue, 17 Nov 2015 20:59:30 +0800
Subject: [PATCH] test base for correlating static locals

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 fs/proc/cmdline.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index cbd82df..bd038ef 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,15 +3,37 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>

+static int inline static_func(int a);
+
 static int cmdline_proc_show(struct seq_file *m, void *v)
 {
    seq_printf(m, "%s\n", saved_command_line);
+   if (v == NULL) {
+       static int var=1;
+       if (var+static_func(var)) {
+           var++;
+           seq_printf(m, "%s\n", "hello");
+       }
+   }
    return 0;
 }

+static int inline static_func(int a)
+{
+   static int var=2;
+   var++;
+   return var+a;
+}
+
 static int cmdline_proc_open(struct inode *inode, struct file *file)
 {
+   if (static_func(2))
    return single_open(file, cmdline_proc_show, NULL);
+   else {
+       static int var=3;
+       var++;
+       return var;
+   }
 }

 static const struct file_operations cmdline_proc_fops = {
-- 
1.7.7
From 2b98a061983e51e7e9a820af4e468ba2a1fc5b7c Mon Sep 17 00:00:00 2001
From: Zhou Chengming <zhouchengming1@huawei.com>
Date: Tue, 17 Nov 2015 21:02:03 +0800
Subject: [PATCH] test patch for correlating static locals

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 fs/proc/cmdline.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index bd038ef..2a092c8 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,6 +3,14 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>

+
+static int inline new_func(int a)
+{
+   static int var=3;
+   var++;
+   return var+a+3;
+}
+
 static int inline static_func(int a);

 static int cmdline_proc_show(struct seq_file *m, void *v)
@@ -15,7 +23,7 @@ static int cmdline_proc_show(struct seq_file *m, void *v)
            seq_printf(m, "%s\n", "hello");
        }
    }
-   return 0;
+   return new_func(3);
 }

 static int inline static_func(int a)
-- 
1.7.7
jpoimboe commented 8 years ago

I propose the following restrictions for static locals:

  1. No removal of any static local references.
  2. If there is more than one static local variable with the same name in the same function, their order of reference must not change. For example, if A.1 is accessed before A.2 in the original function, A.1 must also be accessed before A.2 in the new function.
  3. A new reference to a static local cannot be added to a function unless it's a reference to a new variable with a unique name within the function. This means if there's already a reference to a static local named 'var' in the function, the patch cannot add another reference to that 'var' or another 'var'.

These restrictions are more limiting but will make it much safer IMO. And I think there should usually be ways to work around these restrictions if you get creative with modifying the patch.

So I think I'll need to rewrite the code yet again.

jpoimboe commented 8 years ago

I've fixed the first issue with #564.

zhouchengming1 commented 8 years ago

I propose the following restrictions for static locals:

No removal of any static local references. If there is more than one static local variable with the same name in the same function, their order of reference must not change. For example, if A.1 is accessed before A.2 in the original function, A.1 must also be accessed before A.2 in the new function.

After seeing the relocation table above, I am worried that this restriction is hard to get. Because a little change (no new static local) may change the order of reference, and kpatch-build won't know it.

A new reference to a static local cannot be added to a function unless it's a reference to a new variable with a unique name within the function. This means if there's already a reference to a static local named 'var' in the function, the patch cannot add another reference to that 'var' or another 'var'. These restrictions are more limiting but will make it much safer IMO. And I think there should usually be ways to work around these restrictions if you get creative with modifying the patch.

So I think I'll need to rewrite the code yet again.

jpoimboe commented 8 years ago

After seeing the relocation table above, I am worried that this restriction is hard to get. Because a little change (no new static local) may change the order of reference, and kpatch-build won't know it.

Yes, there's no way for kpatch-build to enforce this restriction, and there's a very small chance that it could happen. The best we can do is clearly document this restriction.

The only other option I can think of is to make it even more restrictive: disallow patching of any functions which reference two static local variables of the same name.

github-actions[bot] commented 1 year ago

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

github-actions[bot] commented 1 year ago

This issue was closed because it was inactive for 7 days after being marked stale.