dynup / kpatch

kpatch - live kernel patching
GNU General Public License v2.0
1.49k stars 305 forks source link

Handling changed line number __LINE__ and maybe some other issues #1253

Closed liu-song-6 closed 2 years ago

liu-song-6 commented 2 years ago

Hi,

We hit the following error while building some patch

ld: final link failed: Bad value

After digging into it, we found the issue was caused by undefined local section (?)

readelf -s output/fs/btrfs/ctree.o   | grep "SECTION LOCAL" | grep UND
    60: 0000000000000000     0 SECTION LOCAL  DEFAULT  UND

The patch is actually very simple:

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5104ba0504a5..8dd02c755206 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1458,6 +1459,7 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
                        *eb_ret = tmp;
                        return 0;
                }
+               btrfs_warn_rl(fs_info, "found not uptodate eb; btrfs_read_buffer failed; return EIO");
                free_extent_buffer(tmp);
                btrfs_release_path(p);
                return -EIO;
@@ -1485,8 +1487,10 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
                 * and give up so that our caller doesn't loop forever
                 * on our EAGAINs.
                 */
-               if (!extent_buffer_uptodate(tmp))
+               if (!extent_buffer_uptodate(tmp)) {
+                       btrfs_warn_rl(fs_info, "read_tree_block returned not-uptodate eb; set ret to EIO");
                        ret = -EIO;
+               }
                free_extent_buffer(tmp);
        } else {
                ret = PTR_ERR(tmp);

After digging into this, we found that create-diff-object detected changed functions that we didn't change in the patch, which is caused by __LINE__ in btrfs_abort_transaction. After matching line numbers, by removing empty lines, the issue was gone.

I have two questions from this case:

  1. Is there a better way to handle __LINE__ and changed line numbers?
  2. What's the connection between changed __LINE__ and the ld error above (ld: final link failed: Bad value)? I guess it is something with create-diff-object, but haven't figure out why.
jpoimboe commented 2 years ago
  1. Is there a better way to handle __LINE__ and changed line numbers?

Unfortunately the __LINE__ detection is a hack, which is pretty much unavoidable. However I think we can tweak it to work for btrfs_abort_transaction(). I can work up a patch.

  1. What's the connection between changed __LINE__ and the ld error above (ld: final link failed: Bad value)? I guess it is something with create-diff-object, but haven't figure out why.

I looked, and the UNDEF section symbol was for the __dyndbg section. It also seems related to btrfs_abort_transaction(). create-diff-object has some special handling for __dyndbg so maybe it's getting confused somehow, stripping the __dyndbg section but not the corresponding symbol. I'll need to stare at it some more.

joe-lawrence commented 2 years ago

I confirmed the same on rhel-9 dev build as well, reducing the test patch down to a nop() call in btrfs_abort_transaction() (had to enable the config option since it's disabled in rhel-9 now). I didn't have much more time to dig in before losing that test machine.

jpoimboe commented 2 years ago

I think the 2nd problem (undefined section symbol) is related to DYNAMIC_DEBUG_BRANCH() which is used indirectly by btrfs_abort_transaction(). It uses a static branch which references the dynamic debug descriptor, which lives in __dyndbg. But __dyndbg isn't included by create-diff-object, so the __jump_table entries have references to nowhere.

patched/fs/btrfs/ctree.o:

Relocation section [42] '.rela__jump_table' for section [41] '__jump_table' at offset 0x43f8 contains 24 entries:
  Offset              Type            Value               Addend Name
  000000000000000000  X86_64_PC32     000000000000000000    +729 .text.push_node_left
  0x0000000000000004  X86_64_PC32     000000000000000000    +733 .text.push_node_left
  0x0000000000000008  X86_64_PC64     000000000000000000    +154 __dyndbg
  0x0000000000000010  X86_64_PC32     000000000000000000    +661 .text.balance_node_right
  0x0000000000000014  X86_64_PC32     000000000000000000    +665 .text.balance_node_right
  0x0000000000000018  X86_64_PC64     000000000000000000     +98 __dyndbg
  0x0000000000000020  X86_64_PC32     000000000000000000    +792 .text.btrfs_copy_root
  0x0000000000000024  X86_64_PC32     000000000000000000    +796 .text.btrfs_copy_root
  0x0000000000000028  X86_64_PC64     000000000000000000    +378 __dyndbg
  0x0000000000000030  X86_64_PC32     000000000000000000   +1324 .text.__btrfs_cow_block
  0x0000000000000034  X86_64_PC32     000000000000000000   +1434 .text.__btrfs_cow_block
  0x0000000000000038  X86_64_PC64     000000000000000000    +322 __dyndbg
  0x0000000000000040  X86_64_PC32     000000000000000000   +1417 .text.__btrfs_cow_block
  0x0000000000000044  X86_64_PC32     000000000000000000     +31 .text.unlikely.__btrfs_cow_block
  0x0000000000000048  X86_64_PC64     000000000000000000    +266 __dyndbg
  0x0000000000000050  X86_64_PC32     000000000000000000   +1427 .text.__btrfs_cow_block
  0x0000000000000054  X86_64_PC32     000000000000000000      +0 .text.unlikely.__btrfs_cow_block
  0x0000000000000058  X86_64_PC64     000000000000000000    +210 __dyndbg
  0x0000000000000060  X86_64_PC32     000000000000000000    +252 .text.btrfs_cow_block
  0x0000000000000064  X86_64_PC32     000000000000000000    +272 .text.btrfs_cow_block
  0x0000000000000068  X86_64_PC64     000000000000000000     +10 __tracepoint_btrfs_cow_block
  0x0000000000000070  X86_64_PC32     000000000000000000    +916 .text.split_node
  0x0000000000000074  X86_64_PC32     000000000000000000    +920 .text.split_node
  0x0000000000000078  X86_64_PC64     000000000000000000     +42 __dyndbg

output/fs/btrfs/ctree.o:

Relocation section [15] '.rela__jump_table' for section [14] '__jump_table' at offset 0x998 contains 9 entries:
  Offset              Type            Value               Addend Name
  000000000000000000  X86_64_PC32     000000000000000000    +729 push_node_left
  0x0000000000000004  X86_64_PC32     000000000000000000    +733 push_node_left
  0x0000000000000008  X86_64_PC64     000000000000000000    +154
  0x0000000000000010  X86_64_PC32     000000000000000000    +661 balance_node_right
  0x0000000000000014  X86_64_PC32     000000000000000000    +665 balance_node_right
  0x0000000000000018  X86_64_PC64     000000000000000000     +98
  0x0000000000000020  X86_64_PC32     000000000000000000    +916 split_node
  0x0000000000000024  X86_64_PC32     000000000000000000    +920 split_node
  0x0000000000000028  X86_64_PC64     000000000000000000     +42

Notice the output object __jump_table relocations has some entries with empty symbols. Those are referring to the UNDEF __dyndbg section symbol, which is undefined because its corresponding __dyndbg section doesn't exist.

liu-song-6 commented 2 years ago

@jpoimboe and @joe-lawrence Thanks a lot for your kind help with this issue. Shall we include __dyndbg in create-diff-object? (If we don't, the match-line-number hack still works for our use cases).

jpoimboe commented 2 years ago

The quick fix for the __LINE__ detection is something like

diff --git a/kpatch-build/create-diff-object.c b/kpatch-build/create-diff-object.c
index 4014db8..809d020 100644
--- a/kpatch-build/create-diff-object.c
+++ b/kpatch-build/create-diff-object.c
@@ -670,7 +670,8 @@ static bool kpatch_line_macro_change_only_x86_64(struct section *sec)
                           (!strcmp(rela->sym->name, "___might_sleep")) ||
                           (!strcmp(rela->sym->name, "__might_fault")) ||
                           (!strcmp(rela->sym->name, "printk")) ||
-                          (!strcmp(rela->sym->name, "lockdep_rcu_suspicious"))) {
+                          (!strcmp(rela->sym->name, "lockdep_rcu_suspicious")) ||
+                          (strstr(rela->sym->name, "__func__"))) {
                                found = 1;
                                break;
                        }

But yes, we will also need to make sure the __dyndbg section gets included when it's referenced in a needed jump_table entry. I have some patches in progress, will need to clean them up and work up a unit test or two.