cilium / ebpf

ebpf-go is a pure-Go library to read, modify and load eBPF programs and attach them to various hooks in the Linux kernel.
https://ebpf-go.dev
MIT License
6.33k stars 697 forks source link

btf: libbpf doesn't update instructions when deduplicating types #739

Open lmb opened 2 years ago

lmb commented 2 years ago
--- FAIL: TestLibBPFCompat (0.01s)
    --- FAIL: TestLibBPFCompat/netif_receive_skb.linked3.o (0.01s)
        elf_reader_test.go:665: Error during loading: program trace_netif_receive_skb: apply CO-RE relocations: apply fixup target_type_id=67->16253: invalid immediate 73, expected 67 (fixup: target_type_id=67->16253)

target_type_id=67->16253 means we expect to find type id 67 in Instruction.Constant. Instead we find invalid immediate 73.

We can dump the BTF to figure out what types this refers to:

$ bpftool btf dump file netif_receive_skb.linked3.o format raw | grep -F '[67]'
[67] STRUCT 'bpf_insn' size=8 vlen=5
$ bpftool btf dump file netif_receive_skb.linked3.o format raw | grep -F '[73]'
[73] STRUCT 'btf_enum' size=8 vlen=2

We think the instruction should reference a struct bpf_insn but the compiler encoded struct btf_enum. After adding some advanced printf debugging:

$ ./run-tests.sh 5.17 -run TestLibBPFCompat/netif_receive_skb.linked3.o . | grep -A1 -B1 302
offset 18 target_type_id 19 (from wire)
offset 302 target_type_id 67 (from wire)
offset 359 target_type_id 2 (from wire)
--
offset 18 target_type_id of Struct:"sk_buff" (0) local type id 19 (from ext info)
offset 302 target_type_id of Struct:"bpf_insn" (0) local type id 67 (from ext info)
offset 359 target_type_id of Int:"int" (0) local type id 2 (from ext info)
--
offset 18 LdImmDW dst: r2 imm: 19 target_type_id of Struct:"sk_buff" (0) (into metadata)
offset 302 LdImmDW dst: r3 imm: 73 target_type_id of Struct:"bpf_insn" (0) (into metadata)
offset 359 LdImmDW dst: r3 imm: 2 target_type_id of Int:"int" (0) (into metadata)
--
offset 18 LdImmDW dst: r2 imm: 19 target_type_id of Struct:"sk_buff" (0) (from metadata)
offset 302 LdImmDW dst: r3 imm: 73 target_type_id of Struct:"bpf_insn" (0) (from metadata)
offset 359 LdImmDW dst: r3 imm: 2 target_type_id of Int:"int" (0) (from metadata)
--
offset 4258 LdImmDW dst: r3 imm: 83 target_type_id of Struct:"__sk_buff" (0) (from metadata)
offset 302 LdImmDW dst: r3 imm: 73 target_type_id of Struct:"bpf_insn" (0) (apply)
--- FAIL: TestLibBPFCompat (0.02s)

From the last line we know that applying the fixup at offset 302 fails. The BTF relocation (from wire) encodes target type 67 for that offset, which bpftool agrees is struct bpf_insn. As soon as we write it into metadata we can see that the instruction at offset 302 disagrees, the immediate is 73. We can verify that the 73 comes from the ELF, not from a bug in the library:

$ llvm-objdump-12 --section=tp_btf/netif_receive_skb -D netif_receive_skb.linked3.o  | grep ' 302:'     302:    18 03 00 00 49 00 00 00 00 00 00 00 00 00 00 00 r3 = 73 ll

At this point it looks like the encoded instructions and the encoded CO-RE relocations disagree. Maybe this is related to which tool we're using to do the linking? An interesting experiment would be if @ti-mo rebuilt the 5.15 selftests on his machine, and then run the unit test against that. If that changes the outcome its a tooling issue, if not it might be a bug in 5.17.

Another thing to try is running the same example via libbpf and looking at the output it generates for this particular test case.

Diff for my quick hack:

diff --git a/internal/btf/ext_info.go b/internal/btf/ext_info.go
index 2c0e1af..e1c9996 100644
--- a/internal/btf/ext_info.go
+++ b/internal/btf/ext_info.go
@@ -124,6 +124,7 @@ func (ei *ExtInfos) Assign(insns asm.Instructions, section string) {
        }

        if len(reloInfos) > 0 && reloInfos[0].offset == iter.Offset {
+           fmt.Println("offset", iter.Offset, iter.Ins, reloInfos[0].relo, "(into metadata)")
            iter.Ins.Metadata.Set(coreRelocationMeta{}, reloInfos[0].relo)
            reloInfos = reloInfos[1:]
        }
@@ -610,6 +611,10 @@ type CORERelocation struct {
    kind     coreKind
 }

+func (cr *CORERelocation) String() string {
+   return fmt.Sprintf("%s of %s (%s)", cr.kind, cr.typ, cr.accessor)
+}
+
 func CORERelocationMetadata(ins *asm.Instruction) *CORERelocation {
    relo, _ := ins.Metadata.Get(coreRelocationMeta{}).(*CORERelocation)
    return relo
@@ -653,6 +658,7 @@ func newRelocationInfos(brs []bpfCORERelo, ts types, strings *stringTable) ([]co
        if err != nil {
            return nil, fmt.Errorf("offset %d: %w", br.InsnOff, err)
        }
+       fmt.Println("offset", relo.offset, relo.relo, "local type id", br.TypeID, "(from ext info)")
        rs = append(rs, *relo)
    }
    sort.Slice(rs, func(i, j int) bool {
@@ -713,7 +719,7 @@ func parseCOREReloRecords(r io.Reader, bo binary.ByteOrder, recordSize uint32, r
        // ELF tracks offset in bytes, the kernel expects raw BPF instructions.
        // Convert as early as possible.
        relo.InsnOff /= asm.InstructionSize
-
+       fmt.Println("offset", relo.InsnOff, relo.Kind, relo.TypeID, "(from wire)")
        out = append(out, relo)
    }

diff --git a/linker.go b/linker.go
index 60cb7a6..723077c 100644
--- a/linker.go
+++ b/linker.go
@@ -117,11 +117,14 @@ func findReferences(progs map[string]*ProgramSpec) error {
 func applyRelocations(insns asm.Instructions, local, target *btf.Spec) error {
    var relos []*btf.CORERelocation
    var reloInsns []*asm.Instruction
+   var reloOffsets []asm.RawInstructionOffset
    iter := insns.Iterate()
    for iter.Next() {
        if relo := btf.CORERelocationMetadata(iter.Ins); relo != nil {
+           fmt.Println("offset", iter.Offset, iter.Ins, relo, "(from metadata)")
            relos = append(relos, relo)
            reloInsns = append(reloInsns, iter.Ins)
+           reloOffsets = append(reloOffsets, iter.Offset)
        }
    }

@@ -144,6 +147,7 @@ func applyRelocations(insns asm.Instructions, local, target *btf.Spec) error {

    for i, fixup := range fixups {
        if err := fixup.Apply(reloInsns[i]); err != nil {
+           fmt.Println("offset", reloOffsets[i], reloInsns[i], relos[i], "(apply)")
            return fmt.Errorf("apply fixup %s: %w", &fixup, err)
        }
    }

Originally posted by @lmb in https://github.com/cilium/ebpf/issues/668#issuecomment-1123942443

lmb commented 1 year ago

I think this is the same issue as https://lore.kernel.org/bpf/CAN+4W8i=7Wv2VwvWZGhX_mc8E7EST10X_Z5XGBmq=WckusG_fw@mail.gmail.com/

clang generates BTF with duplicates in it, libbpf deduplicates the BTF but doesn't update the instruction immediate.