Open lmb opened 2 years ago
I think this is the same issue as https://lore.kernel.org/bpf/CAN+4W8i=7Wv2VwvWZGhX_mc8E7EST10X_Z5XGBmq=WckusG_fw@mail.gmail.com/
clang generates BTF with duplicates in it; libbpf deduplicates the BTF but doesn't update the instruction immediate.
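To make the suspected failure mode concrete, here is a toy sketch in Go. It is not libbpf's or clang's actual algorithm: `dedup` is a hypothetical helper that deduplicates by name only and uses slice indices as type IDs, whereas real BTF deduplication compares whole type graphs. It only shows why an immediate recorded before deduplication goes stale unless it is rewritten through the ID map.

```go
package main

import "fmt"

// dedup is a toy model of type deduplication (hypothetical helper): it keeps
// the first occurrence of every name and returns the compacted list plus an
// old-ID -> new-ID map. IDs are plain slice indices here.
func dedup(types []string) ([]string, map[int]int) {
	seen := make(map[string]int)
	remap := make(map[int]int, len(types))
	var out []string
	for oldID, name := range types {
		newID, ok := seen[name]
		if !ok {
			newID = len(out)
			seen[name] = newID
			out = append(out, name)
		}
		remap[oldID] = newID
	}
	return out, remap
}

func main() {
	// Made-up type table with one duplicate entry.
	types := []string{"int", "struct bpf_insn", "int", "struct btf_enum"}
	compact, remap := dedup(types)

	imm := 3 // immediate emitted before dedup, meant to name struct btf_enum

	// Rewriting the immediate through the map keeps it pointing at the same type.
	fmt.Println(compact[remap[imm]])
	// Using the stale immediate against the compacted table does not:
	// here it no longer indexes a valid entry at all.
	fmt.Println(imm < len(compact))
}
```

If the immediate is left untouched after the IDs shift, it names either a different type or no type, which would match the kind of mismatch described in this issue.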
`target_type_id=67->16253` means we expect to find type id `67` in `Instruction.Constant`. Instead we find `invalid immediate 73`.
We can dump the BTF to figure out what types this refers to:
We think the instruction should reference a `struct bpf_insn`, but the compiler encoded `struct btf_enum`. After adding some advanced printf debugging:
From the last line we know that applying the fixup at offset 302 fails. The BTF relocation `(from wire)` encodes target type 67 for that offset, which bpftool agrees is `struct bpf_insn`. As soon as we write it into metadata we can see that the instruction at offset 302 disagrees: the immediate is 73. We can verify that the `73` comes from the ELF, not from a bug in the library:
At this point it looks like the encoded instructions and the encoded CO-RE relocations disagree. Maybe this is related to which tool we're using to do the linking? An interesting experiment would be for @ti-mo to rebuild the 5.15 selftests on his machine and then run the unit test against them. If that changes the outcome, it's a tooling issue; if not, it might be a bug in 5.17.
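The disagreement described above can be written down as a small cross-check. This is a minimal sketch, not the library's code: `insn`, `typeReloc` and `checkRelocs` are hypothetical, simplified stand-ins, and treating the offset as a plain instruction index is an assumption made for illustration.

```go
package main

import "fmt"

// insn is a hypothetical, stripped-down instruction: only the fields needed
// for the check are modelled.
type insn struct {
	Offset   int   // position of the instruction in the program
	Constant int64 // immediate operand
}

// typeReloc is a hypothetical, stripped-down relocation record.
type typeReloc struct {
	Offset      int    // instruction the relocation applies to
	LocalTypeID uint32 // type ID the relocation record names
}

// checkRelocs reports every relocation whose target instruction is missing
// or carries an immediate different from the relocation's type ID.
func checkRelocs(insns []insn, relocs []typeReloc) []error {
	byOffset := make(map[int]insn, len(insns))
	for _, in := range insns {
		byOffset[in.Offset] = in
	}

	var errs []error
	for _, r := range relocs {
		in, ok := byOffset[r.Offset]
		if !ok {
			errs = append(errs, fmt.Errorf("offset %d: no instruction", r.Offset))
			continue
		}
		if in.Constant != int64(r.LocalTypeID) {
			errs = append(errs, fmt.Errorf(
				"offset %d: relocation expects type id %d, instruction has immediate %d",
				r.Offset, r.LocalTypeID, in.Constant))
		}
	}
	return errs
}

func main() {
	// Values taken from the description above: offset 302, type 67, immediate 73.
	insns := []insn{{Offset: 302, Constant: 73}}
	relocs := []typeReloc{{Offset: 302, LocalTypeID: 67}}

	for _, err := range checkRelocs(insns, relocs) {
		fmt.Println(err)
	}
}
```

Running a check like this over objects built with different toolchains would be one way to compare them, since it only looks at what the ELF encodes and not at how any loader applies the relocation.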
Another thing to try is running the same example via libbpf and looking at the output it generates for this particular test case.
Diff for my quick hack:
Originally posted by @lmb in https://github.com/cilium/ebpf/issues/668#issuecomment-1123942443