foss-for-synopsys-dwc-arc-processors / linux

Helpful resources for users & developers of Linux kernel for ARC
22 stars 13 forks source link

[eBPF] bootstrap test does not print all the relevant data #159

Closed shahab-vahedi closed 5 months ago

shahab-vahedi commented 5 months ago

Original post here.

Events are passed through ring buffer with missing information:

# ./bootstrap
TIME     EVENT COMM             PID     PPID    FILENAME/EXIT CODE
01:03:23 EXEC                   201     170
01:07:12 EXEC                   202     108
01:07:14 EXEC                   204     202
01:07:14 EXEC                   204     202

The problem is in the definition of event structure in bootstrap.h:

struct event {
        int pid;
        int ppid;
        unsigned exit_code;
        unsigned long long duration_ns;
        char comm[TASK_COMM_LEN];
        char filename[MAX_FILENAME_LEN];
        bool exit_event;
};

All fields after long long field are read incorrectly. Moving duration_ns field to the end of the structure allows to obtain data correctly:

# ./bootstrap
TIME     EVENT COMM             PID     PPID    FILENAME/EXIT CODE
01:13:28 EXEC  ls               236     170     /bin/ls
01:13:29 EXIT  ls               236     170     [0] (144617218213ms)

However, incorrect data is read from duration_ns. Here is output for the host machine:

17:53:19 EXEC  sh               2297257 574857  /bin/sh
17:53:19 EXEC  grep             2297259 2297257 /usr/bin/grep
17:53:19 EXEC  ls               2297258 2297257 /usr/bin/ls
17:53:19 EXIT  grep             2297259 2297257 [0] (7ms)
17:53:19 EXIT  ls               2297258 2297257 [2] (5ms)
17:53:19 EXIT  sh               2297257 574857  [0] (9ms)

I suppose that this issue is somehow related to reading/writing 64-bit types in eBPF.

shahab-vahedi commented 5 months ago

The simplified example: ring_buffer-bug.tar.gz

Instead of printing something like:

TIME     EVENT COMM             PID     PPID    FILENAME/EXIT CODE
00:01:18 EXEC  e                1       2       f
00:01:19 EXEC  e                1       2       f
00:01:19 EXEC  e                1       2       f
00:01:19 EXIT  x                10      20      [30] (1234ms)
00:01:42 EXEC  e                1       2       f
00:01:42 EXIT  x                10      20      [30] (1234ms)
00:01:44 EXIT  x                10      20      [30] (1234ms)
00:01:44 EXIT  x                10      20      [30] (1234ms)

It prints

TIME     EVENT COMM             PID     PPID    FILENAME/EXIT CODE
00:01:18 EXEC                   1       2
00:01:19 EXEC                   1       2
00:01:19 EXEC                   1       2
00:01:19 EXEC                   10      20
00:01:42 EXEC                   1       2
00:01:42 EXEC                   10      20
00:01:44 EXEC                   10      20
00:01:44 EXEC                   10      20

When moving the long long duration to the end of the struct event, it almost look correct but the duration itself:

TIME     EVENT COMM             PID     PPID    FILENAME/EXIT CODE
00:01:18 EXEC  e                1       2       f
00:01:19 EXEC  e                1       2       f
00:01:19 EXEC  e                1       2       f
00:01:19 EXIT  x                10      20      [30] (5299989643264ms)
00:01:42 EXEC  e                1       2       f
00:01:42 EXIT  x                10      20      [30] (5299989643264ms)
00:01:44 EXIT  x                10      20      [30] (5299989643264ms)
00:01:44 EXIT  x                10      20      [30] (5299989643264ms)
shahab-vahedi commented 5 months ago

TL;DR;

clang aligns long long at an 8-byte address, while gcc does not. Use __attribute__((packed)).

Diving

The data structure is

struct event {
    int       pid;
    int       ppid;
    unsigned  exit_code;
    long long duration;
    ...
};

gcc compiles the user space program (ring.c). It considers the duration field at offset 12 of the structure.

handle_event()
{
    ...
        if (e->duration)
   11f0e:       13fc b002               ld      r2,[fp,-4]
   11f12:       120c 018c               ldd     r12r13,[r2,12]
    ...
}

On the other hand, clang, the compiler for the bpf side, assumes it is at offset 16:

      12:       b7 01 00 00 d2 04 00 00 r1 = 1234
      13:       7b 10 10 00 00 00 00 00 *(u64 *)(r0 + 16) = r1

Therefore, although the data mapped from the kernel space to user space is the same, the interpretation diverges:

eBPF address space
------------------
0x70046218:     0x0a    0x00    0x00    0x00    0x14    0x00    0x00    0x00
0x70046220:     0x1e    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x70046228:     0xd2    0x04    0x00    0x00    0x00    0x00    0x00    0x00
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                duration as considered by clang

0x70046218 struct event {
0x70046218     int       pid       = 10,
0x7004621c     int       ppid      = 20
0x70046220     unsigned  exit_code = 30,
0x70046228     long long duration  = 1234,
...
           }

user address space
------------------
0x2013c218:     0x0a    0x00    0x00    0x00    0x14    0x00    0x00    0x00
0x2013c220:     0x1e    0x00    0x00    0x00    0x00    0x00    0x00    0x00
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                duration as considered by gcc
0x2013c228:     0xd2    0x04    0x00    0x00    0x00    0x00    0x00    0x00
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

0x2013c218 struct event {
0x2013c218     int       pid       = 10,
0x2013c21c     int       ppid      = 20,
0x2013c220     unsigned  exit_code = 30,
0x2013c224     long long duration  = 0x04d2_0000_0000 (5299989643264)
...
           }

If the __attribute__((packed)) is used for the struct event, then the problem goes away:

struct __attribute__((packed, aligned(4))) event {
...
};

# ./ring
TIME     EVENT COMM             PID     PPID    FILENAME/EXIT CODE
00:01:15 EXEC  4                138     110     /dev/fd/4
00:01:16 EXEC  sh               139     138     /bin/sh
00:01:16 EXEC  id               140     139     /usr/bin/id
00:01:16 EXIT  id               140     139     [0] (23ms)
00:01:19 EXEC  time             141     139     /usr/bin/time
00:01:20 EXIT  time             141     139     [1] (44ms)
00:01:21 EXEC  date             142     139     /bin/date
00:01:21 EXIT  date             142     139     [0] (37ms)
00:01:22 EXEC  ls               143     139     /bin/ls
00:01:22 EXIT  ls               143     139     [0] (29ms)
00:01:38 EXIT  sh               139     138     [127] (21794ms)
00:01:38 EXIT  dropbear         138     110     [0] (22468ms)

Thanks @claziss for the hint about __attribute__((packed...)). @kolerov please confirm and close this issue.

kolerov commented 5 months ago

I confirm that packing fixes the issue. I updated the corresponding example:

https://github.com/foss-for-synopsys-dwc-arc-processors/arc-bpf-testbench/commit/58b90a1ce5e2ec65405e3516e2cd52528bbfdae5