eunomia-bpf / bpftime

Userspace eBPF runtime for fast Uprobe & Syscall hook & Extensions with LLVM JIT
https://eunomia.dev/bpftime/
MIT License
699 stars 70 forks source link

[BUG] Hot looping in bpftime's epoll_wait emulation #186

Closed agentzh closed 6 months ago

agentzh commented 6 months ago

I've found that bpftime's epoll_wait emulation might hot loop when no events arrive. The top shows the tracer process (or the bpftime server process's child process) is almost always using 100% CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 644087 agentzh   20   0   32640  10028   9600 R 100.0   0.0   1:50.25 ebpf_loader

perf top shows the hottest functions are all in libbpftime-syscall-server.so:

+   68.52%     0.52%  libbpftime-syscall-server.so  [.] ring_buffer__poll
+   67.28%     1.55%  libbpftime-syscall-server.so  [.] epoll_wait
+   60.51%     1.15%  libbpftime-syscall-server.so  [.] syscall_context::handle_epoll_wait
+   35.74%     5.84%  libc-2.31.so                  [.] __GI___sigprocmask

Can we add some sleep in between? This is too wasteful when the events do not arrive immediately.

agentzh commented 6 months ago

BTW, I only registered a uprobe in this example.

agentzh commented 6 months ago

And when hot looping happens, there is no target processes with the bpftime agent library started yet.

I'm not sure if it makes any difference after the target processes start (I haven't tested this case).

agentzh commented 6 months ago

OK, I found that after the target process with the bpftime agent started, and also there's a fatal error like this:

[2024-01-27 23:05:03.707] [info] [agent.cpp:66] Entering bpftime agent
[2024-01-27 23:05:03.708] [info] [bpftime_shm_internal.cpp:618] Global shm constructed. shm_open_type 1 for bpftime_maps_shm
[2024-01-27 23:05:03.708] [info] [agent.cpp:81] Initializing agent..
[2024-01-27 23:05:03][error][644325] Failed to load insn: invalid call immediate at PC 17
[2024-01-27 23:05:03][error][644325] Failed to initialize attach context
[2024-01-27 23:05:03.712] [info] [agent.cpp:66] Entering bpftime agent
[2024-01-27 23:05:03.712] [info] [bpftime_shm_internal.cpp:618] Global shm constructed. shm_open_type 1 for bpftime_maps_shm
[2024-01-27 23:05:03.712] [info] [agent.cpp:81] Initializing agent..
[2024-01-27 23:05:03][error][644325] Failed to load insn: invalid call immediate at PC 17
[2024-01-27 23:05:03][error][644325] Failed to initialize attach context
INFO [644325]: Global shm destructed

The server process then enters the epoll wait hot loop forever also, just like the case above.

agentzh commented 6 months ago

OK, it seems like this is because bpftime's epoll emulation fails to implement the -1 timeout argument correctly. In this case, it just returns immediately without checking any events. And this is why the ringbuf is never consumed by the server or tracer.

Zheaoli commented 6 months ago

/assign