foniod / redbpf

Rust library for building and running BPF/eBPF modules

Example `echo`: Data sent before sockmap is ready is only sent after second message #281

Open gfokkema opened 2 years ago

gfokkema commented 2 years ago

To reproduce the issue with a contrived example:

gerlof@host:redbpf $ cargo build --features llvm13
gerlof@host:redbpf $ sudo target/debug/examples/echo 10000
gerlof@host:redbpf $ (sleep .0015 ; echo test) | nc -N 127.0.0.1 10000
test
gerlof@host:redbpf $ (sleep .0015 ; echo test) | nc -N 127.0.0.1 10000
test
gerlof@host:redbpf $ (sleep .0015 ; echo test) | nc -N 127.0.0.1 10000
gerlof@host:redbpf $ (sleep .0015 ; echo test) | nc -N 127.0.0.1 10000

The output of `echo` for a successful result, followed by a failed result:

new client: 127.0.0.1:33392, fd: 15
ipv4: 7f 0 0 1 port: 0 0 82 70
delete client: 127.0.0.1:33392 fd: 15
new client: 127.0.0.1:33396, fd: 15
ipv4: 7f 0 0 1 port: 0 0 82 74
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Map', examples/example-userspace/examples/echo.rs:115:51
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
delete client: 127.0.0.1:33396 fd: 15
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: SendError(Delete { key: IdxMapKey { addr: 16777343, port: 1954676736 } })', examples/example-userspace/examples/echo.rs:70:60
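
As an aside, if I decode the numbers in that `SendError`, they appear to be just the peer address and port bytes in network order reinterpreted as native-endian integers; a quick sketch (my own decoding, not part of the example):

```rust
fn main() {
    // 16777343 is the bytes 7f 00 00 01 (127.0.0.1) read as a little-endian u32.
    assert_eq!(u32::from_le_bytes([0x7f, 0x00, 0x00, 0x01]), 16777343);
    // 1954676736 is the bytes 00 00 82 74 read as a little-endian u32;
    // 0x8274 is the client port 33396 from the "new client" log line above.
    assert_eq!(u32::from_le_bytes([0x00, 0x00, 0x82, 0x74]), 1954676736);
    assert_eq!(0x8274, 33396);
}
```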

Keeping the connection open and sending a second message when the first reply was not sent causes the first and second replies to both be sent simultaneously:

gerlof@host:redbpf $ (sleep .0015 ; echo test ; sleep 1 ; echo a) | nc -N 127.0.0.1 10000
test    # <-- received immediately
a       # <-- received after 1 second
gerlof@host:redbpf $ (sleep .0015 ; echo test ; sleep 1 ; echo a) | nc -N 127.0.0.1 10000
test    # <-- received after 1 second
a       # <-- received after 1 second
gerlof@host:redbpf $ (sleep .0015 ; echo test ; sleep 1 ; echo a) | nc -N 127.0.0.1 10000
test    # <-- received after 1 second
a       # <-- received after 1 second

Using bpftool to trace the program output, I can verify that `#[sendmap_parser]` is not called for the first packet in the problematic cases:

gerlof@host:redbpf $ (sleep .0015 ; echo test) | nc -N 127.0.0.1 10000
gerlof@host:redbpf $ sudo bpftool prog tracelog
...

and otherwise:

gerlof@host:redbpf $ (sleep .0015 ; echo test) | nc -N 127.0.0.1 10000
test
gerlof@host:redbpf $ sudo bpftool prog tracelog
              nc-365193  [004] d.s51 34676.996449: bpf_trace_printk: length: 05 00 00 00

              nc-365193  [004] d.s51 34676.996467: bpf_trace_printk: ip as BE: 7f 00 00 01

              nc-365193  [004] d.s51 34676.996469: bpf_trace_printk: port as BE: 00 00 82 90

Likewise, when keeping the connection open and receiving a delayed first reply together with the second reply, there is only one set of trace entries:

gerlof@host:redbpf $ (sleep .0015 ; echo test; sleep 1; echo a) | nc -N 127.0.0.1 10000
test   # <-- received after 1 second
a      # <-- received after 1 second
gerlof@host:redbpf $ sudo bpftool prog tracelog
              nc-365248  [004] d.s51 34809.267120: bpf_trace_printk: length: 07 00 00 00

              nc-365248  [004] d.s51 34809.267142: bpf_trace_printk: ip as BE: 7f 00 00 01

              nc-365248  [004] d.s51 34809.267143: bpf_trace_printk: port as BE: 00 00 82 9a

and otherwise, when receiving an instant reply followed by the second reply, there are two sets of trace entries:

gerlof@host:redbpf $ (sleep .0015 ; echo test; sleep 1; echo a) | nc -N 127.0.0.1 10000
test   # <-- received immediately
a      # <-- received after 1 second
gerlof@host:redbpf $ sudo bpftool prog tracelog
              nc-365256  [004] d.s51 34811.620006: bpf_trace_printk: length: 05 00 00 00

              nc-365256  [004] d.s51 34811.620024: bpf_trace_printk: ip as BE: 7f 00 00 01

              nc-365256  [004] d.s51 34811.620025: bpf_trace_printk: port as BE: 00 00 82 9c

              nc-365256  [004] d.s51 34812.621976: bpf_trace_printk: length: 02 00 00 00

              nc-365256  [004] d.s51 34812.621998: bpf_trace_printk: ip as BE: 7f 00 00 01

              nc-365256  [004] d.s51 34812.622000: bpf_trace_printk: port as BE: 00 00 82 9c

Unfortunately, I'm only taking a first look at redbpf, so I'm in no way qualified to tell what's going on here. Reducing the sleep, or removing it altogether, behaves as expected and makes `echo` fail every time.

A less contrived, real-world example that indeed removes that sleep is curl ;)

rhdxmr commented 2 years ago

Thank you for letting me know about this mysterious symptom. At first glance I can't tell what's going on; I haven't looked into this yet. By the way, what does `nc -N` mean? My nc does not support the `-N` option, so I don't understand it.

gfokkema commented 2 years ago

Hi there!

`nc -N` shuts down the network socket after EOF on stdin (BSD netcat vs. GNU netcat, or something like that).
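
For illustration, a rough Rust equivalent of what `nc -N` is doing here (my own sketch, not how the tests above were run):

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpStream};

fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:10000")?;
    stream.write_all(b"test\n")?;
    // `-N`: after EOF on stdin, shut down the write half so the server sees
    // EOF/FIN, while still reading any (possibly delayed) replies.
    stream.shutdown(Shutdown::Write)?;
    let mut reply = String::new();
    stream.read_to_string(&mut reply)?;
    print!("{reply}");
    Ok(())
}
```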

After the echo example has missed a reply, as soon as netcat closes the fd, the EOF and subsequent socket close act as a second message that does trigger the reply, causing the echo example to write to the already-closed fd, which leads to the panic (as was noted in the example too).

The panic is thus to be expected, really; the curious behaviour is that the first trigger to process the packet is sometimes missed until a second trigger comes along (either EOF or a different message such as `a`).

I.e., this intermittent behaviour:

gerlof@host:redbpf $ (sleep .0015 ; echo test; sleep 1; echo a) | nc 127.0.0.1 10000
test   # <-- received immediately
a      # <-- received after 1 second
gerlof@host:redbpf $ (sleep .0015 ; echo test; sleep 1; echo a) | nc 127.0.0.1 10000
test   # <-- received after 1 second
a      # <-- received after 1 second
gerlof@host:redbpf $ (sleep .0015 ; echo test; sleep 1; echo a) | nc 127.0.0.1 10000
test   # <-- received immediately
a      # <-- received after 1 second

Thank you for your response!

rhdxmr commented 2 years ago

@gfokkema

I've investigated this issue recently and fixed part of the problem. But I am still looking for a solution for the fact that the stream parser is not triggered until the second packet is received.

Solved issues

  1. The echo program dies if nc sends packets right after the connection is established, without any delay.
  2. nc does not get a response from the echo server if nc half-closes the connection right after the packet is sent, without any delay.

Root cause of above problems

  1. Updating the sockmap with a new socket descriptor fails if some epoll events for the socket have already occurred. To avoid this, I changed the program to update the sockmap right after the accept call finishes.
  2. Updating the sockmap with a new socket descriptor also fails if the connection has already been half-closed by the peer (nc). To avoid this, as a fallback, the userspace program reads data from the socket and does the echoing itself (see the sketch after this list).
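
Roughly, the flow looks like this (a sketch using a hypothetical `update_sockmap` helper, not the actual code in the PR):

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

// Hypothetical stand-in for inserting the accepted fd into the BPF sockmap;
// in the real example this goes through redbpf's userspace API.
fn update_sockmap(_stream: &TcpStream) -> Result<(), ()> {
    Ok(())
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:10000")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // 1. Update the sockmap right after accept(), before any epoll event
        //    for the new socket has a chance to fire.
        if update_sockmap(&stream).is_err() {
            // 2. Fallback: the peer may already have half-closed the
            //    connection, so echo from userspace instead.
            let mut buf = [0u8; 4096];
            loop {
                let n = stream.read(&mut buf)?;
                if n == 0 {
                    break;
                }
                stream.write_all(&buf[..n])?;
            }
        }
    }
    Ok(())
}
```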

Unsolved problem

It seems that data that was already received and stored in the TCP backlog before setting the sockmap does not trigger the stream parser. As a result, the first packet, sent before the sockmap is set, does not lead to an echo response.

I am wondering whether there is a way to trigger the stream parser for the data already in the backlog when updating the sockmap.

rhdxmr commented 2 years ago

I forgot to attach the link to the PR that solves the problems mentioned above. Here it is:

I'll let you know if I find a solution for applying the stream parser to the first packet, which is received before the sockmap is set.

rhdxmr commented 2 years ago

@gfokkema

I found a workaround to trigger the stream parser manually.

Calling `setsockopt(SO_RCVLOWAT)` right after setting the sockmap solves the problem. It makes the kernel check whether received TCP data is ready, so the packets that were already received before setting the sockmap are processed immediately.
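
In case it helps, a rough sketch of that call using the libc crate (the actual PR may differ in details):

```rust
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

// After inserting the socket into the sockmap, poke SO_RCVLOWAT so the kernel
// re-checks whether received data is ready; any bytes that were queued before
// the sockmap update then get run through the stream parser.
fn poke_rcvlowat(stream: &TcpStream) -> std::io::Result<()> {
    let lowat: libc::c_int = 1; // keep the default low-water mark of one byte
    let ret = unsafe {
        libc::setsockopt(
            stream.as_raw_fd(),
            libc::SOL_SOCKET,
            libc::SO_RCVLOWAT,
            &lowat as *const libc::c_int as *const libc::c_void,
            std::mem::size_of_val(&lowat) as libc::socklen_t,
        )
    };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}
```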

This PR makes the first packet get echoed instantly. I hope this change solves the problem you had struggled with. Thanks,

gfokkema commented 2 years ago

Thank you very much for your efforts, very much appreciated!

My apologies for the late reply (student life). I'll make sure to find some time soon to test your improvements and post an update here.