containers / netavark

Container network stack
Apache License 2.0

flake in dhcp-proxy test: parser ran out of data-- not enough byte #618

Closed · Luap99 closed this issue 3 months ago

Luap99 commented 1 year ago

new flake:

not ok 2 basic setup
# (from function `die' in file test-dhcp/helpers.bash, line 108,
#  from function `run_helper' in file test-dhcp/helpers.bash, line 84,
#  from function `run_in_container_netns' in file test-dhcp/helpers.bash, line 97,
#  from function `run_client' in file test-dhcp/helpers.bash, line 411,
#  from function `run_setup' in file test-dhcp/helpers.bash, line 395,
#  in test file test-dhcp/002-setup.bats, line 23)
#   `run_setup "$input_config"' failed
# ### Setup ###
#  ip netns exec zUqqLi2u8n brctl addbr br0
#  ip netns exec zUqqLi2u8n ifconfig br0 10.131.131.1 up
#  ip netns exec zUqqLi2u8n firewall-cmd --add-interface=br0 --zone=trusted
# success
#  ip netns exec zUqqLi2u8n ip link add veth0br type veth peer name veth0
#  ip netns exec zUqqLi2u8n brctl addif br0 veth0br
#  ip netns exec zUqqLi2u8n ip link set veth0br up
#  ip netns exec zUqqLi2u8n ip link set veth0 up
#  ip netns exec zUqqLi2u8n ip -j link show veth0
# [{"ifindex":3,"link":"veth0br","ifname":"veth0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"noqueue","operstate":"UP","linkmode":"DEFAULT","group":"default","txqlen":1000,"link_type":"ether","address":"46:61:cc:fb:60:86","broadcast":"ff:ff:ff:ff:ff:ff"}]
#  ip netns exec zUqqLi2u8n ip link add veth1br type veth peer name veth1
#  ip netns exec zUqqLi2u8n brctl addif br0 veth1br
#  ip netns exec zUqqLi2u8n ip link set veth1br up
#  ip netns exec zUqqLi2u8n ip link set veth1 up
#  ip netns exec zUqqLi2u8n ip link set lo up
#  ip netns exec zUqqLi2u8n dnsmasq --log-debug --log-queries --conf-dir /tmp/nv-proxygdm/dnsmasq -x /tmp/nv-proxygdm/dns.PID
#  ip netns exec zUqqLi2u8n ./bin/netavark-dhcp-proxy-client --uds /tmp/nv-proxygdm/nv-proxy.sock -f /tmp/nv-proxygdm/setup.json setup foo
# [ERROR mozim::msg] InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes
# Error: InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes
# Error: status: NotFound, message: "InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 10 Mar 2023 12:31:41 GMT", "content-length": "0"} }
# [ rc=6 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 6; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ### Teardown ###
#  ip netns exec zUqqLi2u8n ip link set veth0br down
#  ip netns exec zUqqLi2u8n ip link set veth0 down
# /usr/lib/bats-core/test_functions.bash: line 260: 15025 Killed                  ip netns exec "$NS_NAME" ./bin/netavark-dhcp-proxy --dir "$TMP_TESTDIR" --uds "$TMP_TESTDIR"
#  ip netns exec zUqqLi2u8n brctl delif br0 veth0br
#  ip netns exec zUqqLi2u8n ip link del veth0br type veth peer name veth0
#  ip netns exec zUqqLi2u8n firewall-cmd --remove-interface=br0 --zone=trusted
# success
#  ip netns exec zUqqLi2u8n ip link set br0 down
#  ip netns exec zUqqLi2u8n brctl delbr br0
#  cat /tmp/nv-proxygdm/dns.PID
# 15042
#  ip netns exec zUqqLi2u8n ip link set lo down

https://cirrus-ci.com/task/5066529801240576

agorgl commented 4 months ago

I run into this issue pretty often with some pods that use a macvlan network:

Jun 21 04:28:07 test netavark[4665]: [ERROR mozim::msg] InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes
Jun 21 04:28:07 test kernel: virtio_net virtio1 eth0: left promiscuous mode
Jun 21 04:28:07 test iot-sample[6614]: time="2024-06-21T04:28:07Z" level=error msg="Starting some container dependencies"
Jun 21 04:28:07 test iot-sample[6614]: time="2024-06-21T04:28:07Z" level=error msg="\"netavark: unable to obtain lease: dhcp proxy error: status: Internal, message: \\\"InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes\\\", details: [], metadata: MetadataMap { headers: {\\\"content-type\\\": \\\"application/grpc\\\", \\\"date\\\": \\\"Fri, 21 Jun 2024 04:28:07 GMT\\\", \\\"content-length\\\": \\\"0\\\"} }\""
Jun 21 04:28:07 test iot-sample[6614]: Error: starting some containers: internal libpod error
Jun 21 04:28:07 test systemd[1]: iot-sample.service: Main process exited, code=exited, status=125/n/a
Jun 21 04:28:07 test systemd[1]: iot-sample.service: Killing process 6652 (conmon) with signal SIGKILL.

Even with the latest netavark version:

$ podman version
Client:       Podman Engine
Version:      5.1.1
API Version:  5.1.1
Go Version:   go1.22.3
Built:        Tue Jun  4 00:00:00 2024
OS/Arch:      linux/amd64

$ podman info
...
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.11.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.11.0
    package: netavark-1.11.0-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
...

The DHCP server is dnsmasq in all cases, as far as I know.

Luap99 commented 4 months ago

Are you able to set up packet captures when that happens? It would be good to know whether we actually get the DHCP packets in time. I would think so, given that it flakes in our CI and we test only locally, so the chances of packet loss are very low. I guess it is some bug in the DHCP client parser, but I don't really have time to debug that at the moment, unfortunately.

agorgl commented 3 months ago

I got a minimal POC for this.

Steps to reproduce:

  1. Start a vanilla VM with the latest Fedora
  2. Install podman with sudo dnf install podman
  3. Enable the dhcp proxy with sudo systemctl enable --now netavark-dhcp-proxy.socket
  4. Create the following quadlet files:

    # /etc/containers/systemd/macvlan.network
    [Network]
    NetworkName=macvlan
    IPAMDriver=dhcp
    Driver=macvlan
    Options=parent=eth0

    # /etc/containers/systemd/sample.container
    [Unit]
    Description=A sample container

    [Container]
    ContainerName=sample
    Image=docker.io/library/nginx:latest
    Network=macvlan.network
    PublishPort=8080:80

    [Install]
    WantedBy=multi-user.target default.target

  5. Generate the service files with sudo systemctl daemon-reload
  6. Start the service with sudo systemctl start sample
  7. As soon as the service starts (2-5 seconds for the above example), immediately restart it with sudo systemctl restart sample

The service restart should fail and the logs should show the error below:

Jun 30 08:14:25 test sample[2363]: Error: netavark: unable to obtain lease: dhcp proxy error: status: Internal, message: "InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sun, 30 Jun 2024 08:14:25 GMT", "content-length": "0"} }

From then on, the issue can be reproduced easily with quick successive service restarts using sudo systemctl restart sample; sleep 2; sudo systemctl restart sample (maybe also wait a little after the last service start before trying this)

agorgl commented 3 months ago

Some further observations on this after spending half a day on it:

  1. For some reason the problem is extremely difficult to reproduce with packet capturing (tcpdump) enabled on the host that the netavark dhcp proxy runs on (a libvirt guest VM), but fairly frequent without it, although it takes a few tries to hit it
  2. After reproducing the case multiple times and capturing the DHCP packets from libvirt's virbr0 bridge on the VM host instead, I can see that for some reason we get the error immediately after netavark sends the DHCP discover packet, without waiting for a DHCP offer packet at all. A sample run is this:

[packet capture screenshot]

Here, the green section is a successful DHCP DORA exchange after a sudo systemctl restart sample. Immediately after that, a second sudo systemctl restart sample generates a new DHCP discover packet (highlighted in red) and the error happens immediately, even though the corresponding DHCP offer packet (highlighted in grey) arrives a few seconds later (the same delay as in the successful case).

The error message can be traced to the mozim DHCP client library here: https://github.com/nispor/mozim/blob/v0.2.3/src/msg.rs#L212

Given the empty payload in the error message, this could originate from this recv buffer, https://github.com/nispor/mozim/blob/v0.2.3/src/socket.rs#L128, being passed empty to the DHCP decoder above.
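For illustration only (a standalone sketch, not mozim's actual code, with a made-up function name): a DHCPv4 payload is at least the 236-byte fixed BOOTP header plus the 4-byte magic cookie 99.130.83.99 (RFC 2131), so the empty payload [] shown in the error can never decode, and any parser will bail with a "not enough bytes" style error.

// Sketch only: minimal plausibility check for a DHCPv4 payload (RFC 2131).
fn looks_like_dhcp_payload(payload: &[u8]) -> bool {
    const BOOTP_FIXED_HEADER: usize = 236; // op..file fields of the fixed BOOTP header
    const MAGIC_COOKIE: [u8; 4] = [99, 130, 83, 99];
    payload.len() >= BOOTP_FIXED_HEADER + MAGIC_COOKIE.len()
        && payload[BOOTP_FIXED_HEADER..BOOTP_FIXED_HEADER + 4] == MAGIC_COOKIE
}

fn main() {
    // The payload reported in the error above is empty, so it can never be valid DHCP.
    assert!(!looks_like_dhcp_payload(&[]));
}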

agorgl commented 3 months ago

After enabling debug logs for the netavark-dhcp-proxy service, the error can be traced into mozim:

Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Adding timer 8 seconds with event Timeout to event pool
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::time] TimerFd created TimerFd { fd: OwnedFd { fd: 13 } } with 8 seconds
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Adding fd 13 to Epoll 12, event Timeout
Jul 01 08:31:59 test audit: ANOM_PROMISCUOUS dev=eth0 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Jul 01 08:31:59 test kernel: virtio_net virtio1 eth0: entered promiscuous mode
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x28, 0, 0, 0x0000000c
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x15, 0, 8, 0x00000800
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x30, 0, 0, 0x00000017
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x15, 0, 6, 0x00000011
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x28, 0, 0, 0x00000014
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x45, 4, 0, 0x00001fff
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0xb1, 0, 0, 0x0000000e
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x48, 0, 0, 0x00000010
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x15, 0, 1, 0x00000044
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x06, 0, 0, 0xffffffff
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::bpf] Registering BPF filter 0x06, 0, 0, 0x00000000
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::socket] Raw socket created 14
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Adding socket 14 with event RawPackageIn to event pool
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Adding fd 14 to Epoll 12, event RawPackageIn
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Adding timer 3 seconds with event DiscoveryTimeout to event pool
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::time] TimerFd created TimerFd { fd: OwnedFd { fd: 15 } } with 3 seconds
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Adding fd 15 to Epoll 12, event DiscoveryTimeout
Jul 01 08:31:59 test audit: ANOM_PROMISCUOUS dev=eth0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::msg] DHCP message Message { opcode: BootRequest, htype: Eth, hlen: 6, hops: 0, xid: 1836246904, secs: 0, flags: Flags { broadcast: false }, ciaddr: 0.>
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::socket] Sending raw ethernet package: [255, 255, 255, 255, 255, 255, 62, 98, 49, 211, 49, 89, 8, 0, 69, 0, 1, 25, 0, 0, 64, 0, 128, 17, 249, 212, 0, 0>
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::socket] Raw socket sent: 295 bytes
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::client] Processing event RawPackageIn
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::socket] Raw socket receiving
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::socket] Raw socket received [51, 51, 0, 0, 0, 22, 62, 98, 49, 211, 49, 89, 134, 221, 96, 0, 0, 0, 0, 36, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >
Jul 01 08:31:59 test netavark[2968]: [ERROR mozim::msg] InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Removing fd 15 from Epoll 12
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Removing fd 13 from Epoll 12
Jul 01 08:31:59 test netavark[2968]: [DEBUG mozim::event] Removing fd 14 from Epoll 12

I suppose that comes from here: https://github.com/nispor/mozim/blob/v0.2.3/src/msg.rs#L209, https://github.com/nispor/mozim/blob/v0.2.3/src/msg.rs#L298, and https://github.com/nispor/mozim/blob/v0.2.3/src/client.rs#L514

Luap99 commented 3 months ago

Thanks for the details, I don't have much time to run this down currently. Are you interested in working on a fix?

agorgl commented 3 months ago

I'm still tracing it; I'll try to make a fix if I find the problem.

agorgl commented 3 months ago

After analyzing the payloads in the full mozim debug logs:

Jul 01 10:27:08 test netavark[3577]: [DEBUG mozim::msg] DHCP message Message { opcode: BootRequest, htype: Eth, hlen: 6, hops: 0, xid: 492450829, secs: 0, flags: Flags { broadcast: false }, ciaddr: 0.0.0.0, yiaddr: 0.0.0.0, siaddr: 0.0.0.0, giaddr: 0.0.0.0, chaddr: [118, 255, 172, 65, 53, 143, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], sname: None, fname: None, magic: [99, 130, 83, 99], opts: DhcpOptions({ClientIdentifier: ClientIdentifier([]), MessageType: MessageType(Discover), ParameterRequestList: ParameterRequestList([Hostname, SubnetMask, Router, DomainNameServer, DomainName, InterfaceMtu, NTPServers])}) }
Jul 01 10:27:08 test netavark[3577]: [DEBUG mozim::socket] Sending raw ethernet package: [255, 255, 255, 255, 255, 255, 118, 255, 172, 65, 53, 143, 8, 0, 69, 0, 1, 25, 0, 0, 64, 0, 128, 17, 249, 212, 0, 0, 0, 0, 255, 255, 255, 255, 0, 68, 0, 67, 1, 5, 55, 190, 1, 1, 6, 0, 29, 90, 52, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 118, 255, 172, 65, 53, 143, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 130, 83, 99, 53, 1, 1, 55, 7, 12, 1, 3, 6, 15, 26, 42, 255]
Jul 01 10:27:08 test netavark[3577]: [DEBUG mozim::socket] Raw socket sent: 295 bytes
Jul 01 10:27:08 test netavark[3577]: [DEBUG mozim::client] Processing event RawPackageIn
Jul 01 10:27:08 test netavark[3577]: [DEBUG mozim::socket] Raw socket receiving
Jul 01 10:27:08 test netavark[3577]: [DEBUG mozim::socket] Raw socket received [51, 51, 0, 0, 0, 22, 118, 255, 172, 65, 53, 143, 134, 221, 96, 0, 0, 0, 0, 36, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 58, 0, 5, 2, 0, 0, 1, 0, 143, 0, 57, 186, 0, 0, 0, 1, 4, 0, 0, 0, 255, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 255, 65, 53, 143]
Jul 01 10:27:08 test netavark[3577]: [ERROR mozim::msg] InvalidDhcpServerReply: Failed to parse DHCP message from payload of pkg []: parser ran out of data-- not enough bytes

I converted the packet that the raw socket received, the one starting with [51, 51..., to a hex dump, compared it with nearby packets in the sequence, and it turns out to be an IPv6 Multicast Listener Report Message v2!
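Decoding the first bytes of that logged frame by hand backs this up (a standalone sketch using only the bytes from the "Raw socket received" log line above, not mozim code): the destination MAC 33:33:00:00:00:16 is the IPv6 multicast MAC for ff02::16 (all MLDv2-capable routers), the EtherType bytes 134, 221 are 0x86DD (IPv6), and the ICMPv6 type further into the frame is 143, a Multicast Listener Report v2.

fn main() {
    // The frame from the "Raw socket received [51, 51, ..." debug log line above.
    let frame: &[u8] = &[
        51, 51, 0, 0, 0, 22, 118, 255, 172, 65, 53, 143, 134, 221, 96,
        0, 0, 0, 0, 36, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 255, 2, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 22, 58, 0, 5, 2, 0, 0,
        1, 0, 143, 0, 57, 186, 0, 0, 0, 1, 4, 0, 0, 0, 255,
        2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 255, 65, 53, 143,
    ];

    // Ethernet header: 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType.
    println!("dst mac   = {:02x?}", &frame[0..6]); // [33, 33, 00, 00, 00, 16]
    let ethertype = u16::from_be_bytes([frame[12], frame[13]]);
    println!("ethertype = {ethertype:#06x}"); // 0x86dd = IPv6, not the 0x0800 IPv4 a DHCP reply would use

    // The IPv6 header is 40 bytes; its "next header" 0 means a hop-by-hop options header
    // follows, and the ICMPv6 message starts right after that 8-byte extension header.
    let icmpv6_type = frame[14 + 40 + 8];
    println!("icmpv6 type = {icmpv6_type}"); // 143 = Multicast Listener Report v2
    assert_eq!((ethertype, icmpv6_type), (0x86dd, 143));
}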

In the following capture image:

[packet capture screenshot]

The crash happens after netavark/mozim sends the packet highlighted in red, because it expects the packet highlighted in green, but first recv's the packet highlighted in yellow instead.

I suppose the fix should be in mozim here: https://github.com/nispor/mozim/blob/v0.2.3/src/client.rs#L512, to not blindly process the first packet it receives, but to inspect it, check whether it is a valid DHCP packet, and skip it if it isn't.
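Roughly, such a guard could look like the sketch below (hypothetical code for illustration, with a made-up function name; the actual change ended up in the mozim PR linked further down): before handing the payload to the DHCP decoder, check that the received frame is IPv4, UDP, and addressed to the DHCP client port 68, the same constants the BPF filter in the debug log above matches on (0x0800, 0x11, 0x44), and otherwise keep waiting.

/// Hypothetical pre-filter, not the actual mozim change: accept only frames that
/// can plausibly carry a DHCPv4 server reply (IPv4 + UDP + destination port 68).
fn is_dhcp_reply_frame(frame: &[u8]) -> bool {
    // Ethernet: 6-byte dst MAC, 6-byte src MAC, 2-byte EtherType.
    if frame.len() < 14 || u16::from_be_bytes([frame[12], frame[13]]) != 0x0800 {
        return false; // not IPv4 (e.g. the 0x86DD MLDv2 report seen above)
    }
    let ihl = (frame[14] & 0x0f) as usize * 4; // IPv4 header length in bytes
    if frame.len() < 14 + ihl + 8 || frame[14 + 9] != 17 {
        return false; // truncated, or not UDP (protocol 17)
    }
    let dst_port = u16::from_be_bytes([frame[14 + ihl + 2], frame[14 + ihl + 3]]);
    dst_port == 68 // BOOTP/DHCP client port
}

fn main() {
    // The MLDv2 frame from the logs starts with an IPv6 EtherType, so it would be skipped.
    let mldv2_start = [51u8, 51, 0, 0, 0, 22, 118, 255, 172, 65, 53, 143, 134, 221];
    assert!(!is_dhcp_reply_frame(&mldv2_start));
}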

Luap99 commented 3 months ago

I suppose the fix should be in mozim here: https://github.com/nispor/mozim/blob/v0.2.3/src/client.rs#L512, to not blindly process the first packet it receives, but to inspect it, check whether it is a valid DHCP packet, and skip it if it isn't.

That sounds logical

agorgl commented 3 months ago

Created a PR in mozim that should fix the above: https://github.com/nispor/mozim/pull/33. As soon as it passes, I'll ask for a release so we can bump the dependency here too and fix this.

cathay4t commented 3 months ago

mozim 0.2.4 published. cargo update should do the magic to pull it in.

Luap99 commented 3 months ago

@cathay4t Thank you!

thecubic commented 3 months ago

I'm getting this behavior in a prod situation: containers just won't start, with that same error message. I have macvlan DHCP networks to a couple of real VLANs (also dnsmasq).

I also noticed that the dhcp-proxy process starts using an excessive amount of CPU after some period of time, and when it's in that state it fails all requests consistently. If I kill the proxy, most (not all) containers will start again.

I can consistently repro at least a couple of failures when I nuke and repave all of my containers on that box. I'll build and test in the next few days to see if this fixes it

thanks @agorgl!

thecubic commented 3 months ago

I'm running a rust-nightly release build now. I'm definitely suffering from #811, as there were 13708 threads created to manage ~15 container leases over 3 days.

thecubic commented 3 months ago

I was able to restart all of my containers without any crashes after this patch (for the first time in netavark), so thanks again @agorgl!

I also have 1/10th the threads (~1300) after twice the time