containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0
2.23k stars 789 forks source link

fix(dhcp): can not renew an ip address #1092

Closed lisongmin closed 1 month ago

lisongmin commented 2 months ago

The dhcp server is systemd-networkd, and the dhcp plugin can request an ip but can not renew it. The systemd-networkd just ignore the renew request.

2024/09/14 21:46:00 no DHCP packet received within 10s
2024/09/14 21:46:00 retrying in 31.529038 seconds
2024/09/14 21:46:42 no DHCP packet received within 10s
2024/09/14 21:46:42 retrying in 63.150490 seconds
2024/09/14 21:47:45 98184616c91f15419f5cacd012697f85afaa2daeb5d3233e28b0ec21589fb45a/iot/eth1: no more tries
2024/09/14 21:47:45 98184616c91f15419f5cacd012697f85afaa2daeb5d3233e28b0ec21589fb45a/iot/eth1: renewal time expired, rebinding
2024/09/14 21:47:45 Link "eth1" down. Attempting to set up
2024/09/14 21:47:45 98184616c91f15419f5cacd012697f85afaa2daeb5d3233e28b0ec21589fb45a/iot/eth1: lease rebound, expiration is 2024-09-14 22:47:45.309270751 +0800 CST m=+11730.048516519

Follow the https://datatracker.ietf.org/doc/html/rfc2131#section-4.3.6, following options must not be sent in renew

And renew should using unicast but not broadcast.

lisongmin commented 2 months ago

I don't know how to set to using unicast address, but it just work

14:33:08.610940 IP (tos 0x0, ttl 16, id 28928, offset 0, flags [none], proto UDP (17), length 366)
    0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from ce:18:d4:e1:99:4c, length 338, xid 0x4c6a0e05, Flags [none] (0x0000)
          Client-IP 192.168.4.143
          Client-Ethernet-Address ce:18:d4:e1:99:4c
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Request
            Client-ID (61), length 74: "98184616c91f15419f5cacd012697f85afaa2daeb5d3233e28b0ec21589fb45a/iot/eth1"
            Hostname (12), length 13: "homeassistant"
            Parameter-Request (55), length 1:
              Subnet-Mask (1)
            END (255), length 0
14:33:08.611137 IP (tos 0xc0, ttl 64, id 60414, offset 0, flags [DF], proto UDP (17), length 311)
    192.168.4.1.67 > 192.168.4.143.68: [bad udp cksum 0x8b15 -> 0x844f!] BOOTP/DHCP, Reply, length 283, xid 0x4c6a0e05, Flags [none] (0x0000)
          Your-IP 192.168.4.143
          Client-Ethernet-Address ce:18:d4:e1:99:4c
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: ACK
            Lease-Time (51), length 4: 60
            Subnet-Mask (1), length 4: 255.255.255.0
            Default-Gateway (3), length 4: 192.168.4.1
            Domain-Name-Server (6), length 4: 192.168.4.1
            TZ-Name (101), length 7: "Etc/UTC"
            Server-ID (54), length 4: 192.168.4.1
            END (255), length 0
squeed commented 2 months ago

We can't just edit vendor files; they need to exactly match upstream. If our library has a bug, we should ideally fix that or find a new library.

lisongmin commented 2 months ago

We can't just edit vendor files; they need to exactly match upstream. If our library has a bug, we should ideally fix that or find a new library.

Oh, I didn't realize that was vendor code.

Since the upstream code has been inactive for 6 years, I think we should switch to https://github.com/insomniacslk/dhcp.

What do you think? Any other good ideas?

squeed commented 2 months ago

@lisongmin sounds good! If it's a difficult replacement, let me know and we can consider forking. But in general, it's better to switch to a supported upstream :-).

lisongmin commented 2 months ago

Main changed in this PR:

  1. replace d2g/dhcp4client with insomniacslk/dhcp
  2. Using dnsmasq as the dhcp server in test (replace d2g/dhcp4server)
  3. Reduce the dhcp test time from (169s -> 72s) by adding --resendtimeout daemon option, the resend timeout represent the total duration we should wait for acquire/renew to complete, and if it is expire, no more retries.
  4. We can now cancel the backoff retries on CmdDel, so we can stop quickly on retries.
lisongmin commented 2 months ago

It seems there is a test error, but not relative to this PR:

• [FAILED] [0.011 seconds]
sbr test [It] Works with multiple IPs
/home/runner/work/plugins/plugins/plugins/meta/sbr/sbr_linux_test.go:406
  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/plugins/plugins/plugins/meta/sbr/sbr_linux_test.go:522 @ 09/20/24 03:58:32.946
LionelJouin commented 2 months ago

It seems there is a test error, but not relative to this PR:

• [FAILED] [0.011 seconds]
sbr test [It] Works with multiple IPs
/home/runner/work/plugins/plugins/plugins/meta/sbr/sbr_linux_test.go:406
  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/plugins/plugins/plugins/meta/sbr/sbr_linux_test.go:522 @ 09/20/24 03:58:32.946

I looked into it, I will try to fix it soon. An issue has been created: https://github.com/containernetworking/plugins/issues/1096

lisongmin commented 1 month ago

I have rebased to the latest code, feel free to have a look at it, thanks.

squeed commented 1 month ago

@lisongmin looks good! Just needs a minor rebase, then we can get this in.

lisongmin commented 1 month ago

Great to hear that. And the rebase is done.