NetworkConfiguration / dhcpcd

DHCP / IPv4LL / IPv6RA / DHCPv6 client.
https://roy.marples.name/projects/dhcpcd
BSD 2-Clause "Simplified" License

issues with reconfiguration of delegated prefixes when moving from v7.1.0 to v9.4.1/v10.0.2 #249

Closed Marty-42 closed 1 year ago

Marty-42 commented 1 year ago

I have the following setup:

A: behavior with v7.1.0-2+b1b (Debian Bullseye):

  1. Initial prefix delegation works
  2. In case of an upstream prefix change, a reconfiguration request is received from the upstream router
  3. The respective sub-prefixes are updated and allocated to the connected interfaces

B: behavior with v9.4.1-22 (Debian Bookworm), also observed with v10.0.2-4 (Debian Trixie) with the same config:

  1. Initial prefix delegation works
  2. reconfiguration request is not received from upstream router

Apparently the default configuration has changed between the two versions as adding "option dhcp6_reconfigure_accept" to the config file partly resolves the issue, i.e:

  1. reconfiguration request is received from upstream router, however only a 62bit prefix is delegated instead of the requested 60bit prefix:

dhcpcd[91080]: wan: unauthenticated RECONFIGURE6 from fe80::YYY
dhcpcd[91080]: wan: RECONFIGURE6 from fe80::YYY
dhcpcd[91080]: wan: broadcasting RENEW6 (xid 0x9a0dde), next in 10.3 seconds
dhcpcd[91080]: wan: DHCPv6 REPLY: prefix mismatch
....
dhcpcd[91080]: wan: ADV 2003:XXX::/62 from fe80::YYY

  2. the update and allocation of sub-prefixes to connected interfaces partly fails, as the provided 62bit prefix is not sufficient to allocate all prefixes for the connected devices:

dhcpcd[91080]: eth9: invalid prefix 2003:XXX::/62 + 9/64: Numerical result out of range
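The arithmetic behind this error can be sketched in a few lines of Python. This is only an illustration of why SLA ID 9 cannot be carved out of a /62, not dhcpcd's own code; the function name `sub_prefix` and the documentation prefix `2001:db8:0:10::` are assumptions chosen for the example.

```python
import ipaddress


def sub_prefix(delegated: str, sla_id: int, sub_len: int = 64) -> ipaddress.IPv6Network:
    """Return sub-prefix number `sla_id` of length `sub_len` inside `delegated`."""
    net = ipaddress.IPv6Network(delegated)
    # Bits available for numbering sub-prefixes: 4 for /60 -> /64, 2 for /62 -> /64.
    sla_bits = sub_len - net.prefixlen
    if sla_bits < 0 or sla_id >= (1 << sla_bits):
        raise ValueError(
            f"SLA ID {sla_id} does not fit into {sla_bits} bits ({delegated} -> /{sub_len})"
        )
    base = int(net.network_address)
    # The SLA ID occupies the bits directly above the sub-prefix boundary.
    return ipaddress.IPv6Network((base + (sla_id << (128 - sub_len)), sub_len))


if __name__ == "__main__":
    print(sub_prefix("2001:db8:0:10::/60", 9))  # fits: a /60 leaves 16 possible /64s
    try:
        sub_prefix("2001:db8:0:10::/62", 9)     # a /62 leaves only 4 possible /64s
    except ValueError as err:
        print("error:", err)
```

With a /62 only SLA IDs 0 to 3 are available, so the interfaces configured with IDs 5, 7 and 9 cannot be served.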

I currently have a workaround in place that simply restarts dhcpcd whenever the wrong prefix length is received during a reconfiguration, but I would rather avoid such a workaround, in particular as the setup worked flawlessly with v7.1.0.
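For illustration, the detection half of such a workaround could look roughly like the sketch below. It only inspects a log line of the "ADV ... from ..." form shown above and reports whether the advertised length differs from the wanted one; how dhcpcd is then restarted is deliberately left out. The names `advertised_len` and `wrong_length` are hypothetical, and the log format is assumed to match the excerpt quoted in this report.

```python
import re
from typing import Optional

# Matches e.g. "wan: ADV 2003:XXX::/62 from fe80::YYY" and captures the length.
ADV_RE = re.compile(r"\bADV\s+\S+/(\d+)\s+from\b")


def advertised_len(log_line: str) -> Optional[int]:
    """Return the advertised prefix length, or None if the line is not an ADV line."""
    match = ADV_RE.search(log_line)
    return int(match.group(1)) if match else None


def wrong_length(log_line: str, wanted: int = 60) -> bool:
    """True only for ADV lines whose prefix length differs from `wanted`."""
    length = advertised_len(log_line)
    return length is not None and length != wanted


if __name__ == "__main__":
    line = "dhcpcd[91080]: wan: ADV 2003:XXX::/62 from fe80::YYY"
    print(advertised_len(line), wrong_length(line))
```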

Questions: Is there any other change of the config required in order to receive a prefix with the proper prefix length (it always works when starting the server and with version v7.1.0 it also works flawlessly during reconfiguration) or is this some bug?

Additionally, the "unauthenticated RECONFIGURE6" warning is unrelated, as it also appeared with version v7.1.0 and should simply be ignored due to "noauthrequired". I would still like to get rid of this warning, but the documentation is somewhat thin in this respect. Is anyone willing to share some insight on how to properly set up authentication?

Below is the initial dhcpcd.conf (before adding "option dhcp6_reconfigure_accept"):

debug
hostname
duid
persistent
option rapid_commit
option interface_mtu
require dhcp_server_identifier
require dhcp6_server_id
nohook resolv.conf hostname ntp-common.conf chrony.conf timesyncd.conf ntp.conf openntpd.conf
ipv6only
waitip 6
release
noauthrequired
noipv6rs

interface wan
ipv6rs
iaid 1
ia_pd 1/::/60 eth1/1/64/1 eth2/2/64/1 eth5/5/64/1 eth7/7/64/1 eth9/9/64/1
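As a cross-check of this configuration, the following sketch computes the longest delegated prefix that can still hold every SLA ID on the ia_pd line. It assumes the fields after the first one have the form interface/sla_id/prefix_len/suffix, which is my reading of the line above; the function name `min_delegation_len` is hypothetical.

```python
def min_delegation_len(ia_pd_line: str) -> int:
    """Longest delegated prefix length that still fits every SLA ID on the line."""
    fields = ia_pd_line.split()
    if len(fields) < 3 or fields[0] != "ia_pd":
        raise ValueError("expected 'ia_pd <iaid/...> <interface/sla_id/prefix_len/...> ...'")
    needed = 128
    for spec in fields[2:]:
        # Assumed layout: interface/sla_id/prefix_len[/suffix]
        _ifname, sla_id, sub_len, *_ = spec.split("/")
        sla_bits = int(sla_id).bit_length()  # bits needed to represent this SLA ID
        needed = min(needed, int(sub_len) - sla_bits)
    return needed


if __name__ == "__main__":
    line = "ia_pd 1/::/60 eth1/1/64/1 eth2/2/64/1 eth5/5/64/1 eth7/7/64/1 eth9/9/64/1"
    print(min_delegation_len(line))
```

For the line above, SLA ID 9 needs four bits, so anything longer than a /60 (such as the /62 seen after the reconfigure) is too small.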

Thanks!

rsmarples commented 1 year ago

Firstly, DHCP authentication is entirely controlled at the server level. You have to configure the server secrets at the client side. This is why we have the noauthrequired option. It breaks RFC compliance, which is why we are noisy about it. RECONFIGURE requires authentication and is generally only sent when something has changed. It should not be used to notify the client to RENEW periodically, as the T1/T2 timers are used for that.

No matter if you request a 56 or a 60, if the server offers you a 62 then that's what you get. The requested prefix delegation length is just a hint to the server, nothing more.

Testing against the Kea DHCPv6 server I can replicate this issue by only delegating a larger prefix_length than was asked for. But dhcpcd is doing the right thing, as you are wanting to go against the DHCP server's wishes.

If you inspect the DHCP transactions using tcpdump or wireshark you should be able to see this. You can also use dhcpcd -U6 to dump the DHCPv6 lease to the console as environment variables. The prefix lengths are decoded using a generic method whereas the address extraction in the code is different - if all is well you should see you were leased the "wrong" prefix length.

Marty-42 commented 1 year ago

Thanks for the prompt response.

I understand that the server's response can deviate from the client's request. However, when dhcpcd is started, a prefix with the requested length is always delegated. Moreover, with version 7.1.0 the requested prefix length is obtained even during RECONFIGURE (reproducible: I just need to downgrade the client, with no changes to the server, and I never observed a prefix mismatch).

Since the default config between the versions has changed (option dhcp6_reconfigure_accept is now explicitly required, whereas with 7.1.0 it seems to be part of the default settings) I have the suspicion that other default settings may have changed that influence the server's response. However, scanning through the commit messages I couldn't find an obvious candidate.

Marty-42 commented 1 year ago

I compared the traffic triggered by the reconfigure via tcpdump. The incoming/outgoing reconfigure/renew/reply packets do not really differ, however the solicit messages that follow do differ:

7.1.0:
IP6 (flowlabel 0xbf5de, hlim 1, next-header UDP (17) payload length: 175) fe80::XXXX.546 > ff02::1:2.547: [bad udp cksum 0x8bc6 -> 0xae81!] dhcp6 solicit (xid=38cd4f (client-ID hwaddr/time type 1 time 747829313 406231071b16) (elapsed-time 0) (vendor-class) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2003:YYYY::/60 pltime:3600 vltime:7200)) (Client-FQDN) (reconfigure-accept) (option-request server-ID reconfigure-accept opt_82 opt_83))

9.4.1:
IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 146) fe80::XXXX.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=abf88b (client-ID hwaddr/time type 1 time 747829313 406231071b16) (IA_PD IAID:1 T1:0 T2:0) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

The "IA_PD-prefix ..." part is missing in the solicit sent by version 9.4.1.

The advertise packets that then follow from the upstream router contain either a 60 bit prefix or a 62 bit prefix depending on the version, i.e. it seems that the different solicit messages sent by dhcpcd are causing the different lengths of the delegated prefixes.
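To compare captures from different versions without reading them by eye, a small check like the one below can flag solicit messages whose IA_PD carries no prefix hint. It is a sketch that assumes tcpdump prints the option in the same textual form as in the captures above; the function name `solicit_has_pd_hint` is hypothetical.

```python
import re

# In the captures above, a hint appears as a nested "(IA_PD-prefix ...)" directly
# after the T2 value of the IA_PD option.
PD_HINT_RE = re.compile(r"\(IA_PD IAID:\d+ T1:\d+ T2:\d+ \(IA_PD-prefix ")


def solicit_has_pd_hint(tcpdump_line: str) -> bool:
    """True if a decoded DHCPv6 solicit carries an IA_PD-prefix inside its IA_PD."""
    if "dhcp6 solicit" not in tcpdump_line:
        raise ValueError("not a dhcp6 solicit line")
    return PD_HINT_RE.search(tcpdump_line) is not None


if __name__ == "__main__":
    with_hint = ("dhcp6 solicit (xid=38cd4f (elapsed-time 0) (IA_PD IAID:1 T1:0 T2:0 "
                 "(IA_PD-prefix 2003:YYYY::/60 pltime:3600 vltime:7200)) (reconfigure-accept))")
    without_hint = ("dhcp6 solicit (xid=abf88b (IA_PD IAID:1 T1:0 T2:0) "
                    "(elapsed-time 0) (reconfigure-accept))")
    print(solicit_has_pd_hint(with_hint), solicit_has_pd_hint(without_hint))
```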

rsmarples commented 1 year ago

That looks like a bug with dhcpcd-9.4.1 for sure. What does the capture look like with dhcpcd-10.0.2?

Marty-42 commented 1 year ago

It looks essentially the same. Here are the first 5 packets (reconfigure/renew/reply/solicit/advertise):

IP6 (hlim 64, next-header UDP (17) payload length: 65) fe80::XXXX.547 > fe80::YYYY.546: [udp sum ok] dhcp6 reconfigure (xid=0 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (reconfigure-accept) (option-request IA_PD opt_82 opt_83 opt_67) (reconfigure-message for renew))

IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 190) fe80::YYYY.546 > ff02::1:2.547: [udp sum ok] dhcp6 renew (xid=7431e1 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2003:UUUU::/60 pltime:0 vltime:0)) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

IP6 (hlim 64, next-header UDP (17) payload length: 82) fe80::XXXX.547 > fe80::YYYY.546: [udp sum ok] dhcp6 reply (xid=7431e1 (opt_82) (opt_83) (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (status-code NoBinding))

IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 147) fe80::YYYY.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=48f6e4 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (IA_PD IAID:1 T1:0 T2:0) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

IP6 (hlim 64, next-header UDP (17) payload length: 154) fe80::XXXX.547 > fe80::YYYY.546: [udp sum ok] dhcp6 advertise (xid=48f6e4 (opt_82) (opt_83) (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (preference 0) (reconfigure-accept) (DNS-server fd00::XXXX) (opt_86) (IA_PD IAID:1 T1:1800 T2:2880 (IA_PD-prefix 2003:ZZZZ::/62 pltime:3600 vltime:7200)))

The renew has the IA_PD-prefix information as payload, the solicit is missing it.

Marty-42 commented 1 year ago

....and just to confirm the obvious (as a fresh start always works as expected), the solicit packet which is sent out when dhcpcd is started contains the IA_PD-prefix information, i.e. the information is only missing if a reconfiguration is triggered

rsmarples commented 1 year ago

It looks essentially the same. Here are the first 5 packets (reconfigure/renew/reply/solicit/advertise):

IP6 (hlim 64, next-header UDP (17) payload length: 65) fe80::XXXX.547 > fe80::YYYY.546: [udp sum ok] dhcp6 reconfigure (xid=0 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (reconfigure-accept) (option-request IA_PD opt_82 opt_83 opt_67) (reconfigure-message for renew))

IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 190) fe80::YYYY.546 > ff02::1:2.547: [udp sum ok] dhcp6 renew (xid=7431e1 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2003:UUUU::/60 pltime:0 vltime:0)) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

IP6 (hlim 64, next-header UDP (17) payload length: 82) fe80::XXXX.547 > fe80::YYYY.546: [udp sum ok] dhcp6 reply (xid=7431e1 (opt_82) (opt_83) (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (status-code NoBinding))

OK, up until now it looks fine. However, the server has rejected the renew request with the NoBinding status code.

IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 147) fe80::YYYY.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=48f6e4 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (IA_PD IAID:1 T1:0 T2:0) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

And here we solicit without a prefix length hint.

IP6 (hlim 64, next-header UDP (17) payload length: 154) fe80::XXXX.547 > fe80::YYYY.546: [udp sum ok] dhcp6 advertise (xid=48f6e4 (opt_82) (opt_83) (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (preference 0) (reconfigure-accept) (DNS-server fd00::XXXX) (opt_86) (IA_PD IAID:1 T1:1800 T2:2880 (IA_PD-prefix 2003:ZZZZ::/62 pltime:3600 vltime:7200)))

The renew has the IA_PD-prefix information as payload, the solicit is missing it.

Yes, exactly. And you say the initial solicit before reconfigure has the correct prefix length?

Marty-42 commented 1 year ago

Yes, it does. See below for the packet exchange when dhcpcd is started (no apparent difference across the versions):

IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 176) fe80::XXXX.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=15cc (client-ID hwaddr/time type 1 time 747829313 406231071b16) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/60 pltime:0 vltime:0)) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

IP6 (hlim 64, next-header UDP (17) payload length: 170) fe80::YYYY.547 > fe80::XXXX.546: [udp sum ok] dhcp6 advertise (xid=15cc (opt_82) (opt_83) (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (preference 0) (reconfigure-accept) (DNS-server fd00::YYYY 2003:ZZZZ:5800:YYYY) (opt_86) (IA_PD IAID:1 T1:1800 T2:2880 (IA_PD-prefix 2003:ZZZZ:58f0::/60 pltime:3600 vltime:7200)))

IP6 (flowlabel 0x927dc, hlim 1, next-header UDP (17) payload length: 190) fe80::XXXX.546 > ff02::1:2.547: [udp sum ok] dhcp6 request (xid=7e2c5 (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2003:ZZZZ:58f0::/60 pltime:0 vltime:0)) (option-request server-ID reconfigure-accept opt_82 opt_83) (elapsed-time 0) (vendor-class) (Client-FQDN) (reconfigure-accept))

IP6 (hlim 64, next-header UDP (17) payload length: 170) fe80::YYYY.547 > fe80::XXXX.546: [udp sum ok] dhcp6 reply (xid=7e2c5 (opt_82) (opt_83) (client-ID hwaddr/time type 1 time 747829313 406231071b16) (server-ID hwaddr type 1 2c3afd8432fb) (preference 0) (reconfigure-accept) (DNS-server fd00::YYYY 2003:ZZZZ:5800:YYYY) (opt_86) (IA_PD IAID:1 T1:1800 T2:2880 (IA_PD-prefix 2003:ZZZZ:58f0::/60 pltime:3600 vltime:7200)))

In summary

Marty-42 commented 1 year ago

Another observation from my testing, probably unrelated, but it explains why dhcpcd --rebind does not resolve the issue and a full restart is needed: According to the man page the config is reread in that case, but this apparently does not mean that the configuration is "freshly" applied. What I observe is the following: If a prefix with the "wrong" length was delegated and the config file remained unchanged (i.e. the config file asks for a 60bit prefix, but a 62bit prefix was delegated), then the existing 62bit prefix is sent in the rebind message to the server. If, however, the config file has changed in the meantime, asking e.g. for a 59bit prefix, then this is properly reflected in the rebind message to the server. That means changes in the config file are addressed, but differences between the actual "live" configuration and an unchanged config file are not. Wouldn't it be preferable that calling dhcpcd --rebind with an unchanged config file results in a state that resembles the state after a restart? Just a thought...

rsmarples commented 1 year ago

OK, this is happening because of this message:

dhcpcd[91080]: wan: DHCPv6 REPLY: prefix mismatch

dhcpcd is marking the prefix as stale and thus not re-requesting the configuration as set.
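A toy model of the behaviour described here, purely for illustration: it is not dhcpcd's code and not necessarily how the fix works. All names (`LeasedPrefix`, `solicit_hint`, `fall_back_to_config`) are hypothetical. It contrasts sending no hint once the leased prefix is stale (what the captures show) with falling back to the configured ::/60 hint (what the initial solicit sends).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeasedPrefix:
    prefix: str
    length: int
    stale: bool = False  # set after a "prefix mismatch" in this toy model


def solicit_hint(configured_len: int,
                 leased: Optional[LeasedPrefix],
                 fall_back_to_config: bool) -> Optional[str]:
    """Pick the IA_PD-prefix hint for a SOLICIT, or None to send no hint at all."""
    if leased is not None and not leased.stale:
        # A usable lease: ask for the same prefix again.
        return f"{leased.prefix}/{leased.length}"
    if fall_back_to_config:
        # Assumed desired behaviour: hint the configured length, as on a fresh start.
        return f"::/{configured_len}"
    # Observed behaviour in the 9.4.1/10.0.2 captures: IA_PD without a prefix hint.
    return None


if __name__ == "__main__":
    stale = LeasedPrefix("2001:db8:0:10::", 60, stale=True)
    print(solicit_hint(60, stale, fall_back_to_config=False))
    print(solicit_hint(60, stale, fall_back_to_config=True))
```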

I think I have enough information now to try and provide a fix.

rsmarples commented 1 year ago

@Marty-42 can you test this branch please? Hopefully it fixes your issue. I can't easily test right now.

Marty-42 commented 1 year ago

Works! Thanks for the prompt fix :-)