Open ThomasZeman opened 2 years ago
RFC 4861 6.3.7 states how this should happen and it makes no mention of what to do when all routers have expired.
I suppose we can set a re-solicit timer based on the current reachable preferred router lifetime minus (MAX_RTR_SOLICITATIONS * RTR_SOLICITATION_INTERVAL) + (MAX_RTR_SOLICITATION_DELAY * 2) + randomisation.
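A minimal sketch of that timer arithmetic, not the code from the actual commit. The three constants are the host constants from RFC 4861 section 10. The helper name `resolicit_delay`, the jitter range, and the reading that all three terms are subtracted from the lifetime as one safety margin are assumptions made for illustration.

```python
import random

# Host constants from RFC 4861, section 10.
MAX_RTR_SOLICITATION_DELAY = 1  # seconds
RTR_SOLICITATION_INTERVAL = 4   # seconds
MAX_RTR_SOLICITATIONS = 3       # transmissions


def resolicit_delay(router_lifetime, jitter=None):
    """Seconds to wait after an RA before soliciting again (hypothetical helper).

    Assumption: the margin is the time a full solicitation cycle can take,
    plus twice the maximum initial delay, plus a random component.
    """
    if jitter is None:
        # Assumed jitter range; the thread only says "randomisation".
        jitter = random.uniform(0, MAX_RTR_SOLICITATION_DELAY)
    margin = (MAX_RTR_SOLICITATIONS * RTR_SOLICITATION_INTERVAL
              + MAX_RTR_SOLICITATION_DELAY * 2
              + jitter)
    # Never return a negative delay for very short lifetimes.
    return max(0.0, router_lifetime - margin)
```

With a router lifetime of 1800 seconds and no jitter, this gives 1786 seconds, i.e. the re-solicit would start 14 seconds before the router expires.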
I think I actually observe this too on FreeBSD. I had these weird issues with loss of connectivity and traced it back to restarting dhcpcd fixing it. (I use dhcpcd solely for IPv6) Maybe ISP specific, a possible option for dhcpcd.conf?
@ThomasZeman @driesmp can you please test the above commit to see if this fixes your problem?
Thanks for looking into this! I will give it a try as soon as possible.
As it turns out my Internet Provider (Dodo Internet, Australia) stopped routing IPv6 in the meantime and answered my Customer Service ticket with: we never supported IPv6. I suppose it was working earlier because someone did some IPv6 testing which might also be the root cause for the missing automatic router advertisement messages. In a nutshell: I am unable to test this patch at the moment and with a new ISP they might send unsolicited router advertisements anyhow. However, I still see great value in this code change and would propose to merge it nonetheless or at least keep it open for a while. Sorry, that I cannot offer more at the moment.
After all, I think it might not be dhcpcd related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261129 (at least the problem I was observing, as I was actually not using dhcpcd for RS, only the DHCPv6 portion). I think maybe it's better to revert the commit, as indeed, the router should send out periodic RAs.
EDIT: after switching to RS from dhcpcd my setup keeps working fine.
A compliant router should send out RA's in a timely fashion.
I think I've been hitting that issue at FOSDEM today and yesterday because the network wasn't always reliable and I suspect RAs were getting lost. I didn't do a network capture unfortunately. Resolvers were being removed but I often still had an ssh session alive and using dhcpcd --renew typically gave me a working setup within a couple seconds at most (basically the time to switch windows and test). I think that if I waited for long enough it sometimes "fixed itself" but I was in that situation at most thrice.
Hi,
My ISP LWLcom (Germany) has the same problem. They do not send periodic router advertisements. So after 1800 seconds, dhcpcd removes the default route and IPv6 is gone.
As a workaround I implemented the following: I set up a cron job to execute dhcpcd -N every 20 minutes. This refreshes all addresses and, to do that, sends a solicit message.
Would it be possible to still allow any of the following?
The workaround works, but in my experience (also from developing Dnsmasq), it is sometimes a good idea to actively send explicit solicit messages, e.g. if the connection is weak.
Another (cheaper) workaround is to use the "rdisc6" tool to just send the solicit messages onto the wire via a cron job. The response is then (also) interpreted by dhcpcd:
root@sirius:/etc/cron.d# cat rdisc-wan
# workaround to renew router advertisement
*/20 * * * * root /usr/bin/perl -e 'sleep int(rand(30))' && /usr/bin/rdisc6 -1nq ifwan >/dev/null
(this sends a solicit every 20 minutes, which fits perfectly).
If you check debug output of dhcpcd, it works as expected:
Apr 6 00:20:01 sirius CRON[8342]: (root) CMD (/usr/bin/perl -e 'sleep int(rand(30))' && /usr/bin/rdisc6 -1nq ifwan >/dev/null)
Apr 6 00:20:10 sirius dhcpcd[7497]: ifwan: Router Advertisement from fe80::e2f6:2dff:fe92:75f3
Apr 6 00:20:10 sirius dhcpcd[7497]: ifwan: executing `/lib/dhcpcd/dhcpcd-run-hooks' ROUTERADVERT
So I really hope this gets fixed at some point, because if you check the router forums, this seems to be a common problem with many ISPs. Somebody already forked dhcpcd because of this: https://github.com/harrykipper/dhcpcd/pull/1
can you please test the above commit to see if this fixes your problem?
@rsmarples I’ve tested your patch and it works great! :·)
A compliant router should send out RA's in a timely fashion.
Well, this is your opinion and while I agree that a well-mannered router should send out unsolicited RAs, the RFC doesn’t require them to – responding to RS’ is sufficient, as ThomasZeman has already pointed out in the very first post in this bugreport.
Routers send out Router Advertisement messages periodically, or in response to Router Solicitations.
RFC doesn’t require them to – responding to RS’ is sufficient, as ThomasZeman has already pointed out in the very first post in this bugreport.
Ah, but the RFC does require them to be sent periodically.
6.2.2. Becoming an Advertising Interface The term "advertising interface" refers to any functioning and enabled interface that has at least one unicast IP address assigned to it and whose corresponding AdvSendAdvertisements flag is TRUE. A router MUST NOT send Router Advertisements out any interface that is not an advertising interface.
6.2.1. Router Configuration Variables AdvSendAdvertisements A flag indicating whether or not the router sends periodic Router Advertisements and responds to Router Solicitations.
MaxRtrAdvInterval The maximum time allowed between sending unsolicited multicast Router Advertisements from the interface, in seconds. MUST be no less than 4 seconds and no greater than 1800 seconds.
MinRtrAdvInterval The minimum time allowed between sending unsolicited multicast Router Advertisements from the interface, in seconds. MUST be no less than 3 seconds and no greater than .75 * MaxRtrAdvInterval.
The RFC does not allow a zero value for MinRtrAdvInterval or MaxRtrAdvInterval. AdvSendAdvertisements says periodic RA and respond to RS, not either/or.
So no. I stand by my assessment that a compliant router MUST send periodic router advertisements.
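The limits quoted above can be written down as a small check. This is only an illustrative sketch: the function name `check_adv_intervals` is made up, and it covers just the two interval variables quoted from section 6.2.1.

```python
def check_adv_intervals(max_rtr_adv_interval, min_rtr_adv_interval):
    """Return the violations of the RFC 4861 6.2.1 limits (hypothetical helper)."""
    problems = []
    # MaxRtrAdvInterval: no less than 4 and no greater than 1800 seconds.
    if not 4 <= max_rtr_adv_interval <= 1800:
        problems.append("MaxRtrAdvInterval must be between 4 and 1800 seconds")
    # MinRtrAdvInterval: no less than 3 and no greater than .75 * MaxRtrAdvInterval.
    if not 3 <= min_rtr_adv_interval <= 0.75 * max_rtr_adv_interval:
        problems.append(
            "MinRtrAdvInterval must be between 3 seconds and .75 * MaxRtrAdvInterval"
        )
    return problems
```

A router configured with both intervals at zero (that is, "never send unsolicited RAs") fails both checks, which is the point being made: there is no valid setting that turns periodic advertisements off while AdvSendAdvertisements is TRUE.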
That's fine. Unfortunately there are routers out there (especially with DSL lines), that do not send RA periodically.
I think your patch that was reverted is working perfectly. I agree if you don't want it enabled by default. If that's the main reason, why not add a config option to enable active router solicitation?
In the meantime, the alternative workaround I described before (using the rdisc6 tool in a cron job to send solicitations) works for me. But I really ask you to add the functionality back. It would help very much with non-compliant routers.
Thanks.
Because that would then break this part of 6.3.7. Sending Router Solicitations
Router Solicitations may be sent after any of the following events:
- The interface is initialized at system startup time.
- The interface is reinitialized after a temporary interface
failure or after being temporarily disabled by system
management.
- The system changes from being a router to being a host, by
having its IP forwarding capability turned off by system
management.
- The host attaches to a link for the first time.
- The host re-attaches to a link after being detached for some
time.
Once the host sends a Router Solicitation, and receives a valid Router Advertisement with a non-zero Router Lifetime, the host MUST desist from sending additional solicitations on that interface, until the next time one of the above events occurs. Moreover, a host SHOULD send at least one solicitation in the case where an advertisement is received prior to having sent a solicitation. Responses to solicited advertisements may contain more information than unsolicited advertisements.
I'm loath to add a toggle that causes dhcpcd to be RFC non-compliant. Saying that, dhcpcd does break for you guys so I'll think about it.
Hm, you’re right, re-reviewing the RFC and the parts you quote, your interpretation does make more sense.
I’ve done a little bit of research “in the wild.” Linux behaves the same way as dhcpcd, i.e., when the router expires, IPv6 connectivity is lost and that’s it. Listening with Wireshark on the network, I see quite a few (repeated) RS packets. Some are sent at around half (or even less) of the router lifetime, but many are sent a few seconds after the router lifetime expires, which I interpret as reactions to losing connectivity (LineageOS seems to be one of those doing this).
So, on one hand, the routers are non-compliant with the protocol, on the other hand, they exist and there’s a question what can be done about that. We’ve seen many many times workarounds built to make things work even in the presence of broken hw or sw. I’m not happy about that and turn such workarounds off whenever I can, but at the same time I’m glad the workarounds are implemented when I find myself in a situation like the one we discuss here, where I don’t have many options for dealing with the problem.
Hi, I was also reading the spec parts regarding the following:
Once the host sends a Router Solicitation, and receives a valid Router Advertisement with a non-zero Router Lifetime, the host MUST desist from sending additional solicitations on that interface, until the next time one of the above events occurs. Moreover, a host SHOULD send at least one solicitation in the case where an advertisement is received prior to having sent a solicitation. Responses to solicited advertisements may contain more information than unsolicited advertisements.
I agree, the spec says that it should not send any more solicitations. The idea here is to only send new router solicitations (like on startup) if it is "almost clear" that no new advertisements are coming in. If that happens, the IPv6 route would get lost; I interpret this as one of the state changes listed above. In that case dhcpcd should send RS again because the interface changed state (no route anymore).
@xHire discovered the Linux kernel seems to do the same. But @xHire also discovered that LineageOS sends RS at half of the lifetime. I think this is fine to do because it prevents the connections from going down.
When should the solicitations be sent? Let's look at: AdvDefaultLifetime:
AdvDefaultLifetime The value to be placed in the Router Lifetime field of Router Advertisements sent from the interface, in seconds. MUST be either zero or between MaxRtrAdvInterval and 9000 seconds. A value of zero indicates that the router is not to be used as a default router. These limits may be overridden by specific documents that describe how IPv6 operates over different link layers. For instance, in a point-to-point link the peers may have enough information about the number and status of devices at the other end so that advertisements are needed less frequently. Default: 3 * MaxRtrAdvInterval
So if the lifetime is 1800 (verified for my router on the other end of the DSL link), the recommendation is to send an unsolicited advertisement every 600 seconds. This can also be seen in the defaults:
MaxRtrAdvInterval The maximum time allowed between sending unsolicited multicast Router Advertisements from the interface, in seconds. MUST be no less than 4 seconds and no greater than 1800 seconds. Default: 600 seconds
So to me it is fine if the client (dhcpcd) sends a few RS (with random delay like on startup) AFTER at the earliest 1/3 of the router lifetime. So starting to send them after half of the time looks fine to me. Your patch does it shortly before the lifetime ends, which is perfectly fine. We are just preventing the connection from getting lost. If it got lost, we would need to send new solicitations anyway, so doing it shortly before is perfectly fine.
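To make the three candidate points concrete, here is a small sketch of the arithmetic. The function name `solicit_candidates` and the 10-second `margin` for "shortly before the lifetime ends" are assumptions for illustration, not values taken from the patch.

```python
def solicit_candidates(router_lifetime, margin=10):
    """Candidate times (seconds after the last RA) for a refresh RS.

    Hypothetical helper; `margin` is an assumed safety margin.
    """
    # A compliant router using the defaults advertises every lifetime / 3,
    # so an RS before this point would be premature.
    earliest = router_lifetime / 3
    # Half of the lifetime, the more eager variant discussed above.
    halfway = router_lifetime / 2
    # Last chance before the default route is removed.
    last_chance = router_lifetime - margin
    return earliest, halfway, last_chance
```

For a lifetime of 1800 seconds this yields 600, 900 and 1790 seconds. A router that follows the defaults would already have sent two unsolicited RAs before the last of these is reached, so the last-chance RS only ever fires against routers that stay silent.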
So my conclusion is: You may interpret the patch / my proposal as being against the standard, but as a last fallback, really shortly before the lifetime ends, it is a good chance to make a final try to renew it. Maybe somebody should add that to a newer version of the standard?
The patch b93f080edc438e2d43f24b1f3b73dc7d003c2628 implements a last chance to renew the router. Under normal circumstances this should never happen, because a periodic advertisement is sent by default every 1/3 of the lifetime.
I'd like to have the commit back. To be safe, I will build a version of dhcpcd with the patch enabled and test it now.
Hi, I checked out dhcpcd 10.0.2 with the above patch cherry-picked by hash, built it for Debian, and tested it: works fine! Approximately 10 seconds before the router lifetime ends, it solicits:
Jul 22 00:38:45 sirius dhcpcd[594540]: ifwan: delaying IPv6 router solicitation for 1.0 seconds
Jul 22 00:38:46 sirius dhcpcd[594540]: ifwan: soliciting an IPv6 router
Jul 22 00:38:46 sirius dhcpcd[594540]: ifwan: sending Router Solicitation
Jul 22 00:38:46 sirius dhcpcd[594540]: ifwan: Router Advertisement from fe80::e2f6:2dff:fe92:75f3
Jul 22 00:38:46 sirius dhcpcd[594540]: ifwan: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks ROUTERADVERT
Jul 22 00:38:47 sirius dhcpcd[594540]: ifwan: router 45.151.241.251 requires a host route
Jul 22 01:08:37 sirius dhcpcd[594540]: ifwan: soliciting an IPv6 router
Jul 22 01:08:37 sirius dhcpcd[594540]: ifwan: sending Router Solicitation
Jul 22 01:08:37 sirius dhcpcd[594540]: ifwan: Router Advertisement from fe80::e2f6:2dff:fe92:75f3
Jul 22 01:08:37 sirius dhcpcd[594540]: ifwan: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks ROUTERADVERT
Jul 22 01:38:28 sirius dhcpcd[594540]: ifwan: soliciting an IPv6 router
Jul 22 01:38:28 sirius dhcpcd[594540]: ifwan: sending Router Solicitation
Jul 22 01:38:28 sirius dhcpcd[594540]: ifwan: Router Advertisement from fe80::e2f6:2dff:fe92:75f3
Jul 22 01:38:28 sirius dhcpcd[594540]: ifwan: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks ROUTERADVERT
Maybe it should do this a few seconds earlier. I don't know how the randomization and the timer precision work.
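The spacing between solicitations can be read straight off the log above. A small sketch that does this, using only the three "sending Router Solicitation" lines from the log; the function name is made up, and the year is a placeholder because syslog timestamps do not carry one.

```python
from datetime import datetime

LOG = """\
Jul 22 00:38:46 sirius dhcpcd[594540]: ifwan: sending Router Solicitation
Jul 22 01:08:37 sirius dhcpcd[594540]: ifwan: sending Router Solicitation
Jul 22 01:38:28 sirius dhcpcd[594540]: ifwan: sending Router Solicitation
"""


def solicitation_gaps(log_text, year=2000):
    """Seconds between consecutive Router Solicitations in a syslog excerpt."""
    times = []
    for line in log_text.splitlines():
        if "sending Router Solicitation" in line:
            # The timestamp is the first three whitespace-separated fields.
            stamp = " ".join(line.split()[:3])
            # Placeholder year: only the differences matter here.
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(solicitation_gaps(LOG))  # prints [1791, 1791]
```

Both gaps are 1791 seconds, so with a router lifetime of 1800 seconds the solicitation goes out about 9 seconds before expiry.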
Hi, to give some background about why this is done. Sorry for this lengthy post; it just gives you some reasoning why things are the way they are. I was in communication with my internet provider and also checked several forums and blog posts about how those ISPs for DSL or fiber work internally.
Basically my provider says: "This is a known problem with our JuniperOS version. Juniper is informed but they have no solution for their current setup. If you want to make sure you have a stable connection, use PPPoE because the IPv6 route setup is there encapsulated into the PPPoE negotiation. DHCPv6 is there only needed for prefix delegation" (actually that was the setup I had before: in my old config the provider had only PPPoE, and the setup there was that the PPP device got a link-local address via the PPP protocol and the router was negotiated (also as a link-local address)).
Actually the reason for the whole thing is a combination of the following:
I hope this explains a bit why ISPs are doing this. If you want to get more information about the L2-BSA (layer2 bitstream access). Both docs/specs are in German only, sorry!:
To conclude: IMHO the RFC 4861 standard should be updated/changed to allow router solicitation to refresh, because sending broadcast/multicast is not possible everywhere! Actually this is also stated in the spec. This is one of the major problems many implementations have with IPv6: the requirement for configuration that is not client-initiated in order to set up SLAAC addresses and routes.
P.S.: To the original poster @ThomasZeman: From your logs, you are/were using PPPoE. So I wonder why you wanted to get a route by using RA/DHCPv6 at all. Normally if you correctly configure the PPP connection it will handle default route for both IPv6 and IPv4. DHCPv6 is only needed for prefix delegation.
Hi Uwe, thanks for putting so much thought into this! Unfortunately, I don't remember exactly what I did and why I did it. I experimented with IPv6 connectivity as a customer of one of the larger end-consumer providers here in Australia, Vocus Telecommunications ( www.vocus.com.au ), and it might well be that their setup was also experimental at the time. (Meanwhile I changed to Aussie Broadband and run IPv4 only, at least at the moment.) At the time it just made a lot of sense to me to send out another RS message when the current one was about to expire (standard or not) because as a user I have no influence on non-conformance of my ISP and ringing their hotline does not lead anywhere (which I actually tried). My feeling is most customers of the bigger ISPs would use the ISP-supplied routers with their own custom software. That software is a black box to us as end customers and might perhaps indeed contain a "workaround" to get this working; we would never know. Long story short: if we can make life easier for a few Linux users who have their boxes directly on the wire, let's do it, because debugging this is quite time-consuming and not everyone has the skills.
At the time it just made a lot of sense to me to send out another RS message when the current one was about to expire (standard or not) because as a user I have no influence on non-conformance of my ISP and ringing their hotline does not lead anywhere (which I actually tried).
If you want to know the setup for PPPoE, I can share it with you (contact me). There you have to actually disable router solicitation in dhcpcd and only enable IPv6 negotiation on the PPPoE daemon (plus a script in ipv6-up.d to add the default route using PPP_REMOTE; I borrowed this from OpenWrt). If you enable IP/router negotiation also on dhcpcd, you actually override the routing and local addresses setup done by pppd (it undoes the work by pppd). Dhcpcd should use a minimal IPv6-only setup just requesting a prefix delegation (and nameservers if you're interested) and no IP addresses or similar. It's only a few lines.
The RS problem described here is still critical for connections without PPP, so yes the patch with sending additional RS is really needed for setting up consumer routers with DSL or fiber lines.
P.S.: the reason why most people use dhcpcd instead of ISC dhcpd for DSL/fiber lines is not only its great and simple config language; it also works on top of PPP tunnels. ISC used to require an Ethernet interface (not sure if that is still the case). The other alternative for PPPoE was wide-dhcpv6, but it is now unmaintained. So @rsmarples should really work on adding useful support for the consumer router use case, although some variants violate specs.
So one symptom of this was dhcpcd not sending out solicitations at all when the carrier flipped for PPP interfaces or the PPP interface was created while dhcpcd was running. It never spotted the LL address completing DAD and stalled. This has now been fixed in the master branch, which may address the underlying issue here.
When dhcpcd recognises my ipv6 interface is up (ppp0) it sends out a router solicitation message on the same interface. My ISP answers with a router advertisement having a lifetime of 1800s. After these 1800s (30mins), dhcpcd removes the default IPv6 route and I lose IPv6 connectivity. The following shows the dhcpcd log:
Mar 23 23:57:03 [10166]: dhcpcd-9.4.1 starting
Mar 23 23:57:03 [10166]: chrooting as dhcpcd to /var/lib/dhcpcd
Mar 23 23:57:03 [10168]: spawned manager process on PID 10168
Mar 23 23:57:03 [10166]: sandbox: seccomp
Mar 23 23:57:03 [10168]: spawned privileged proxy on PID 10169
Mar 23 23:57:03 [10168]: spawned network proxy on PID 10170
Mar 23 23:57:03 [10168]: spawned controller proxy on PID 10171
Mar 23 23:57:03 [10168]: DUID 00:01:00:01:29:91:8c:a6:dc:a6:32:d5:e8:12
Mar 23 23:57:03 [10168]: ppp0: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks PREINIT
Mar 23 23:57:03 [10168]: ppp0: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks CARRIER
Mar 23 23:57:03 [10168]: ppp0: IAID 00:00:00:01
Mar 23 23:57:03 [10168]: ppp0: delaying IPv6 router solicitation for 0.4 seconds
Mar 23 23:57:03 [10168]: ppp0: reading lease: /var/lib/dhcpcd/ppp0.lease6
Mar 23 23:57:03 [10168]: ppp0: soliciting a DHCPv6 lease
Mar 23 23:57:03 [10168]: ppp0: delaying SOLICIT6 (xid 0xaeda3f), next in 2.1 seconds
Mar 23 23:57:03 [10168]: lan: activating for delegation
Mar 23 23:57:03 [10168]: lan: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks PREINIT
Mar 23 23:57:03 [10168]: lan: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks CARRIER
Mar 23 23:57:03 [10168]: lan: IAID 32:d5:e8:12
Mar 23 23:57:03 [10168]: ppp0: soliciting an IPv6 router
Mar 23 23:57:03 [10168]: ppp0: sending Router Solicitation
Mar 23 23:57:03 [10168]: ppp0: Router Advertisement from fe80::6e20:56ff:fe69:7b00
Mar 23 23:57:03 [10168]: ppp0: adding address 2403:4800:2f04:5fbc:e9a2:ca94:42f1:86ff/64
Mar 23 23:57:03 [10168]: ppp0: pltime 604800 seconds, vltime 2592000 seconds
Mar 23 23:57:03 [10168]: ppp0: adding route to 2403:4800:2f04:5fbc::/64
Mar 23 23:57:03 [10168]: ppp0: adding default route via fe80::6e20:56ff:fe69:7b00
Mar 23 23:57:03 [10168]: ppp0: waiting for Router Advertisement DAD to complete
Mar 23 23:57:03 [10168]: ppp0: Router Advertisement DAD completed
Mar 23 23:57:03 [10168]: ppp0: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks ROUTERADVERT
Mar 23 23:57:03 [10168]: ppp0: sending NA for 2403:4800:2f04:5fbc:e9a2:ca94:42f1:86ff/64
Mar 23 23:57:04 [10168]: ppp0: sending NA for 2403:4800:2f04:5fbc:e9a2:ca94:42f1:86ff/64
Mar 23 23:57:05 [10168]: ppp0: broadcasting SOLICIT6 (xid 0xaeda3f), next in 1.0 seconds
Mar 23 23:57:05 [10168]: ppp0: ADV 2403:4800:245f:bb00::/56 from fe80::6e20:56ff:fe69:7b00
Mar 23 23:57:05 [10168]: ppp0: broadcasting REQUEST6 (xid 0x2509bd), next in 1.0 seconds
Mar 23 23:57:05 [10168]: ppp0: REPLY6 received from fe80::6e20:56ff:fe69:7b00
Mar 23 23:57:05 [10168]: ppp0: renew in 302400, rebind in 483840, expire in 4294967295 seconds
Mar 23 23:57:05 [10168]: lo: adding reject route to 2403:4800:245f:bb00::/56
Mar 23 23:57:05 [10168]: ppp0: writing lease: /var/lib/dhcpcd/ppp0.lease6
Mar 23 23:57:05 [10168]: ppp0: delegated prefix 2403:4800:245f:bb00::/56
Mar 23 23:57:05 [10168]: lan: adding address 2403:4800:245f:bb02::1/64
Mar 23 23:57:05 [10168]: lan: pltime infinity, vltime infinity
Mar 23 23:57:05 [10168]: lan: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks DELEGATED6
Mar 23 23:57:05 [10168]: lan: adding route to 2403:4800:245f:bb02::/64
Mar 23 23:57:05 [10168]: ppp0: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks BOUND6
Mar 23 23:57:05 [10168]: ppp0: sending NA for 2403:4800:2f04:5fbc:e9a2:ca94:42f1:86ff/64
Mar 24 00:27:03 [10168]: ppp0: fe80::6e20:56ff:fe69:7b00: router expired
Mar 24 00:27:03 [10168]: ppp0: part of a Router Advertisement expired
Mar 24 00:27:03 [10168]: ppp0: deleting default route via fe80::6e20:56ff:fe69:7b00
Mar 24 00:27:03 [10168]: ppp0: executing: /usr/lib/dhcpcd/dhcpcd-run-hooks ROUTERADVERT
and the corresponding tcpdump:
23:57:03.823317 ppp0 Out IP6 fe80::ce01 > ff02::2: ICMP6, router solicitation, length 8
23:57:03.830476 ppp0 In IP6 fe80::6e20:56ff:fe69:7b00 > fe80::ce01: ICMP6, router advertisement, length 56
00:05:31.266692 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
00:13:01.620526 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
00:22:59.413426 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
00:31:27.815649 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
00:39:45.938377 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
00:49:07.482288 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
00:57:10.768177 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
01:06:52.687305 lan Out IP6 fe80::dea6:32ff:fed5:e812 > ff02::1: ICMP6, router advertisement, length 88
(where fe80::dea6:32ff:fed5:e812 is the internal LL address)
dhcpcd does not send out a new router solicitation message before the previous advertisement expires, which would be the desired behaviour (in my case, e.g., at 90% of 1800s).
It seems the current behaviour is by design: https://github.com/NetworkConfiguration/dhcpcd/blob/b09ed786b860f8401b6063553fbc8af293a6dc52/src/ipv6nd.c#L1517 When an advertisement is received, the timers (timeouts) for sending out new solicitation messages are cleared and never "recharged". According to RFC 4861 https://datatracker.ietf.org/doc/html/rfc4861#section-4.2 it is not a "MUST" for a router to send out advertisements periodically; it is okay to only react to solicitation messages.
This issue is a bit of a head-scratcher for me. As it stands at the moment, the behaviours of my ISP and dhcpcd are incompatible, making it impossible for me to use them together. I am surprised no one else has had this problem so far.
Proposal: Register a new timeout when a router advertisement is received which sends out a solicitation message when e.g. 90% of the advertisement lifetime has passed.
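A rough sketch of how such a timeout could behave, written in Python rather than in dhcpcd's C and not based on its internals. The class name `RouterRefreshTimer`, its methods, and the explicit `now` argument (seconds on any monotonic clock) are all hypothetical; the 0.9 fraction is the 90% from the proposal.

```python
class RouterRefreshTimer:
    """Tracks when a refresh Router Solicitation is due (hypothetical sketch)."""

    def __init__(self, fraction=0.9):
        self.fraction = fraction
        self.solicit_at = None
        self.expires_at = None

    def on_router_advert(self, now, router_lifetime):
        """Re-arm the timer every time a Router Advertisement arrives."""
        if router_lifetime == 0:
            # Lifetime 0 means "not a default router": nothing to refresh.
            self.solicit_at = None
            self.expires_at = None
            return
        self.solicit_at = now + self.fraction * router_lifetime
        self.expires_at = now + router_lifetime

    def should_solicit(self, now):
        """True in the window between the refresh point and router expiry."""
        if self.solicit_at is None:
            return False
        return self.solicit_at <= now < self.expires_at
```

With the 1800s lifetime from the log, an RA received at t=0 arms the timer for t=1620. If the router sends periodic RAs as expected, each one pushes the refresh point further out and no extra solicitation is ever sent.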