dtaht commented 7 years ago

I have been poking all over the stack on a LEDE build from Thursday, and I think many of the issues we've been having with IPv6 for a year now come down to timers not being set correctly in the kernel by the various places that need to set them.

Also the Linux kernel added the new notions of mngtmpaddr and noprefixroute over the past 2 years or so. LEDE seems to be defaulting to noprefixroute (which seems to be the right idea), but that is not the default in things like my Ubuntu system, and this mild change (adding an address/64 used to add a route/64 too) may also be causing certain things to fail.

In surveying the carnage:

iproute2:

iproute2 added the ability to set "ip route ... expires X" in 4.5, but it was not in 4.4, and not done correctly until 4.7. Not sure if it was busted in the kernel also...

There are also weird dependencies in ip -6 addr if you try to set preferred_lft independently of valid_lft. These should always be modified together for sane results.
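
A small wrapper can refuse to touch one lifetime without the other. This is only a sketch: `addr_lifetime_cmd` is a hypothetical helper that prints the `ip -6 addr change` command rather than running it, and the address and `eth0` are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not from iproute2 or netifd): build an
# "ip -6 addr change" command that always carries both lifetimes,
# and refuse a preferred lifetime longer than the valid lifetime.
addr_lifetime_cmd() {
    local addr="$1"
    local dev="$2"
    local valid="$3"
    local preferred="$4"

    if [ "$preferred" -gt "$valid" ]; then
        echo "preferred_lft ($preferred) exceeds valid_lft ($valid)" >&2
        return 1
    fi
    echo "ip -6 addr change $addr dev $dev valid_lft $valid preferred_lft $preferred"
}

# Placeholder address and interface; prints the command instead of running it.
addr_lifetime_cmd 2001:db8::1/64 eth0 1800 1600
```

Printing the command keeps the sketch safe to run as a normal user; piping the output to `sh` as root would apply it.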

netifd

For ula globals it has never set a leasetime or route expires in the kernel. It probably should (and default to something like 20 minutes otherwise), and periodically refresh the associated addresses and distributed routes for that. (or punt to something else to do it)

netifd also has a notion of "wallclock" time, and I'm not sure what is supposed to happen if stuff starts before time is slewed forward for the first time by ntp, or while time is being slewed.

netifd does appear to be trying to set valid_lft, preferred_lft, but seems to be failing in many cases.

Most of the /lib/netifd/protos do not bother trying to set a timer when they should.

/lib/netifd/dhcpv6.script (called by odhcp6c)

I successfully modified this to actually pass valid_lft and preferred_lft to the kernel by using the full syntax for proto_add_ipv6_address.

(this was my first clue that netifd was supposed to actually be writing timer related info to the kernel. It does keep preferred and valid correctly in ifstatus whatever.)

relevant bit was:

    for entry in $ADDRESSES; do
        local addr="${entry%%/*}"
        entry="${entry#*/}"
        local mask="${entry%%,*}"
        entry="${entry#*,}"
        local preferred="${entry%%,*}"
        entry="${entry#*,}"
        local valid="${entry%%,*}"

        # Override these so we can watch what happens between refreshes
        [ "$preferred" -gt "1600" ] && preferred=1600
        [ "$valid" -gt "1800" ] && valid=1800

        # full syntax; the trailing 1 is the offlink argument
        proto_add_ipv6_address "$addr" "$mask" "$preferred" "$valid" 1
    done
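
To try the parsing and clamping outside netifd, here is a minimal standalone sketch. It assumes each entry has the form `addr/mask,preferred,valid`, which is what the parameter expansions in the excerpt imply; `parse_addr_entry` and its optional cap arguments are hypothetical and not part of dhcpv6.script.

```shell
#!/bin/sh
# Hypothetical helper, not part of dhcpv6.script.
# Splits "addr/mask,preferred,valid" and caps both lifetimes.
parse_addr_entry() {
    local entry="$1"
    local max_preferred="${2:-1600}"
    local max_valid="${3:-1800}"

    local addr="${entry%%/*}"      # everything before the first "/"
    entry="${entry#*/}"
    local mask="${entry%%,*}"      # prefix length
    entry="${entry#*,}"
    local preferred="${entry%%,*}" # preferred lifetime in seconds
    entry="${entry#*,}"
    local valid="${entry%%,*}"     # valid lifetime in seconds

    [ "$preferred" -gt "$max_preferred" ] && preferred="$max_preferred"
    [ "$valid" -gt "$max_valid" ] && valid="$max_valid"

    echo "$addr $mask $preferred $valid"
}

# prints: 2001:db8::1 64 1600 1800
parse_addr_entry "2001:db8::1/64,3600,7200"
```

The four fields it prints are the ones the excerpt hands to proto_add_ipv6_address, so low caps here make it easy to watch addresses being refreshed.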

6rd.sh

Despite mucho hacking, the exact same syntax as above for proto_add_ipv6_address fails.

Also it is not being triggered at all when the underlying lease on the interface is renewed by udhcpc. The underlying addresses do change... my udhcpc lease is 600 seconds, and advertising an infinite lifetime to other hosts is kind of bad, 'cause it's changed 3 times in the last day alone.... There are 3 kinds of cruft in the code going back years...

The various other things for route timers (proto_add_ipv6_route, proto_add_ipv6_prefix) appear syntactically correct, but generally do not work (and are not being used). Also the separator syntax (slashes) is not the best; commas would be saner to parse, I think. There's at least one error in the 6rd code where we end up with an extra /60/60 that parses elsewhere only by luck.

odhcpd

So far as I understand, odhcpd gets stuff via ubus and punts some of it back that way. It seems to be working for dhcpv6 (but does supply infinite leases until you convince odhcp6c to get addresses defined with shorter leases)

and netifd corresponds over ubus...

I've noted elsewhere that odhcpd (bug 388) gets completely wedged if you hit it with a couple of odhcp-pd requests, and starts offering RAs only periodically. It also loses track of its internal lease table....

dnsmasq dhcpv6

kind of depends on there being timers for the addresses on the link that it is offering addresses for to route, and I don't know if it thinks about the noprefixroute concept and routes/addresses separately.

I haven't even tried to figure out if slaac is working right.

some further ra issues

odhcp6c

Also installs a ton of dhcpv6 static routes that make pretty good sense for a default gw to an ISP, but not for an interior one. It won't announce a ra by itself... relying on another daemon to decide to do so.

It does not seem to listen to further RAs on the link it is listening on (for more specific routes on its gateway). You unfortunately cannot rely just on dhcpv6 for routes in ipv6.

We use source specific routing to only allow its assigned prefixes to exit (good thing, no BCP38 needed), which means we do not install a default ra to the universe, because RAs don't do source specific announcements (although there is a proposal slowly drifting through the IETF on the matter).

The static route problem also gets in the way when you have two dhcp-pd clients: they end up essentially in AP isolation mode when no default route is also present. The "righter" answer is to be announcing your interior router's subnet offlink via, but not a default gw.

accept_ra 2 is no longer accepted syntax in uci. Using this (carefully) covered a multitude of sins in the good ole days (2013). It can still be set via sysctl, and helps (for now) in a couple of scenarios.

Next steps for all concerned for whoever wants to chip in.

* backport the relevant expires fixes for iproute2 (or just upgrade it to head)
* look over the in-kernel interfaces for ipv6 timer and for any route related stuff
* also review noprefixroute stuff
* make netifd do saner things with timers and ulas
* review the netlink apis and usages in the various odh utils
* review the protos and make sure they are passing the right things and are getting the right events and hooked to the right stuff, getting correctly into ifstatus and elsewhere. (I just did 6rd and dhcpv6 - anyone for he? pppoe? 6to4? gre? homenet?)
* add some error checking and optional logging to anything writing or reading netlink. Everywhere.

write some documentation on how netifd/ubus/odh*/uci is supposed to interact in the first place. With pictures. I have 30+ pages of notes at this point, and dozens of packet captures, screen shots, and route dumps.....

I'd love a "simple ubus listen daemon" example: one that merely listens for events on a set of interfaces, or on routes, or on a proto.

For all I know there's better tools for looking at ubus but my head spins from installing my post-eyeball json checker. I'd like to for example be checking that everything is always correctly formatted json passing through... is there a way to do that? While, like, um, hammering ubus?
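
One low-tech way to approach this, sketched under assumptions: pipe the output of `ubus listen` (or `ubus monitor`) through a loop that hands each line to a JSON validator. The sketch assumes one JSON object per line and that a validator such as `jq` is installed, which is not a given on a LEDE image; `check_json_lines` and `validate_with_jq` are made-up names.

```shell
#!/bin/sh
# Hypothetical helper: read lines on stdin, pass each one to a validator
# function, report malformed lines on stderr, print the count at EOF.
check_json_lines() {
    local validator="$1"
    local line
    local bad=0

    while IFS= read -r line; do
        [ -n "$line" ] || continue          # skip empty lines
        if ! printf '%s\n' "$line" | "$validator" >/dev/null 2>&1; then
            echo "malformed: $line" >&2
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}

# Assumption: jq is installed; it exits non-zero on input it cannot parse.
validate_with_jq() {
    jq .
}

# Example (runs until interrupted, so the count only appears at EOF):
#   ubus listen | check_json_lines validate_with_jq
```

Keeping the validator pluggable means whatever JSON tool is on the box can be swapped in, and the malformed lines show up on stderr while ubus is being hammered from another shell.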

and then there's this long list of needed features for odhcpd, bringing dnsmasq more up to snuff, and...

the original bug 388 I started with, which I kind of hope will yield to fixing several of the above....

But I'd settle for first getting a solid notion of timers throughout the stack, actually working, and setting them to really low values to see them actually do their dances.

dtaht commented 7 years ago

It appears it gets it from the assigned leasetime

EricLuehrsen commented 7 years ago

@dtaht - these timer settings seem a core functional error in LEDE. Routing related tools (like odhcpd) get confused. Should the 388 bug report be duplicated but rephrased with this perspective? I think LEDE (if it were a commercial product) should be held up in RC's for something like this.

dtaht commented 7 years ago

I have gone back to fiddling with netifd's notions of time and routes. With a simple patch to track things and keep things refreshed, some things got better (with odhcpd in the loop), some things got worse...

http://www.taht.net/~d/001-expires.patch

I see netifd trying to flush routes that it probably shouldn't.

    Mon Feb 6 21:34:32 2017 daemon.warn netifd: 0 route expiring...
    Mon Feb 6 21:34:32 2017 daemon.warn netifd: 0 route expiring...
    Mon Feb 6 21:34:32 2017 daemon.warn netifd: 0 route expiring...
    Mon Feb 6 21:35:04 2017 daemon.warn netifd: 0 route expiring...

At this point I think I need a vm to debug on. And more time than I have available. And to upgrade to a later ip route that shows expires info.

As for break/nobreak decisions on lede's release... not my call. I think they need to do a release, then think deeply upon these issues and plan on a major roll up bug fix release 3-6 months later. Which probably needs the 4.9 kernel upgrade to fix some issues - for example the above patch is applying the RTA_EXPIRES thing on top of what the kernel thinks is the PAD. Possibly. Several problems seem to have been kicked off by the NOPREFIXROUTE stuff.

dtaht commented 7 years ago

I've also been poking into all the possibly untrapped error conditions from nl_error. https://plus.google.com/u/0/107942175615993706558/posts/A5wcLysZLRk

dtaht commented 7 years ago

There are a lot of bugs in the above. #10 answers one of those questions.

dtaht commented 7 years ago

For the record, how dnsmasq gets it (maybe BSD only) is via:

          if (fd != -1 && ioctl(fd, SIOCGIFALIFETIME_IN6, &ifr6) != -1)
            {
              valid = ifr6.ifr_ifru.ifru_lifetime.ia6t_vltime;
              preferred = ifr6.ifr_ifru.ifru_lifetime.ia6t_pltime;
            }
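
On Linux, the lifetimes the kernel holds for each address can at least be inspected from the shell, since `ip -6 addr show` prints `valid_lft` and `preferred_lft` under every address. The sketch below only parses that text; `parse_lifetimes` is a hypothetical helper, and it says nothing about which interface dnsmasq itself uses on Linux.

```shell
#!/bin/sh
# Hypothetical helper: turn "ip -6 addr show" text on stdin into
# "address valid preferred" lines, one per IPv6 address.
parse_lifetimes() {
    awk '
        $1 == "inet6" { addr = $2 }             # remember the address line
        $1 == "valid_lft" && addr != "" {       # lifetimes follow on the next line
            print addr, $2, $4
            addr = ""
        }
    '
}

# Example: ip -6 addr show dev eth0 | parse_lifetimes
```

Running this before and after a lease renewal is a quick way to see whether the timers in the kernel actually moved.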

dtaht / dnsmasq-lede

Where does dnsmasq-dhcpv6 get its timer info from? Routes? Addresses? Can it be overridden? #7
