contiki-ng / contiki-ng

Contiki-NG: The OS for Next Generation IoT Devices
https://www.contiki-ng.org/
BSD 3-Clause "New" or "Revised" License

RPL DAO inconsistency handling not working for all scenarios #2385

Open arurke opened 1 year ago

arurke commented 1 year ago

Summary

When a DAO inconsistency is noticed in storing mode RPL, a No-Path DAO is sent to the transmitter in order to remove the erroneous route. This behavior seems to be Contiki-NG-specific, as RFC 6550 Section 11.2.2.3 states the packet itself should be returned with the Forwarding-Error bit set. The No-Path DAO approach does not seem watertight, as a receiver will not process a DAO if the transmitter has a lower rank. Thus the faulty route remains in such situations, and a new No-Path DAO is generated for every forwarded packet until the route times out.

I added a workaround at rpl-icmp6.c:L719 such that No-Path information is processed regardless of rank, and it seemed to resolve the situation in my simulations. However, I am not an RPL expert, so I would love it if someone would chime in.
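
To make the rank check and the workaround concrete, here is a minimal sketch in C. It is not the code in rpl-icmp6.c: the function `handle_dao`, the `dao_action` values, and the `accept_no_path_always` flag are hypothetical names for illustration, and the loop test is simplified to "sender rank lower than own rank" as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome codes for illustration; not Contiki-NG identifiers. */
enum dao_action { DAO_DROPPED_AS_LOOP, DAO_ROUTE_REMOVED, DAO_ROUTE_ADDED };

/* RFC 6550: a Path Lifetime of zero marks a No-Path DAO. */
#define NO_PATH_LIFETIME 0

/* Decide what a storing-mode node does with a received DAO.
 * accept_no_path_always == false models the behavior described in the issue;
 * accept_no_path_always == true models the workaround (assumption). */
enum dao_action
handle_dao(uint16_t sender_rank, uint16_t own_rank, uint8_t path_lifetime,
           bool accept_no_path_always)
{
  bool is_no_path = (path_lifetime == NO_PATH_LIFETIME);
  bool looks_like_loop = (sender_rank < own_rank);

  /* Without the workaround, every DAO from a lower-rank sender is dropped,
   * including the No-Path DAO that would have removed the stale route. */
  if(looks_like_loop && !(is_no_path && accept_no_path_always)) {
    return DAO_DROPPED_AS_LOOP;
  }
  return is_no_path ? DAO_ROUTE_REMOVED : DAO_ROUTE_ADDED;
}
```

With a sender rank of 128 and an own rank of 256, a No-Path DAO is dropped when the flag is off and removes the route when the flag is on. Regular DAOs from a lower-rank sender are still treated as a loop in both cases.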

Reproducing

Assume a simple network with RPL classic storing mode, and A being the root in the following RPL topology:

A ---- B ---- C ---- D

Next, the RPL topology is changed such that C is a child to A directly (but B can still reach C on the link-layer):

 + ---- C ---- D
 |
 |
 A ---- B

When B wants to send to D it still has the old route entry, so it transmits it to C. This is accepted by RPL as long as the packet is going downwards in the tree. We now have traffic flowing B -> C -> D.

Next, assume C loses the route to D (for any reason). When C receives another packet from B towards D, it has to route it upwards to its preferred parent - which RPL will not allow as the packet would then be going upwards. This is picked up by RPL in rpl-ext-header.c:L480, which spurs a No-Path DAO from C to B.
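
The decision C makes here can be sketched as a small classifier. This is a simplified model, not the logic in rpl-ext-header.c: `classify_forwarding` and the `fwd_action` values are hypothetical names, and what happens to the packet itself after the inconsistency is detected is not modeled.

```c
#include <stdbool.h>

/* Hypothetical action codes for illustration only. */
enum fwd_action { FWD_DOWN, FWD_UP, FWD_SEND_NO_PATH_DAO };

/* packet_marked_down: the packet arrived flagged as travelling down the DODAG.
 * has_downward_route: this node has a stored route to the destination. */
enum fwd_action
classify_forwarding(bool packet_marked_down, bool has_downward_route)
{
  if(has_downward_route) {
    return FWD_DOWN;              /* keep going down via the stored route */
  }
  if(packet_marked_down) {
    /* A downward packet would have to turn upwards: DAO inconsistency.
     * Per the issue, Contiki-NG answers with a No-Path DAO to the sender. */
    return FWD_SEND_NO_PATH_DAO;
  }
  return FWD_UP;                  /* ordinary upward traffic to the parent */
}
```

In the scenario above, C sees `packet_marked_down == true` and `has_downward_route == false`, which is the case that triggers the No-Path DAO towards B.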

However, if C happens to have a lower rank than B, the No-Path DAO will be considered as a signal of a loop by B. And critically, the DAO and its No-Path information will not be processed any further. As a response to the loop, B sets the rank of C to INFINITE, but that does not influence the routes with C as next hop. As such, traffic from B to D is still sent to C, repeating the process above, until the routes time out.

thvdveld commented 1 year ago

Next, the RPL topology is changed such that C is a child to A directly (but B can still reach C on the link-layer):

 + ---- C ---- D
 |
 |
 A ---- B

When B wants to send to D it still has the old route entry, so it transmits it to C. This is accepted by RPL as long as the packet is going downwards in the tree. We now have traffic flowing B -> C -> D.

When C selects a new parent, it should first send a no-path DAO to its old parent, which I don't think you assume here, right? This is described in Section 9.8, the 4th item in the list. If C first sends a no-path DAO to B, then I don't think there is an issue. I'm not sure if Contiki does this already.

arurke commented 1 year ago

Thanks for your thoughts @thvdveld. I played a bit in Cooja, and this is what I found: You are correct that C will send a No-Path DAO for its own prefix. Consequently, B will remove the route to C from its routing table. However, B still has a route to D, with C as the next-hop. When B wants to send a new packet to D, it will look up this route, and with C in the neighbor table the packet will still go B -> C -> D.

Unfortunately, I could not reproduce the full issue in Cooja without more effort. I was unable to reliably make C have a higher rank than B such that the DAO would be rejected.

thvdveld commented 1 year ago

I don't think that node B should use the neighbor table for routing the packet to C. With the no-path DAO, any route that needs to go via C should be invalidated, unless it is a one-hop route learned from a multicast DAO. A packet should only be routed Up or Down. So the packet really should go to the preferred parent of B first, which is A, and A should know the next-hop for D, which is C. So the route should then be B -> A -> C -> D. The no-path DAO makes sure that B will never send to C. That's also why I think DAOs should always be acknowledged: they are so crucial.

I based my thoughts on Section 11.1.

arurke commented 1 year ago

I see two ways your scenario could be fulfilled, either

  1. Node C includes both the prefix for itself and D in the No-Path DAO, or
  2. When B receives the No-Path for prefix C, it should remove the route to C, AND remove any routes that have C as a next-hop

From your reference to Section 9.8, it does not seem like C should send the D prefix in the No-Path DAO, since it has not lost D from its parent-set. I was unable to find anything in the RFC pointing in the direction of the second item, but my expertise is not in RPL. However, note that RPL does tolerate situations like this where the traffic does not strictly follow the DODAG, see Section 11.2.2.2. Also note that the situation will be rectified when the DAO/route for D expires on node B.
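
The difference between the two options can be illustrated with a toy routing table for node B. This is a sketch under assumptions, not Contiki-NG code: the types and the functions `add_route`, `remove_route_to`, `remove_routes_via`, and `lookup_next_hop` are hypothetical, and nodes are identified by single characters.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROUTES 8

/* Toy routing table; node names are single characters (assumption). */
struct route {
  char dest;
  char next_hop;
  int in_use;
};

struct route_table {
  struct route entries[MAX_ROUTES];
};

/* Add a route; returns 0 on success, -1 if the table is full. */
int
add_route(struct route_table *t, char dest, char next_hop)
{
  assert(t != NULL);
  for(size_t i = 0; i < MAX_ROUTES; i++) {
    if(!t->entries[i].in_use) {
      t->entries[i] = (struct route){ dest, next_hop, 1 };
      return 0;
    }
  }
  return -1;
}

/* Effect of a No-Path DAO for one prefix: remove only the route to `dest`.
 * Option 1 would call this once per prefix listed in the DAO.
 * Returns the number of routes removed. */
int
remove_route_to(struct route_table *t, char dest)
{
  int removed = 0;
  assert(t != NULL);
  for(size_t i = 0; i < MAX_ROUTES; i++) {
    if(t->entries[i].in_use && t->entries[i].dest == dest) {
      t->entries[i].in_use = 0;
      removed++;
    }
  }
  return removed;
}

/* Option 2: also remove every route that uses `next_hop`.
 * Returns the number of routes removed. */
int
remove_routes_via(struct route_table *t, char next_hop)
{
  int removed = 0;
  assert(t != NULL);
  for(size_t i = 0; i < MAX_ROUTES; i++) {
    if(t->entries[i].in_use && t->entries[i].next_hop == next_hop) {
      t->entries[i].in_use = 0;
      removed++;
    }
  }
  return removed;
}

/* Return the next hop for `dest`, or the default next hop if no route exists. */
char
lookup_next_hop(const struct route_table *t, char dest, char default_next_hop)
{
  assert(t != NULL);
  for(size_t i = 0; i < MAX_ROUTES; i++) {
    if(t->entries[i].in_use && t->entries[i].dest == dest) {
      return t->entries[i].next_hop;
    }
  }
  return default_next_hop;
}
```

Starting from routes to C and D (both via C), calling `remove_route_to` for C alone leaves the route to D pointing at C. Only after `remove_routes_via` for C, or a second `remove_route_to` for D, does a lookup for D fall back to the default next hop A.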

thvdveld commented 1 year ago

I don't think all routes should be removed. Only the route to C should be removed.

I think the problem is that the neighbor table is used to route a packet. When a routing protocol is used, only the routing table information should be used for routing a packet.

A recap: C should send a No-Path DAO to B with its prefix. Node B removes the route to C in the routing table. C is still present in the neighbor table, however, this table should not be used for routing when a routing protocol is used. When B wants to send to D, it finds that the next-hop for D is C. So it looks for a route to C in its routing table, which it does not find. Thus, the default route is used, which is its parent. So then the packet is routed to A. A will normally have the correct route, since C should have sent a DAO to A.

If C still wants to receive packets from B, without the packet going to A first, it needs to send a multicast DAO message, which is used for one-hop packets. But other than that, the neighbor table should not be used for routing information.

arurke commented 1 year ago

(...) C is still present in the neighbor table, however, this table should not be used for routing when a routing protocol is used. When B wants to send to D, it finds that the next-hop for D is C. So it looks for a route to C in its routing table, which it does not find. Thus, the default route is used, which is its parent.

I do not believe the neighbor table is used for routing in any unusual sense, and I believe that your step "it finds that the next-hop for D is C" is actually the routing lookup. I have not verified the following in the simulator, but this is my understanding:

After Node B receives the No-Path DAO from C with the C prefix, the routing and neighbor tables of Node B look like this:

Routing table:

| Destination | Next hop |
| --- | --- |
| Default route | Node A |
| Node D | Node C |

Neighbor table:

| Neighbor | Address |
| --- | --- |
| Node A | MAC addr. A |
| Node C | MAC addr. C |

When Node B wants to send a packet to Node D, it follows the regular process to route and forward a packet: 1) Looks in the routing table for destination D, and finds the route with next hop C, 2) Looks in neighbor table for node C, finds the MAC address and forwards packet.
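
The two-step process can be sketched with the table contents listed above. This is only an illustration of the lookup order, not Contiki-NG code: the struct types and the functions `route_next_hop`, `neighbor_mac`, and `resolve_link_layer_dest` are hypothetical, and addresses are plain strings.

```c
#include <stddef.h>
#include <string.h>

struct route_entry { const char *dest; const char *next_hop; };
struct nbr_entry { const char *node; const char *mac; };

/* Node B's tables after the No-Path DAO for C's prefix (from the issue). */
static const struct route_entry routes_b[] = { { "D", "C" } };
static const char *const default_route_b = "A";
static const struct nbr_entry nbrs_b[] = {
  { "A", "MAC addr. A" },
  { "C", "MAC addr. C" },
};

/* Step 1: routing lookup. Falls back to the default route. */
const char *
route_next_hop(const char *dest)
{
  for(size_t i = 0; i < sizeof(routes_b) / sizeof(routes_b[0]); i++) {
    if(strcmp(routes_b[i].dest, dest) == 0) {
      return routes_b[i].next_hop;
    }
  }
  return default_route_b;
}

/* Step 2: neighbor lookup. Returns NULL if the node is not a neighbor. */
const char *
neighbor_mac(const char *node)
{
  for(size_t i = 0; i < sizeof(nbrs_b) / sizeof(nbrs_b[0]); i++) {
    if(strcmp(nbrs_b[i].node, node) == 0) {
      return nbrs_b[i].mac;
    }
  }
  return NULL;
}

/* Route first, then resolve the chosen next hop to a link-layer address. */
const char *
resolve_link_layer_dest(const char *dest)
{
  return neighbor_mac(route_next_hop(dest));
}
```

For destination D, step 1 yields next hop C and step 2 yields C's MAC address, so the packet goes to C even though B no longer has a route entry for C itself.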

EDIT: Updated link to neighbor table lookup

thvdveld commented 1 year ago

Indeed, the next-hop for D in the table is C. However, it should then look for a next-hop to C, which is not in the routing table. So it should send the packet to the default route, and not look up the MAC address of C.

thvdveld commented 1 year ago

So I think the implementation for finding a route in Contiki should be modified. Once it finds the next-hop for some destination, it should also check if the next-hop has a route entry. In the case of RPL, this route entry is not there any more because of the no-path DAO. This means that uip_ds6_route_lookup would return the default route instead of C.
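
The proposed change can be contrasted with the usual lookup in a small sketch. This only models the idea under discussion and is not how uip_ds6_route_lookup is implemented: `find_route`, `next_hop_standard`, and `next_hop_checked` are hypothetical names, and nodes are single characters.

```c
#include <stdbool.h>
#include <stddef.h>

struct route { char dest; char next_hop; };

/* Find the route to `dest`; writes its next hop and returns true if found. */
static bool
find_route(const struct route *routes, size_t n, char dest, char *next_hop)
{
  for(size_t i = 0; i < n; i++) {
    if(routes[i].dest == dest) {
      *next_hop = routes[i].next_hop;
      return true;
    }
  }
  return false;
}

/* Usual behavior: use the stored next hop, else the default route. */
char
next_hop_standard(const struct route *routes, size_t n, char dest,
                  char default_hop)
{
  char hop;
  return find_route(routes, n, dest, &hop) ? hop : default_hop;
}

/* Proposed behavior: additionally require a route entry for the next hop
 * itself; otherwise fall back to the default route. */
char
next_hop_checked(const struct route *routes, size_t n, char dest,
                 char default_hop)
{
  char hop;
  char ignored;
  if(!find_route(routes, n, dest, &hop)) {
    return default_hop;
  }
  if(!find_route(routes, n, hop, &ignored)) {
    return default_hop;   /* next hop was invalidated, e.g. by a No-Path DAO */
  }
  return hop;
}
```

With only the route "D via C" left in B's table and A as default, `next_hop_standard` returns C while `next_hop_checked` returns A. If a one-hop route "C via C" is still present, both return C.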

arurke commented 11 months ago

Sorry for the delay @thvdveld, life got in the way. Unfortunately, it will keep getting in the way for some time. But I will just briefly add that I am not yet convinced about your proposed modification. Especially "Once it finds the next-hop for some destination, it should also check if the next-hop has a route entry" seems to me to be highly unorthodox (just thinking about my earlier experience with how e.g. Cisco does routing and forwarding lookup). It would also radically change a core Contiki-NG/Contiki behavior that I assume has been like this for a long time. It seems such a fundamental "flaw" would have had ramifications far beyond the niche issue we are discussing here, and should thus have been noticed earlier if it was indeed a problem.

Apologies for not being more useful in my arguments right now, hopefully I can return to this later.

thvdveld commented 11 months ago

I thought about this since the last time I wrote, and I think I am wrong. I'll think about it a bit more later this week.