RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

gnrc_rpl: Rejoining RPL instance as root after reboot messes up routing #5016

Closed jnohlgard closed 5 years ago

jnohlgard commented 8 years ago

Scenario: an RPL network with two nodes: one root running the border router example with ethos, and another acting as a normal single-interface RPL router (gnrc_networking or microcoap_server). When the border router reboots while the other RPL routers are still alive, the BR's routing gets messed up.

PC address: 2001:16d8:ff00:645::2
BR address (wireless): 2001:16d8:ff00:8645:1016:4e54:8bab:4012
Other node address: 2001:16d8:ff00:8645:1016:4e54:8af1:4012

The border router was started first, configured for ethos, then the following commands to set up radio and RPL on the BR:

ifconfig 6 set chan 0
ifconfig 6 set pan 1911
ifconfig 6 set page 2
rpl init 6
rpl root 0 2001:16d8:ff00:8645:1016:4e54:8bab:4012

Then I booted the wireless node and configured it similarly:

ifconfig 7 set chan 0
ifconfig 7 set pan 1911
ifconfig 7 set page 2
rpl init 7

After this, everything works, even IPv6 internet connectivity. I ran ping6 2600:: on the wireless node and it receives the replies correctly.

I then rebooted the BR node and reran the same initialization commands as above; however, I noticed that it joined the DODAG right after rpl init, before the rpl root command. Now it no longer works: the BR tries to route packets via itself over the air, so ping6 2600:: from the BR results in the packet below (note that the 802.15.4 destination address equals the source address):

0000   61 dc 0b 77 07 12 40 ab 8b 54 4e 16 12 12 40 ab  a..w..@..TN...@.
0010   8b 54 4e 16 12 7a 00 3a 20 01 16 d8 ff 00 86 45  .TN..z.: ......E
0020   10 16 4e 54 8b ab 40 12 26 00 00 00 00 00 00 00  ..NT..@.&.......
0030   00 00 00 00 00 00 00 00 80 00 c8 70 00 55 00 01  ...........p.U..
0040   55 55 55 55                                      UUUU

Frame 887: 68 bytes on wire (544 bits), 68 bytes captured (544 bits) on interface 0
IEEE 802.15.4 Data, Dst: 12:16:4e:54:8b:ab:40:12, Src: 12:16:4e:54:8b:ab:40:12
    Frame Control Field: 0xdc61, Frame Type: Data, Acknowledge Request, Intra-PAN, Destination Addressing Mode: Long/64-bit, Source Addressing Mode: Long/64-bit
        .... .... .... .001 = Frame Type: Data (0x0001)
        .... .... .... 0... = Security Enabled: False
        .... .... ...0 .... = Frame Pending: False
        .... .... ..1. .... = Acknowledge Request: True
        .... .... .1.. .... = Intra-PAN: True
        .... 11.. .... .... = Destination Addressing Mode: Long/64-bit (0x0003)
        ..01 .... .... .... = Frame Version: 1
        11.. .... .... .... = Source Addressing Mode: Long/64-bit (0x0003)
    Sequence Number: 11
    Destination PAN: 0x0777
    Destination:     12:16:4e:54:8b:ab:40:12 (12:16:4e:54:8b:ab:40:12) <<<<<<<========= SIC! 
    Extended Source: 12:16:4e:54:8b:ab:40:12 (12:16:4e:54:8b:ab:40:12) <<<<<<<========= SIC! 
6LoWPAN
    IPHC Header
        011. .... = Pattern: IP header compression (0x03)
        ...1 1... .... .... = Traffic class and flow label: Version, traffic class, and flow label compressed (0x0003)
        .... .0.. .... .... = Next header: Inline
        .... ..10 .... .... = Hop limit: 64 (0x0002)
        .... .... 0... .... = Context identifier extension: False
        .... .... .0.. .... = Source address compression: Stateless
        .... .... ..00 .... = Source address mode: Inline (0x0000)
        .... .... .... 0... = Multicast address compression: False
        .... .... .... .0.. = Destination address compression: Stateless
        .... .... .... ..00 = Destination address mode: Inline (0x0000)
    Next header: ICMPv6 (0x3a)
    Source: 2001:16d8:ff00:8645:1016:4e54:8bab:4012
    Destination: 2600::
Internet Protocol Version 6, Src: 2001:16d8:ff00:8645:1016:4e54:8bab:4012, Dst: 2600::
    0110 .... = Version: 6
    .... 0000 0000 .... .... .... .... .... = Traffic class: 0x00 (DSCP: CS0, ECN: Not-ECT)
    .... .... .... 0000 0000 0000 0000 0000 = Flowlabel: 0x00000000
    Payload length: 12
    Next header: ICMPv6 (58)
    Hop limit: 64
    Source: 2001:16d8:ff00:8645:1016:4e54:8bab:4012
    Destination: 2600::
Internet Control Message Protocol v6
    Type: Echo (ping) request (128)
    Code: 0
    Checksum: 0xc870 [correct]
    Identifier: 0x0055
    Sequence: 1
    [No response seen]
        [Expert Info (Warn/Sequence): No response seen to ICMPv6 request in frame 887]
            [No response seen to ICMPv6 request in frame 887]
            [Severity level: Warn]
            [Group: Sequence]
    Data (4 bytes)
        Data: 55555555
        [Length: 4]

After the issue appears, the fibroute command shows that the default route via the ethos interface has been deleted (presumably by joining the RPL DODAG as a router?).

Running fibroute add :: via fe80::1 dev 7 makes routing work on the BR again, and I can once more communicate between the PC and the 6LoWPAN network.

miri64 commented 8 years ago

RPL is not really my forte :-)

cgundogan commented 8 years ago

Once RPL is initialized, it sends out a DIS to probe for a DAG in its neighborhood. The DAG of the other node is still alive and emits DIOs.

I could propose two solutions for that:

1) Parameterize rpl_init so that you can specify whether probing (DIS) should be done. However, there is still a small chance of joining a DAG if a DIO is emitted after you initialize but before you create the root.
2) Also initialize RPL (if not done before) when calling rpl_root. This way the root node need not call rpl_init before rpl_root, so there is no window in between in which to join the defunct DAG.


jnohlgard commented 8 years ago

@cgundogan What is the proper procedure for recovering from the root node of an RPL DODAG losing its state? Increment DODAG version and let all nodes re-join the tree?

cgundogan commented 8 years ago

There is no proper procedure that I know of. If the root node loses its state, how should it know which version to start from? If the root node just creates a new DAG with the same instance ID and DODAG ID but a lower version number, then I have no idea how a node that is in a DAG with the same instance ID and DODAG ID but a higher version should react. They will probably ignore the new DODAG until the old DAG fades away due to timeouts. I will have to look into the RFC; it's an interesting use case.

cgundogan commented 8 years ago

@BytesGalore do you have any clues?

BytesGalore commented 8 years ago

Basically the situation described is "just" a local inconsistency, regardless of whether it's the root node or any other node. Just a guess, but I think the node should reply with a unicast DIO carrying the current DODAG version when it receives an outdated version from the root node. In turn, the root can adjust its version. (BTW, you can test this situation with the attacker from #4831 by just raising the DODAG version of a root child-node.)

cgundogan commented 8 years ago

> (BTW, you can test this situation with the attacker from #4831 by just raising the DODAG version of a root child-node.)

what's a root child-node? (:

BytesGalore commented 8 years ago

I mean a node just below the root, using the root node as parent.

OlegHahm commented 8 years ago

Maybe it makes sense to discuss this on the mailing list to allow a broader audience to participate?

BytesGalore commented 8 years ago

I did not say it explicitly before, so: the root node does a global repair when it receives a DIO with a higher DODAG version than its own. As a result, the root node will not join the outdated advertised DODAG as a router and will not add/overwrite its default route entry with a route to the advertised DODAG root (itself).

jnohlgard commented 8 years ago

@BytesGalore That's good to hear. As a workaround, I have modified my border router application to do rpl init immediately followed by rpl root 0 (myip) and it seems to work across reboots.

miri64 commented 8 years ago

Can someone dump this result to the mailing list too?

miri64 commented 6 years ago

Sorry 'bout the project back-and-forth.... Thought I accidentally moved it to "In Progress" there in connection to #7925.

cgundogan commented 6 years ago

@gebart is #8173 a remedy for this issue? Can it be closed?

jnohlgard commented 6 years ago

@cgundogan it looks like #8173 could be a fix for this, but I have not had time to test it yet.

cgundogan commented 6 years ago

@gebart do you think this issue can be closed?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

miri64 commented 5 years ago

@gebart do you think this issue can be closed?

I take this as a yes... Reopen if you disagree