contiki-ng / contiki-ng

Contiki-NG: The OS for Next Generation IoT Devices
https://www.contiki-ng.org/
BSD 3-Clause "New" or "Revised" License

RPL-lite / CSMA / Cooja motes simulation issue #312

Closed: zsrkmyn closed this issue 6 years ago

zsrkmyn commented 6 years ago

We use examples/rpl-udp/udp-{server,client}.c to test RPL on Cooja motes.

In the simulation, we deploy 100 clients and 1 server. The .csc file can be found at https://pastebin.com/93WcJT2u

We use rpl-lite as the routing protocol, and set RPL_NS_CONF_LINK_NUM to 120 in project-conf.h.

When RPL_CONF_PROBING_SEND_FUNC is set to rpl_icmp6_dio_output, after 1 hour of simulation time only a few nodes near the root have joined the RPL instance. Setting RPL_CONF_PROBING_SEND_FUNC to rpl_icmp6_dis_output makes it slightly better.
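Contiki-NG picks up project-conf.h settings named *_CONF_* through an #ifdef/#else pattern in the stack's own headers. The standalone sketch below reproduces that pattern for the probing setting discussed here, so it is easy to see which sender a given project-conf.h selects. The two output functions are hypothetical stubs standing in for rpl-lite's rpl_icmp6_dio_output and rpl_icmp6_dis_output (the real ones take a uip_ipaddr_t * and send ICMPv6 messages); probe_parent, the PROBE_* values, and the DIO fallback are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Records which stub ran last (illustration only). */
enum probe_kind { PROBE_NONE, PROBE_DIO, PROBE_DIS };
enum probe_kind last_probe = PROBE_NONE;

/* Hypothetical stubs with simplified signatures: no packets are sent. */
void rpl_icmp6_dio_output(const char *addr) { (void)addr; last_probe = PROBE_DIO; }
void rpl_icmp6_dis_output(const char *addr) { (void)addr; last_probe = PROBE_DIS; }

/* What a project-conf.h would add to probe with DIS instead of DIO. */
#define RPL_CONF_PROBING_SEND_FUNC(addr) rpl_icmp6_dis_output((addr))

/* Pattern used by the stack headers: honour the override if present,
 * otherwise fall back to a default (DIO is an assumed default here). */
#ifdef RPL_CONF_PROBING_SEND_FUNC
#define RPL_PROBING_SEND_FUNC RPL_CONF_PROBING_SEND_FUNC
#else
#define RPL_PROBING_SEND_FUNC(addr) rpl_icmp6_dio_output((addr))
#endif

/* Hypothetical caller: probe a parent using whichever sender is configured. */
void probe_parent(const char *parent_addr)
{
  assert(parent_addr != NULL);
  RPL_PROBING_SEND_FUNC(parent_addr);
}
```

With the override in place, probe_parent() runs the DIS stub; deleting the RPL_CONF_PROBING_SEND_FUNC line makes the same call fall back to the DIO stub. An object-like override such as `#define RPL_CONF_PROBING_SEND_FUNC rpl_icmp6_dis_output` expands to the same call, because the preprocessor rescans the replacement together with the argument list that follows it.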

However, when we run examples/simple-udp-rpl on Contiki 3.x with the same mote layout, it takes only 4 minutes of simulation time for the root to receive the first message from the farthest nodes.

We also dug into the RPL module logs, but there are so many unreasonable packet transmissions and parent switches that they are hard to describe here. The log file is larger than 100 MiB, so we'll upload it if you need it.

Feel free to ask us for more details! XD

simonduq commented 6 years ago

Thanks for reporting. This is a challenging simulation environment: large and very sparse (at most 4 neighbors per node), yet with a significant interference range (about 20 neighbors).

The problem seems to be with CSMA on Cooja motes. On Cooja motes, transmission timing is coarse-grained, with only 1 ms precision, and the current CSMA timings result in retransmissions with a very low backoff exponent. Combine the two and you get cascading collisions: nodes collide and retry every 1 ms.
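To see why a coarse timer and a small backoff exponent compound each other, here is a deliberately simplified two-node model (an assumption for illustration, not the actual Contiki-NG CSMA code). After a collision, each node draws a backoff slot uniformly from [0, 2^BE) and waits that many 320 µs IEEE 802.15.4 backoff periods; the wait is then truncated to the timer resolution. The hypothetical function repeat_collision_prob enumerates every pair of draws and returns the probability that both nodes fire on the same tick and therefore collide again.

```c
#include <assert.h>

/* IEEE 802.15.4 unit backoff period: 20 symbols at 62.5 ksymbol/s = 320 us. */
#define SLOT_US 320u

/* Simplified model: probability that two nodes, each drawing a backoff slot
 * uniformly from [0, 2^be) and truncating the wait to tick_us microseconds,
 * fire on the same tick. */
double repeat_collision_prob(unsigned be, unsigned tick_us)
{
  assert(be <= 10 && tick_us > 0);
  unsigned slots = 1u << be;
  unsigned long same = 0;

  for (unsigned i = 0; i < slots; i++) {
    for (unsigned j = 0; j < slots; j++) {
      /* Integer division models the coarse timer: waits within one tick merge. */
      if (i * SLOT_US / tick_us == j * SLOT_US / tick_us) {
        same++;
      }
    }
  }
  return (double)same / ((double)slots * (double)slots);
}
```

With 1 ms resolution (tick_us = 1000), every BE up to 2 gives a certain repeat collision, because all waits fall within the first tick. BE = 3 gives 26/64 ≈ 0.41 and BE = 4 gives 52/256 ≈ 0.20, versus 1/8 and 1/16 when the 320 µs slots are resolved exactly (tick_us = 320). With more than two contenders, the chance that some pair shares a tick is higher still.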

With some work on CSMA, I got your simulation file to work reliably. What I did:

BTW, note that RPL_NS_CONF_LINK_NUM is now obsolete (I will update the doc); please use NETSTACK_MAX_ROUTE_ENTRIES instead (but the default on Cooja motes is 300, so this was not the issue).
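For reference, a minimal project-conf.h sketch using the replacement setting. The value 120 is the one from the original report; per the comment above, the Cooja-mote default is already 300, so an override like this only matters on platforms with a smaller default. The header guard name is arbitrary.

```c
/* project-conf.h (sketch for this 100-client simulation) */
#ifndef PROJECT_CONF_H_
#define PROJECT_CONF_H_

/* Replaces the obsolete RPL_NS_CONF_LINK_NUM. 120 leaves headroom over the
 * 101 nodes (100 clients + 1 server) in this simulation. */
#define NETSTACK_MAX_ROUTE_ENTRIES 120

#endif /* PROJECT_CONF_H_ */
```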

simonduq commented 6 years ago

New defaults proposed in https://github.com/contiki-ng/contiki-ng/pull/315

zsrkmyn commented 6 years ago

The PR improves the performance, but the network is still not as good as the 3.x version.

We set LOG_CONF_WITH_ANNOTATE to 1 in project-conf.h, changed uip-ds6-route.c as follows, and enabled Mote relations in Cooja to show each node's default route.

diff --git a/os/net/ipv6/uip-ds6-route.c b/os/net/ipv6/uip-ds6-route.c
index 9321b632c..953b677cd 100644
--- a/os/net/ipv6/uip-ds6-route.c
+++ b/os/net/ipv6/uip-ds6-route.c
@@ -637,7 +637,7 @@ uip_ds6_defrt_add(uip_ipaddr_t *ipaddr, unsigned long interval)
     d->isinfinite = 1;
   }

-  LOG_ANNOTATE("#L %u 1\n", ipaddr->u8[sizeof(uip_ipaddr_t) - 1]);
+  LOG_ANNOTATE("#L %u 1;red\n", ipaddr->u8[sizeof(uip_ipaddr_t) - 1]);

 #if UIP_DS6_NOTIFICATIONS
   call_route_callback(UIP_DS6_NOTIFICATION_DEFRT_ADD, ipaddr, ipaddr);
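The patch above emits a Cooja mote-relation annotation whose mote ID is the last byte of the default router's IPv6 address. The sketch below builds the same string outside Contiki so the format is easy to check; format_relation_annotation is a hypothetical helper. Two assumptions are baked in: that the last address byte equals the Cooja mote ID (as far as we know this holds for Cooja's ID-derived addresses while IDs stay below 256), and that a `0` in place of the `1` removes the relation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "#L <id> <0|1>[;color]\n" into buf, the
 * annotation format used by the patch above. Returns snprintf's result. */
int format_relation_annotation(char *buf, size_t len, const uint8_t addr[16],
                               int add, const char *color)
{
  assert(buf != NULL && addr != NULL);
  unsigned id = addr[15];  /* assumed to equal the Cooja mote ID */
  int state = add ? 1 : 0; /* 1 = draw relation, 0 = remove it (assumption) */

  if (color != NULL && strlen(color) > 0) {
    return snprintf(buf, len, "#L %u %d;%s\n", id, state, color);
  }
  return snprintf(buf, len, "#L %u %d\n", id, state);
}
```

For a router whose address ends in 0x05, adding the relation in red yields "#L 5 1;red\n", which matches what the patched LOG_ANNOTATE call prints.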

We tested rpl-lite on both Cooja and Sky motes (testing on Sky motes can be very slow :-( ). It seems the default routes change frequently and incorrectly.

In Contiki 3.x, we set RPL_CONF_MOP to RPL_MOP_NON_STORING to enable RPL's non-storing mode, and the default routes are more stable and reasonable.

I'd like to record some videos to illustrate the issue if I'm free this evening. XD

cc @dongdongbh

simonduq commented 6 years ago

OK, it's most likely that the rpl-lite defaults don't do well in your network. The first thing that comes to mind is to try disabling ETX squaring (link-stats module). It helps build reliable routes, but it also makes the DAG more jittery, which might be sub-optimal in certain scenarios.

BTW, have you tried rpl-classic (in non-storing mode), to find out whether the issue came with rpl-lite or with Contiki-NG in general?

dongdongbh commented 6 years ago

I have tested three configurations on our network layout:

  1. Contiki 3.x in non-storing mode
  2. Contiki-NG with rpl-classic in non-storing mode
  3. Contiki-NG with rpl-lite, and the CSMA settings following your suggestion.

The performance ranks 1 > 2 > 3. In the first configuration, the network forms quickly and is more reliable than in the other two: all nodes can reach the root. In the second, only some nodes can reach the root, and the routing decisions are also poor for communication. The last configuration is the worst: almost no nodes can reach the root.

According to my tests, this seems to be a Contiki-NG issue, and rpl-lite also performs worse than rpl-classic in our network.

Here are screenshots of the results for the three configurations.

[Screenshot: rpl-compare]

cc @zsrkmyn

simonduq commented 6 years ago

You must be doing something wrong. On a clean repo, I get a fully connected network after 2m30s, with very few parent switches, if any, after that. I see no traffic loss at all: all application traffic is received successfully, up and then back down (I get one hundred "Received response 1", one hundred "Received response 2", etc.).
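The delivery check described above amounts to counting, for each round, how many "Received response N" lines the clients printed. A sketch of that tally, assuming one log line per response with the round number directly after the text "Received response " (the actual udp-client log format may differ); count_responses and RESPONSE_TAG are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RESPONSE_TAG "Received response "

/* Hypothetical helper: counts lines containing RESPONSE_TAG followed by the
 * given round number (assumed log format). */
int count_responses(const char *const *lines, int nlines, int round)
{
  assert(lines != NULL && nlines >= 0);
  int count = 0;

  for (int i = 0; i < nlines; i++) {
    const char *p = strstr(lines[i], RESPONSE_TAG);
    int n;
    if (p != NULL && sscanf(p + strlen(RESPONSE_TAG), "%d", &n) == 1 && n == round) {
      count++;
    }
  }
  return count;
}
```

With 100 clients and no loss, count_responses() should return 100 for every round number.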

I thought you were reporting on sub-optimal performance, not something as broken as in the screenshots above...

I used the branch with the CSMA fix with added annotations but no other modif. Latest Cooja and MSPSim from the Contiki-NG repo. All running in Docker.


zsrkmyn commented 6 years ago

Oh, I'm sorry. We first worked on commit 9777ac4 but found the network was not as stable as with an older commit, so we checked out that older commit and continued working on it, and we all forgot that our working directory was no longer up to date with upstream.

After switching to the latest commit and applying the csma-defaults patch, all the problems disappeared.

I really appreciate your help! :-)

simonduq commented 6 years ago

Ah, that is very good to hear :) Thanks again for reporting, it got the CSMA issue noticed!