ARMmbed / mbed-os-example-client

DEPRECATED: This is the mbed Client example application for mbed OS.
https://cloud.mbed.com/docs/current/welcome/index.html
Other
20 stars 58 forks source link

Network problems with 7 or more clients connected to the mbed Device Connector #266

Closed rspelta closed 7 years ago

rspelta commented 7 years ago

Description

With seven or more clients connected to the mbed Device Connector some of them are not listed temporally or disappear until I reset the client.

We are using mbed-os-example-client for 10 boards, 3 of them are NUCLEO_F429ZI+X-NUCLEO-IDS01A4 boards and the others are sensor-node boards (it's a new kind of board, we are developing on it, @MarceloSalazar knows it). The border router uses the nanostack-border-route repository and it is a NUCLEO_F429ZI+X-NUCLEO-IDS01A4, it uses a ethernet cable connected to the internet router (with enabled 6to4 tunneling). The mesh is via Spirit, we have worked with Wolfgang (@betzw) in order to have the spirit driver stable.

The NUCLEO_F429ZI boards use the lastest commit of the repositories, the sensor-node board come from the versions:

mbed-os-sensor-node (5f27acc88541) |- easy-connect (6fb5842becae) | |- atmel-rf-driver (57f22763f4d3) | |- esp8266-driver (4ed87bf7fe37) | | - ESP8266\ATParser (269f14532b98) | |- mcr20a-rf-driver (d8810e105d7d) |- stm-spirit1-rf-driver (ac7a4f477222) |- mbed-client (f8f0fc8b9321) | |- mbed-client-c (c739b8cbcc57) | |- mbed-client-classic (f673b8b60779) | - mbed-client-mbed-tls (7e1b6d815038) |- mbed-os (ed4febefdede) - pal (4e46c0ea8706)

Every client has its security.h file and a different MAC address. We have done this test:

  1. turn on the border router and wait until it is ready
  2. turn on one client
  3. wait to see it is connected to the mbed connector website and try to request information via API console
  4. go to point 2

here you can read the trace of the border router: https://pastebin.com/ti8U3HgS

From the sniffer seems that the lost clients (I mean who disappear from the mbed connector list) continue to communicate with the border router. We don't have information about how the nanostack works so we have problem to get a better idea what is going on.

To try to understand what happen we have:

Our goal is to have more than 10 boards connected to the mbed Device Connector. What can we do?

ciarmcom commented 7 years ago

ARM Internal Ref: IOTCLT-1854

betzw commented 7 years ago

Cc @SeppoTakalo, @JanneKiiskila, @screamerbg

MarceloSalazar commented 7 years ago

@rspelta are you using the same mbed Device Connector (mDC) account for all devices? If so, would you please to generate the credentials for multiple accounts? (lets say 2 or 3). I'd like to understand whether the problem is caused by a limitation on mDC or the Border Router not properly forwarding packets. It would be useful to have a capture on the packets (tcpdump) sent between the BR and the backhaul network (tunnel).

kjbracey commented 7 years ago

I'm seeing enough "MAC TX fail" and "Source route error" messages in that log to make me think you have some sort of radio driver or hardware problem. Communication between the nodes does not seem to be as reliable as I'd expect.

"MAC TX fail" means "I tried to transmit to a neighbour, but didn't get an Ack after multiple attempts".

"Source route error A->B" means "LoWPAN router A is telling me (the border router) that they had a MAC TX error (see above) when forwarding to B".

I'd run a simple ping test from outside the mesh to the nodes to see what sort of average packet loss you're getting - both with minimum size packets and larger ones (eg 500 or 1000 bytes). The aim should be to have only a couple of percent loss at minimum size, and try to get under 10% at 1000 bytes.

kjbracey commented 7 years ago

I've just spent a little while reviewing the Spirit driver code, just to see if I can see any obvious flaws. (Hard to say much without knowing the hardware, but I can look for general issues.)

I'm a bit wary about the software ack handling - can be tricky.

There is one specific problem that could be affecting performance now - it seems to me the acks are sent with a common send() routine that enables hardware CSMA-CD. An ack should be being sent 192us after transmission completion, without CSMA. Backing off the ack will greatly reduce the chance of packets being successfully acknowledged.

Other notes on ack reception - you're calling TX_DONE whenever tx_sequence == seq_number, whether you were expecting an ack or not. This could cause stack confusion in various ways (eg if you were backing off while someone else used the same sequence number). You should only process an ACK when you actually expect one (TX completed, and AR bit was set in it)

Also, while expecting an ack, it can be beneficial to report TX_FAIL and stop expecting when you receive anything other then an ack with the expected sequence number. The stack will eventually time out if it doesn't get TX_DONE, but receipt of anything else (including a wrongly-numbered ack) indicates a lack of acknowledgment, which can be reported immediately.

betzw commented 7 years ago

Thanks a lot @kjbracey-arm!

I have immediately tried to implement all your indications. Before pushing them, could you pls. be so kind and review these. Above all, pls. let me know if I got your points correctly, am missing something, and if you think the patch fixes the issues you have listed.

rspelta commented 7 years ago

Hi, I used the last patch from @betzw. I done the tests with 9 sensor-node boards. One mbed Device Connector (mDC) account for all devices. Now it seems more stable, all the 9 clients are listed on the website and I didn't see any client disappear from the list.

However I still read these messages in the log: [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233

You can see the log of the border router (trace setted on "info level"): https://pastebin.com/d01djmcZ

And the log of one client (trace setted on "info level"): https://pastebin.com/z3yfHAgn

@MarceloSalazar next test will be with more clients connected so I will try your advices in order to have more info. Thanks you all!

JanneKiiskila commented 7 years ago

Hi all,

I would conclude that this issue is NOT an mbed-client issue, it's just the fact it can't tolerate (especially the TLS handshakes are a very, very touch area) flaky networks. @kjbracey-arm knows the gory details and somehow I feel there would be some options for making it more fault-tolerant (issue towards mbedtls perhaps - https://github.com/ARMmbed/mbedtls) to make the situation better, client can't do much about that TLS handshake thing - we just try to do it and if it fails -> we can't connect to the server.

kjbracey commented 7 years ago

You will always see some "MCPS Data fail" messages - quite a few if the network is busy.

Once a 6LoWPAN network accessing the mbed server has stabilised, I wouldn't expect to see many fails, as it shouldn't be that busy. Looking at the logs, my feeling is that performance is still a bit below par, but I'd need to see some proper stats (eg ping tests, as suggested above).

Looking at those logs, one thing occurred to me, so I just checked the driver. It doesn't actually report "CCA_FAIL", so we can't tell the difference between "channel busy" (225) and "no ack" (233). The driver should be enhanced to distinguish the cases - handling IRQ_MAX_BO_CCA_REACH. Will help diagnostics, and may help performance (Nanostack will respond differently).

I also wonder if your NumBackoffs parameter is set too low. The default for IEEE 802.15.4 is 4, and we tend to use 5. You've got it set to 3. Might be worth checking the other numbers.

The driver should tell Nanostack the number of CSMA retries it did. Maybe you can't get that number out of the driver on success, so saying 0 would be okay, but you should say "4" on a CCA_FAIL if that's your NumBackoffs setting. (ie 4 retries, 5 attempts total).

betzw commented 7 years ago

The number of backoffs can easily be changed here.

betzw commented 7 years ago

I have reflected a bit about CCA & backoff and would like to ask @kjbracey-arm how the Nanostack interfaces to persistent CCA (i.e how to implement integration of a RF supporting persistent CCA into the Nanostack)?

kjbracey commented 7 years ago

Persistent CCA is not a thing I've ever come across before reading that Spirit1 datasheet. We assume backoff as per the 802.15.4 spec - a limited number of short-duration CCA attempts at random intervals. If I'm understanding correctly, that Spirit1 mode continuously monitors and transmits as soon as it's quiet? If so seems like it would just tend to cause collisions to me, as multiple nodes could tend to go simultaneously at the end of a transmission.

Nanostack supports both drivers that don't have their own automatic backoff and those that do - if you report "CCA_FAIL" with 0 retries, then Nanostack will backoff and retry itself, counting its attempts. If you report "CCA_FAIL" with sufficient retries, then Nanostack will not retry.

(This mechanism is a bit wonky - eg Nanostack can be configured for 4 retries, but there's no way of it telling the driver how many it wants. If you did 3, Nanostack would decide that's not enough, so it will call you again, so presumably you'd do another 3 for a total of 6. Also the combination of Nanostack's random backoff plus whatever random backoff you have probably isn't ideal).

kjbracey commented 7 years ago

I guess if you did want to try that persistent CCA, you'd just enable it, and then you would presumably only report CCA_FAIL (with an artificially high "retries" to stop Nanostack trying again) if you decided to time out in the driver.

betzw commented 7 years ago

Well, currently persistent CCA is enabled in the Spirti1 driver. Furthermore, it never reports a CCA_FAIL ... which might be a problem?!?

kjbracey commented 7 years ago

At first impressions persistent CCA sounds like a bad idea to me in a busy network, because the PHY has no collision detection, only collision avoidance. And the collision avoidance only works if nodes don't tend to transmit simultaneously. Persistent CCA seems like it would encourage simultaneous transmissions.

It seems like a fudge to try to get higher bandwidth and lower latency, at the cost of higher power and working less well with multiple nodes.

It's not standard 802.15.4 and I would suggest disabling it.

rspelta commented 7 years ago

I tried the ping test. I started with only one client connected to the mesh. I have seen that even with one node sometimes there is a problem. My test is:

  1. turn on the border-router, then the client
  2. once the client is connected to the mbed connector I start to pinging the client continuely. 10 minutes with 32B lenght,10 minutes with 1000B lenght and other 10 minutes with 32B lenght.

What I have seen is that when all works correctly I have a ping 32B in 30ms and a ping 1000B reply in 330ms with no packets lost. If the timing rise up to 2-3 seconds then the client will starts to lose packets reaching 25%-35% packets lost. If I reset the border router (not the client) then it return to work well with no packet lost and timing 30ms-330ms.

here the worst-case: ping: https://pastebin.com/9LTGNUQQ router: https://pastebin.com/9EyXra9A wireshark logs via dropbox

betzw commented 7 years ago

Ciao Roberto, which version(s) of the Spirti1 driver did you use? And could you pls. elaborate a bit about what you mean when saying:

the timing rise up to 2-3 seconds

rspelta commented 7 years ago

I used the spirit driver commit with "5 max nr of backoffs" (140e6470983229aebac2de7256616c3c13f37c4b). My PC for pinging is connected via eth cable to the same router with the border router.

When I start both (border router + client) all is ok. If I ping the client the response time is immediate. packets of 32B -> time response: 17-30ms packets of 1000B -> time response: 320-330ms If I continue to ping the client, after a while (time is random) the time response become strange: Every pings have differents times responses like these:

Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=610ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=2002ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=3005ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=699ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=17ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=1002ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=2405ms Risposta da 2002:4f15:d31f:e472:9b99:9999:9999:9999: durata=597ms

After a while (time random) the client will start to losing packets about 25%-35%. If I reset the client this doesn't change the situation. If I reset the border router the normal condition is restored.

note: time is random but if there are more clients connected then this issue happen before.

kjbracey commented 7 years ago

Just to observe, seeing your comment about the commit "5 max nr of backoffs" - I don't think the number of backoffs parameter is used if in persistent mode - there are no backoffs, so I wouldn't expect that particular patch to change anything.

I still think the most important thing to try here is getting back to normal non-persistent 802.15.4 with backoffs, unless there is some reason to believe that doesn't work with this chipset.

But with reference to the test above - to pin it down can you do the same test pinging the border router itself? Both when it's the only device, and when there are clients attached. Also, after the client pinging starts going weird, what do pings to the border router do?

rspelta commented 7 years ago

when it's the only router:

C:\Users\Roberto>ping -6 -t -l 32 2002:5709:48f6:e472:280:e1ff:fe23:39

Esecuzione di Ping 2002:5709:48f6:e472:280:e1ff:fe23:39 con 32 byte di dati: Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms

Statistiche Ping per 2002:5709:48f6:e472:280:e1ff:fe23:39: Pacchetti: Trasmessi = 28, Ricevuti = 28, Persi = 0 (0% persi), Tempo approssimativo percorsi andata/ritorno in millisecondi: Minimo = 0ms, Massimo = 0ms, Medio = 0ms`

C:\Users\Roberto>ping -6 -t -l 1000 2002:5709:48f6:e472:280:e1ff:fe23:39

Esecuzione di Ping 2002:5709:48f6:e472:280:e1ff:fe23:39 con 1000 byte di dati: Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms

Statistiche Ping per 2002:5709:48f6:e472:280:e1ff:fe23:39: Pacchetti: Trasmessi = 15, Ricevuti = 15, Persi = 0 (0% persi), Tempo approssimativo percorsi andata/ritorno in millisecondi: Minimo = 0ms, Massimo = 0ms, Medio = 0ms

with 1 client attached: CLIENT C:\Users\Roberto>ping -6 -t -l 32 2002:5709:48f6:e472:9b99:9999:9999:9999

Esecuzione di Ping 2002:5709:48f6:e472:9b99:9999:9999:9999 con 32 byte di dati: Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=21ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=20ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=19ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=17ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=19ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=17ms

Statistiche Ping per 2002:5709:48f6:e472:9b99:9999:9999:9999: Pacchetti: Trasmessi = 17, Ricevuti = 17, Persi = 0 (0% persi), Tempo approssimativo percorsi andata/ritorno in millisecondi: Minimo = 17ms, Massimo = 21ms, Medio = 18ms Control-C

C:\Users\Roberto>ping -6 -t -l 1000 2002:5709:48f6:e472:9b99:9999:9999:9999

Esecuzione di Ping 2002:5709:48f6:e472:9b99:9999:9999:9999 con 1000 byte di dati: Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=334ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=334ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=342ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=335ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=341ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=335ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=338ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=332ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=337ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=340ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=336ms

Statistiche Ping per 2002:5709:48f6:e472:9b99:9999:9999:9999: Pacchetti: Trasmessi = 11, Ricevuti = 11, Persi = 0 (0% persi), Tempo approssimativo percorsi andata/ritorno in millisecondi: Minimo = 332ms, Massimo = 342ms, Medio = 336ms

BORDER ROUTER: Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms

When I connected the second client, ping starts going weird: BORDER-ROUTER: Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Richiesta scaduta. Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=46ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=10ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=165ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=42ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=142ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=468ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=38ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=79ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=91ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=33ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=409ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=760ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=149ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=406ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=412ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=140ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=716ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=238ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=34ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=52ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=343ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=135ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=137ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=123ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=29ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=288ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=343ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=343ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=173ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=338ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=344ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=795ms

CLIENT: Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=668ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=544ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=528ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=856ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1006ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=890ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=999ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1007ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=384ms Richiesta scaduta. Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=587ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3038ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1644ms

after 10 minutes BORDER-ROUTER: C:\Users\Roberto>ping -6 -t -l 1000 2002:5709:48f6:e472:280:e1ff:fe23:39

Esecuzione di Ping 2002:5709:48f6:e472:280:e1ff:fe23:39 con 1000 byte di dati: Richiesta scaduta. Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=1555ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=1252ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=3423ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=987ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=1002ms Richiesta scaduta. Richiesta scaduta. Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=1337ms Richiesta scaduta. Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata=3270ms

Statistiche Ping per 2002:5709:48f6:e472:280:e1ff:fe23:39: Pacchetti: Trasmessi = 11, Ricevuti = 7, Persi = 4 (36% persi), Tempo approssimativo percorsi andata/ritorno in millisecondi: Minimo = 987ms, Massimo = 3423ms, Medio = 1832ms

CLIENT C:\Users\Roberto>ping -6 -t -l 32 2002:5709:48f6:e472:9b99:9999:9999:9999

Esecuzione di Ping 2002:5709:48f6:e472:9b99:9999:9999:9999 con 32 byte di dati: Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3067ms Richiesta scaduta. Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=245ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1798ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=2345ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=2398ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1252ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3422ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=752ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=762ms Richiesta scaduta. Richiesta scaduta. Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1346ms Richiesta scaduta. Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3268ms Richiesta scaduta. Richiesta scaduta. Richiesta scaduta. Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3304ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3004ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=1252ms Richiesta scaduta. Richiesta scaduta. Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=3851ms Richiesta scaduta.

Statistiche Ping per 2002:5709:48f6:e472:9b99:9999:9999:9999: Pacchetti: Trasmessi = 25, Ricevuti = 15, Persi = 10 (40% persi), Tempo approssimativo percorsi andata/ritorno in millisecondi: Minimo = 245ms, Massimo = 3851ms, Medio = 2137ms

TRACE BORDER ROUTER [INFO][app ]: Starting NanoStack Border Router... [INFO][app ]: Build date: Jul 6 2017 11:42:03 [INFO][app ]: Using ETH backhaul driver... [ERR ][brro]: Backhaul interface down failed [INFO][Eth ]: Ethernet cable connected. [INFO][addr]: Tentative Address added to IF 2: fe80::280:e1ff:fe23:39 [INFO][addr]: DAD passed on IF 2: fe80::280:e1ff:fe23:39 [INFO][addr]: Tentative Address added to IF 2: 2002:5709:48f6:e472:280:e1ff:fe23:39 [INFO][icmp]: Route: 0:0:0:e472::/64 Lifetime: 1800 Pref: 0 [INFO][addr]: DAD passed on IF 2: [INFO][brro]: Backhaul bootstrap ready, IPv6 = 2002:5709:48f6:e472:280:e1ff:fe23:39 [INFO][brro]: Backhaul interface addresses: [INFO][brro]: [0] fe80::280:e1ff:fe23:39 [INFO][brro]: [1] 2002:5709:48f6:e472:280:e1ff:fe23:39 [INFO][brro]: RF channel: 1 [INFO][br ]: BR nwk base ready for start [INFO][addr]: Address added to IF 1: fe80::ff:fe00:face [INFO][br ]: Refresh Contexts [INFO][br ]: Refresh Prefixs [INFO][addr]: Address ad [INFO][addr]: Address added to IF 1: fe80::f2f1:f2f3:f4f5:f6f7 [WARN][ip6r]: LL addr of 2002:5709:48f6:e472:9b99:9999:9999:9999 not found [WARN][ip6r]: LL addr of not found [WARN][ip6r]: LL addr of not found [INFO][brro]: RF bootstrap ready, IPv6 = 2002:5709:48f6:e472:0:ff:fe00:face [INFO][brro]: RF interface addresses: [INFO][brro]: [0] fe80::ff:fe00:face [INFO][brro]: [1] fe80::f2f1:f2f3:f4f5:f6f7 [INFO][brro]: [2] 2002:5709:48f6:e472:0:ff:fe00:face [INFO][brro]: 6LoWPAN Border Router Bootstrap Complete. [WARN][ip6r]: LL addr of not found ------------------no problem, no client, now I'm turning on the first client ---------------------- [INFO][rplc]: DAO from [INFO][rplc]: Transmit DAO-ACK to: 2002:5709:48f6:e472:9b99:9999:9999:9999 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from , rank 100 [INFO][rplc]: DAO from ----------- now I turning on the second client ---------------------------- [INFO][rplc]: Transmit DAO-ACK to: 2002:5709:48f6:e472:3133:3333:3333:3333 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][rplc]: DIO from fe80::3133:3333:3333:3333, rank 100 [INFO][rplc]: DIO from , rank 100 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 102 [INFO][rplc]: DIO from fe80::3133:3333:3333:3333, rank 100 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][rplc]: DIO from , rank 17b [INFO][rplc]: DIO from fe80::3133:3333:3333:3333, rank 117 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][rplc]: DIO from fe80::3133:3333:3333:3333, rank 118 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][rplc]: DIO from fe80::3133:3333:3333:3333, rank 10b [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][icmp]: Route: 0:0:0:e472::/64 Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][br ]: Release Context [INFO][br ]: Refresh Contexts [INFO][br ]: Refresh Prefixs [INFO][br ]: ABRO Update and NVM operation OK [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][rplc]: DIO from , rank 100 ----------- after 10 minutes -------------------- [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333

Even with this problem on the mbed device connector I can see both clients connected

rspelta commented 7 years ago

PS. Resetting the router all returns back to normal:

BORDER-ROUTER: Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms Risposta da 2002:5709:48f6:e472:280:e1ff:fe23:39: durata<1ms ...

CLIENT Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=21ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms Risposta da 2002:5709:48f6:e472:9b99:9999:9999:9999: durata=18ms

Both clients re-connected to the mbed connector successfully

kjbracey commented 7 years ago

Weird. What if you turn off both clients after it's gone funny? Does the border router ever go back to normal?

rspelta commented 7 years ago

If I turn off both clients the border router doesn't go back to normal

Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata<1ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=2360ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=2416ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=582ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=2613ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=3002ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=3002ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=1267ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata<1ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=726ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=33ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=1704ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=27ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=2012ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=12ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=963ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=950ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=1316ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=1665ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=3021ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=3002ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=744ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=42ms Risposta da 2002:4f33:7063:e472:0:ff:fe00:face: durata=992ms

kjbracey commented 7 years ago

Can you intrusively debug the router - stop it to see where it's spending its time? Might not work if it's sleeping rather than busy.

Alternatively, as I'm guessing whoever's taking time is probably doing it in Nanostack's event loop, you could add trace to it, to try to see which event handler is consuming time. Add trace to eventOS_scheduler_dispatch_event(), to print out event ID and type before calling the function pointer, and see if you can visually spot any big delays. (There should be a regular event 10 times a second, so any excessively long-running handlers should obviously block that).

Or maybe flag up automatically if the function pointer takes an unusually long time - measure time with eventOS_event_timer_ticks(), and report any handlers that take more than 500ms (50 ticks).

JanneKiiskila commented 7 years ago

@kjbracey-arm @hasnainvirk - should/could the eventloop actually have a WARN printout, if it spots any event that uses more than an acceptable amount of time?

SeppoTakalo commented 7 years ago

@JanneKiiskila That is kind of general problem that is already under discussion here. We could add it, but requires that the common event loops will be accepted into the mbed OS first.

hasnainvirk commented 7 years ago

@JanneKiiskila You can add trace for your debug purposes.

kjbracey commented 7 years ago

Would be nice to have some warning permanently in there, at least in debug builds.

We don't have a high-precision wallclock we can read in the standard HAL (ping in the cliapp has to cheat with platform-specific stuff as a result), but there is a 10ms interrupt ticker timer we can look at (eventOS_event_timer_ticks/timer_sys_ticks), which is enough for this purpose.

rspelta commented 7 years ago

Sorry about being late, I added this code in eventOS_scheduler_dispatch_event:

    uint32_t tick = eventOS_event_timer_ticks();

    /* Tasklet Scheduler Call */
    tasklet->func_ptr(&cur_event->data);
    event_core_free_push(cur_event);

    tick = eventOS_event_timer_ticks() - tick;
    if( tick > 50 )
        printf("id: 0x%02x - type: 0x%02x - tick: %d\n", tasklet->id, cur_event->data.event_type, tick );

I done tests with 1 or more clients but during the ping problem like:

Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=2898ms Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=1241ms Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=3187ms Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=8ms Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=423ms Richiesta scaduta. Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=3638ms Richiesta scaduta. Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=1149ms Richiesta scaduta. Richiesta scaduta. Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=3807ms Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=8ms Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=3382ms Richiesta scaduta. Richiesta scaduta. Risposta da 2002:4f1f:759c:e472:280:e1ff:fe23:39: durata=1321ms Richiesta scaduta.

The trace was always normal, only at the beginning I had the message "id: 0x00 - type: 0x02 - tick: 160":

[INFO][brro]: PANID: 691

[INFO][app ]: Starting NanoStack Border Router... [INFO][app ]: Build date: Jul 17 2017 16:02:08 [INFO][app ]: Using ETH backhaul driver... id: 0x00 - type: 0x02 - tick: 160 [ERR ][brro]: Backhaul interface down failed [INFO][Eth ]: Ethernet cable connected. [INFO][addr]: Tentative Address added to IF 2: fe80::280:e1ff:fe23:39 [INFO][addr]: DAD passed on IF 2: fe80::280:e1ff:fe23:39 [INFO][addr]: Tentative Address added to IF 2: 2002:4f1f:759c:e472:280:e1ff:fe23:39 [INFO][icmp]: Route: 0:0:0:e472::/64 Lifetime: 1800 Pref: 0 [INFO][addr]: DAD passed on IF 2: 2002:4f1f:759c:e472:280:e1ff:fe23:39 [INFO][brro]: Backhaul bootstrap ready, IPv6 = 2002:4f1f:759c:e472:280:e1ff:fe23:39 [INFO][brro]: Backhaul interface addresses: [INFO][brro]: [0] fe80::280:e1ff:fe23:39 [INFO][brro]: [1] 2002:4f1f:759c:e472:280:e1ff:fe23:39 [INFO][brro]: RF channel: 1 [INFO][br ]: BR nwk base ready for start [INFO][brro]: RF bootstrap ready, IPv6 = 2002:4f1f:759c:e472:0:ff:fe00:face [INFO][brro]: RF interface addresses: [INFO][brro]: [0] fe80::ff:fe00:face [INFO][brro]: [1] fe80::f2f1:f2f3:f4f5:f6f7 [INFO][brro]: [2] 2002:4f1f:759c:e472:0:ff:fe00:face [INFO][brro]: 6LoWPAN Border Router Bootstrap Complete. [INFO][rplc]: DAO from [INFO][rplc]: Transmit DAO-ACK to: 2002:4f1f:759c:e472:9b99:9999:9999:9999 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from , rank 100 [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][icmp]: Route: 0:0:0:e472::/64 Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][br ]: Release Context [INFO][br ]: Refresh Contexts [INFO][br ]: Refresh Prefixs [INFO][br ]: ABRO Update and NVM operation OK [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DIO from fe80::9b99:9999:9999:9999, rank 100 [INFO][icmp]: Route: 0:0:0:e472::/64 Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][icmp]: Route: Lifetime: 1800 Pref: 0 [INFO][6Bo ]: Received MLE Advertisement from fe80::9b99:9999:9999:9999 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from

Just when I added more clients I had some TX fails:

[INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][rplc]: DAO from 2002:4f1f:759c:e472:1311:1111:1111:1111 [INFO][rplc]: Transmit DAO-ACK to: 2002:4f1f:759c:e472:1311:1111:1111:1111 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][rplc]: DAO from 2002:4f1f:759c:e472:1311:1111:1111:1111 [INFO][rplc]: Transmit DAO-ACK to: 2002:4f1f:759c:e472:1311:1111:1111:1111 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][rplc]: DAO from 2002:4f1f:759c:e472:1311:1111:1111:1111 [INFO][rplc]: Transmit DAO-ACK to: 2002:4f1f:759c:e472:1311:1111:1111:1111 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][6Bo ]: Received MLE Advertisement from fe80::2022:2222:2222:2222 [INFO][rplc]: DAO from [INFO][rplc]: Transmit DAO-ACK to: 2002:4f1f:759c:e472:1311:1111:1111:1111 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][6Bo ]: Received MLE Advertisement from fe80::4644:4444:4444:4444 [ERR ][mMCp]: MAC tx fail [ERR ][6lAd]: MCPS Data fail by status 233 [INFO][6Bo ]: Received MLE Advertisement from fe80::1311:1111:1111:1111 [INFO][6Bo ]: Received MLE Advertisement from [INFO][rplc]: DAO from [INFO][rplc]: Transmit DAO-ACK to: 2002:4f1f:759c:e472:1311:1111:1111:1111 [INFO][6Bo ]: Received MLE Advertisement from fe80::2022:2222:2222:2222 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::1311:1111:1111:1111 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333 [INFO][6Bo ]: Received MLE Advertisement from fe80::2022:2222:2222:2222 [INFO][6Bo ]: Received MLE Advertisement from [INFO][6Bo ]: Received MLE Advertisement from fe80::1311:1111:1111:1111 [INFO][6Bo ]: Received MLE Advertisement from fe80::3133:3333:3333:3333

EDIT: version of the nanostack-border-router with lastest commits:

nanostack-border-router (9d0645907ab7) |- drivers\TARGET_MCUXpresso_MCUS\sal-nanostack-driver-k64f-eth (061aac878ce0) |- drivers\TARGET_NUCLEO_F429ZI\sal-nanostack-driver-stm32-eth (e1502953a3b1) |- drivers\atmel-rf-driver (0ae76ce17ae5) |- drivers\mcr20a-rf-driver (d5905fefa54c) |- drivers\sal-stack-nanostack-slip (4a600894da85) |- drivers\stm-spirit1-rf-driver (eb7a87e9a8dc) `- mbed-os (8828635da469)

betzw commented 7 years ago

Currently, the nanostack-border-router and the mbed-os-example-client do not work correctly with mbed-os-5.5 in non-debug builds. This PR should resolve the issue.

hasnainvirk commented 7 years ago

@rspelta I would like to share an observation, might not be related to your issue. You are using channel 1 which falls right at the middle of WLAN channel 6. Could you be encountering fierce interference in the radio channel ?. Imagine tens of devices and a border router competing with high powered WLAN. It might be worth a try to look from that angle.

SeppoTakalo commented 7 years ago

Nope, RF channel 1 is sub Ghz. It is in 900 Mhz area. 15.4 channels

Then for overlapping Wifi channels, please see this 15.4 and .11 channels

hasnainvirk commented 7 years ago

@SeppoTakalo @rspelta Sorry for that. I read it as 11 although I typed 1 🗡 ..My bad In case of sub-GHz bands, we must consider saturation of channel. 868-900 MHz channels are known to get saturated rather quickly.

kjbracey commented 7 years ago

Does PR#4768 totally resolve the weird border router delay issues?

betzw commented 7 years ago

PR #4768 isn't really related to this issue, so it would be a wonder if it would resolve it ...

kjbracey commented 7 years ago

Ah, you linked to it above, so I thought you were saying it was the border router problem. Seemed to me the symptoms might fit - failure to wake at the correct time.

betzw commented 7 years ago

Don't think so. PR #4768 is about a situation where you never return from a timeout rather than returning at an incorrect time.

betzw commented 7 years ago

Maybe I have found a quite deterministic way to reproduce increasing values for ping6.

Setup:

Procedure:

  1. Run ping6 -s 1016 2001:470:1f13:280:0:ff:fe00:face -i 10 & ping6 -s 1016 2001:470:1f13:280:201:203:405:607 -i 10 in two different shells
  2. Start (plug in USB cable into PC) BR & wait for startup complete (6LoWPAN interface ready)

  3. Start (plug in USB cable into PC) Client & wait for object registered
  4. Plug off USB cable of Client from PC & wait ~30secs
  5. Goto 3.

After four/five re-starts of the client I get the following ping6 times:

See also attached screenshot: screenshot from 2017-07-19 08-48-12

betzw commented 7 years ago

Another interesting observation I made this morning after a long run of the above describe setup during last night was that this morning the client was no longer listed by the mbed connector under Connected Devices, while its output trace didn't show up any suspicious output, on the contrary the output continued as usual with:

...

simulate button_click, new value of counter is 1134
[INFO][6Bo ]: Received MLE Advertisement from 
simulate button_click, new value of counter is 1135
simulate button_click, new value of counter is 1136
simulate button_click, new value of counter is 1137
[INFO][6Bo ]: Received MLE Advertisement from 
simulate button_click, new value of counter is 1138

Any ideas?

cc @nikapov-ST

architech-boards commented 7 years ago

@betzw Maybe I have found a quite deterministic way to reproduce increasing values for ping6. I tryed your way and it works. I have the problem also in this way.

@kjbracey-arm Does PR#4768 totally resolve the weird border router delay issues? No, I can compile without --profile=debug but this issue is not resolved

With Atmel is there this problem? Is it a good idea if someone could try with Atmel in order to see if this problem happens? I don't have the ATMEL AT233 15.4 shield here.

rspelta commented 7 years ago

Hello, I sniffed the packets from the ethernet cable between the border router and the router connected to the internet (not by SIM card but via cable). When the border router is working well and the ping is fast and stable we have this situation when the client connects to the mbed connector: immagine The connection is always perfect.

When the border router is weird (ping problem) the log is this one: immagine In the end the client is connected temporally to the mbed connector.

I have the client trace of the picture above, I found out:

  1. The client didn't received 4 times the packet "SERVER HELLO DONE" (lenght 87bytes) even if you can see them in the picture above. See the trace (it is commented in order to explain the names of the packets): here. So there are many retries because the packets are lost.
  2. Sometimes packets received are not in the right order. For example "Encrypted Handshake Message" is received before of "Change Cipher Spec" packet. But I think this problem is due to the UDP protocol.
betzw commented 7 years ago

Just for my understanding:

rspelta commented 7 years ago

2002:5236:726c:e472:9b999:9999:9999:9999 -> client 2607:f0d0:2601:52:20 -> mbed connector (server) This is the DTLS protocol, the packets exchanged form the client to the mbed connector during the connection phase:

fig-2-dtls-key-agreement-with-fixed-diffie-hellman-over-elliptic-curves

These packets are sniffed between the internet router and the border router. The trace is only for the client.

betzw commented 7 years ago

Analyzig the trace, to me it seems as if the client misses sniff entries from no. 77 up to no. 143 (both included). Do you have any information if and what other traffic might be going on in this period?

rspelta commented 7 years ago

No, they aren't missed. Here the list of the packets from the trace and the sniffed packets. The missed packets are "Server Hello Done" '>' client output '<' client input

[no.] [input/outuput] [tls message (copied from the client trace)] 63 > CLIENT HELLO START 64 < HELLO VERIFY REQUEST 72 > CLIENT HELLO 73 < SERVER HELLO 74 < CERTIFICATE 75 < SERVER KEY EXCHANGE 76 < CERTIFICATE REQUEST 77 LOST 130 < SERVER HELLO 132 < SERVER KEY EXCHANGE 143 > CLIENT HELLO 133 < CERTIFICATE 135 < CERTIFICATE REQUEST 136 LOST 146 < SERVER HELLO 147 < SERVER KEY EXCHANGE 148 < CERTIFICATE 149 LOST 150 < CERTIFICATE REQUEST 194 > CLIENT HELLO 195 < SERVER HELLO 196 < SERVER KEY EXCHANGE 197 < CERTIFICATE 198 < CERTIFICATE REQUEST 199 LOST 212 < SERVER HELLO 213 < SERVER KEY EXCHANGE 214 < CERTIFICATE 215 < SERVER HELLO DONE 216 < CERTIFICATE REQUEST 218 < SERVER HELLO 219 < SERVER KEY EXCHANGE 220 < CERTIFICATE 221 < SERVER HELLO DONE 242 > CERTIFICATE 222 < CERTIFICATE REQUEST 245 > CLIENT KEY EXCHANGE 246 > CERTIFICATE VERIFY 247 > CHANGE CHIPER SPEC 248 > ENCRYPTED HANDSHAKE 251 > CERTIFICATE 255 > CLIENT KEY EXCHANGE 256 > CERTIFICATE VERIFY 257 > CHANGE CHIPER SPEC 259 > ENCRYPTED HANDSHAKE 252 < ENCRYPTED HANDSHAKE 253 < CHANGE CHIPER SPEC 260 < ENCRYPTED HANDSHAKE 264 > APPLICATION DATA 261 < CHANGE CHIPER SPEC 265 < APPLICATION DATA

rspelta commented 7 years ago

update: We resoved the problem of the packets "Server Hello Done" missed. There will be a patch from ST. There is still the problem of the ping.

betzw commented 7 years ago

You can find an updated version of the Ethernet Nanostack driver for NUCLEO_F429ZI here.

@rspelta could you pls. test this version of the driver on your side?

MarceloSalazar commented 7 years ago

@rspelta please let us know whether the issue is fixed now.

rspelta commented 7 years ago

Hi all, today I downloaded the last version of the "nanostack-border-router" with the last commit from the sal-nanostack-driver-stm32-eth and stm-spirit1-rf-driver repositories. It worked only with the option "--profile=debug" (old known problem) My test (repetuted 5 times) was:

  1. turn on border router connected via eth cable with internet router, waiting the boot is finished
  2. turn on contemporanely 20 clients
  3. after 10 minutes check on mbed device connector if all 20 clients are connected
  4. use API console via mbed device connector
  5. check ping timing of the border router
  6. turn off the clients and wait they disappear from the mbed device connector
  7. go to step 2 (without resetting the border router)

Results:

Note for @betzw: After this test I changed the source code of the sal-nanostack-driver-stm32-eth, I changed the size of the ring buffer TX and RX of the ethernet to 10. With this modify all 20 clients can connect always to the mbed device connector and with lesser time than before (about half time).

JanneKiiskila commented 7 years ago

Great success! 🥇

Should that modification be passed on as a PR to sal-nanostack-driver-stm32-eth (and merged in)?