arduino-libraries / MKRNB

36 stars 43 forks source link

Apparent complete crash after network side TCP close #14

Open pekkanikander opened 5 years ago

pekkanikander commented 5 years ago

We have some networking problems with out test 5G network, basically causing the network to close TCP connections more or less randomly. This causes the MKR NB 1500 to completely crash, including the USB host connection, until reset. As we don't have any SWD debugging capability (as of yet), it is very hard to pinpoint the real cause.

Setup

Libraries: PubSubClient, MKRNB Sketch: https://github.com/iothon/docker-compose-mqtt-influxdb-grafana/tree/master/03-arduino_mqtt/mkr_nb_1500_mqtt

To repeat

  1. Have a NB-IoT SIM card, modify and flash the sketch (from above), wait until the sketch connects to your MQTT server.

  2. Cause a network side TCP connection termination, e.g. pull the Ethernet from your server, kill your Mosquitto, or something similar. Sometimes a regular TCP close also seems to cause the same problem.

Symptoms

In NBClient.cpp, if the AT+USORD=0,512 call, used to receive incoming packets, returns +CEM ERROR: ..., this causes the NBClient to close the socket, due to PubSubClient calling _client->stop().

Some time after the NBClient::stop() method has returned, the sketch crashes so badly that the virtual TTY is no longer visible through USB. The board seems to stay in that stage until it is reset with the reset button, or if it is disconnected, it runs out of battery.

sandeepmistry commented 5 years ago

Hi @pekkanikander,

Have you tried to reproduce this issue without any other libraries in the sketch? Ideally if we had something that just used the MKRNB lib it would be easier to isolate.

sandeepmistry commented 5 years ago

Would you also be able to share a debug log?

pekkanikander commented 5 years ago

Sorry, I've pretty much run out of time with this. (We had https://iothon.io, where we used MKR NB 1500 boards. But that is over now.)

What we managed to do is a working SWD interface. However, I didn't have time to get gdb working on my Mac OS X. If you can point out instructions to that, I can try to take a few stack traces still.

Below some debug logs with our MKRNB fork.

16:33:27.408 -> '
16:33:27.442 -> MODEM: poll: resp='+USORD: 0,""\013
16:33:27.442 -> \013
16:33:27.442 -> OK\013
16:33:27.442 -> '
16:33:27.442 -> MODEM: send: data='AT+USORD=0,512'
16:33:27.442 -> MODEM: poll:   at='AT+USORD=0,512\013\013
16:33:27.442 -> '
16:33:27.442 -> MODEM: poll: resp='+USORD: 0,""\013
16:33:27.442 -> \013
16:33:27.442 -> OK\013
16:33:27.442 -> '
16:33:28.462 -> MODEM: send: data='AT+USORD=0,512'
16:33:28.462 -> MODEM: poll:   at='AT+USORD=0,512\013\013
16:33:28.462 -> '
16:33:28.462 -> MODEM: poll: resp='+CME ERROR: Operation not allowed\013
16:33:28.462 -> '
16:33:28.462 -> NBClient: Closing the socket (may take several minutes)...
16:33:28.462 -> MODEM: poll:  urc='\013
16:33:28.462 -> '
16:33:28.497 -> MODEM: send: data='AT+USOCL=0'
16:33:28.497 -> MODEM: poll:   at='AT+USOCL=0\013\013
16:33:28.497 -> '
16:33:28.497 -> MODEM: poll: resp='+CME ERROR: Operation not allowed\013
16:33:28.497 -> '
16:33:28.497 -> NBClient: Closing the socket: closed.

---

17:13:36.239 -> NBClient: Closing the socket (may take several minutes)...
17:13:36.276 -> MODEM: send: data='AT+USOCL=0'
17:15:31.648 -> MODEM: poll:   at='AT+USOCL=0\013\013
17:15:31.648 -> '
17:15:31.648 -> MODEM: poll: resp='OK\013
17:15:31.648 -> '
17:15:31.648 -> NBClient: Closing the socket: closed.
17:15:31.648 -> Connecting to the MQTT server...\013
17:15:31.648 -> MODEM: send: data='AT+USOCR=6'

---

MODEM: send: data='AT+USORD=0,512'
MODEM: poll:   at='AT+USORD=0,512
'
MODEM: poll: resp='+CME ERROR: Operation not allowed
'
NBClient: Closing the socket (may take several minutes)...
MODEM: poll:  urc='
'
MODEM: send: data='AT+USOCL=0'
MODEM: poll:   at='AT+USOCL=0
'
MODEM: poll: resp='+CME ERROR: Operation not allowed
'
NBClient: Closing the socket: closed.

The middle trace is a rare one. There the stack managed to continue opening a new socket before crashing. In all other cases observed the Arduino crashed shortly after the socket got closed.

sandeepmistry commented 5 years ago

Hi @pekkanikander,

I managed to find a MQTT broker I could start/stop/kill and tried your example sketch with MKRNB v1.3.0 (released today) and PubSubClient v2.7.0. All looks good after I stop mosquito or kill -9 the mosquito process, I've attach the output from the serial monitor for both scenarios.

MKRNB_issue14.txt MKRNB_issue14-2.txt