TheThingsProducts / gateway

The Things Kickstarter Gateway
https://www.thethingsindustries.com/docs/gateways/models/thethingskickstartergateway

Reboot Loop - Reboot reason: 0x10 or 0x13 #1

Closed htdvisser closed 6 years ago

htdvisser commented 6 years ago

From what I can see in the source code, that's a RESET_REASON_WDT_TIMEOUT (0x10).

Based on what I see on the forum this happens a lot.
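
To see how often each reason code shows up in a captured serial log, a minimal sketch like the one below could be used. It assumes the log contains lines of the form "Reboot reason: 0x10", as in the console output posted in this thread. The names `count_reboot_reasons`, `describe` and `KNOWN_REASONS` are made up for illustration, and only 0x10 is labelled because it is the only code identified above.

```python
import re
from collections import Counter

# Assumption: only 0x10 is named in this thread (RESET_REASON_WDT_TIMEOUT).
# Other codes are deliberately left unlabelled rather than guessed.
KNOWN_REASONS = {0x10: "RESET_REASON_WDT_TIMEOUT"}

REBOOT_RE = re.compile(r"Reboot reason:\s*0x([0-9A-Fa-f]+)")


def count_reboot_reasons(log_text):
    """Return a Counter mapping each reason code (int) to its number of occurrences."""
    return Counter(int(m.group(1), 16) for m in REBOOT_RE.finditer(log_text))


def describe(code):
    """Format a reason code with its name, if the thread identifies one."""
    name = KNOWN_REASONS.get(code, "not identified in this thread")
    return f"0x{code:02X} ({name})"


if __name__ == "__main__":
    sample = "Reboot reason: 0x10\n...\nReboot reason: 0x10\nReboot reason: 0x40\n"
    for code, count in sorted(count_reboot_reasons(sample).items()):
        print(describe(code), count)
```

Running it over a saved serial log (read into a string) gives a quick tally per code, which makes it easier to say whether a firmware update changed the mix of reboot reasons.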

beamzer commented 6 years ago

It tells you at the bottom of the comment box, just attach the log file by dragging & dropping it in the comment box

smbunn commented 6 years ago

So you save the log file as a text file, then drag and drop it in. Makes sense, I thought maybe it was pasted into the message with some form of mark-up, like the inline code markers

PhilipOlsson commented 6 years ago

I can confirm that my newly received gateway had the endless reboot issue where it would not come up after these lines on the console:

SNTP: State change from 0 to 0
SNTP: State change from 0 to 0

After reseating the microchip lora modem, it is now working correctly.

KrishnaIyer commented 6 years ago

@PhilipOlsson: Which version of the firmware are you using? 1.0.2? Also, could you explain this:

> reseating the microchip lora modem

Thanks!

PhilipOlsson commented 6 years ago

@KrishnaIyer

All three: shipping (1.0.0), 1.0.2 and 1.0.4. Same behaviour on each; currently on 1.0.4.

Reseating the LoRa modem means taking the "Microchip Lora 868/915 Gateway Module", as displayed on https://github.com/TheThingsProducts/gateway/blob/develop/doc/header.png, out of its socket and pushing it back into its socket.

smbunn commented 6 years ago

I found that removing the black foam insulation pad behind the LoRa board vastly improved reliability. I think it is bending the board out of alignment with the socket.

metaneutrons commented 6 years ago

I can confirm that removing the LoRa board (and the foam) and using some nylon screws with proper space holders fixed the problem. The gateway is now up and running.

smbunn commented 6 years ago

Can you provide details on what you did, i.e. how many spacers are optimal to keep the board aligned with the socket? Any photos?

jonathanve commented 6 years ago

I am suffering the reboot problem using the 915MHz TTN GW, even after upgrading the firmware:

CNFG: Load online user config state change to 7
CNFG: Configuring LoRa module
LORA: Changing state from 2 to 4
LORA: Starting reconfiguration
MON: SYS Stack size: 2837
MON: TCPIP Stack size: 3815
MON: APP Stack size: 3294
MON: LoRa Stack size: 3859
MON: heap usage: 168KB (250KB), free: 171KB
SNTP: State change from 0 to 0
SNTP: State change from 0 to 0

**************************
*   The Things Network   *
*      G A T E W A Y     *
**************************
Firmware name: AmazingAckermann, type: 0, version: 1.0.4, commit: a7beae91, timestamp: 1525259181
Bootloader revision: 1, commit: 7167873a, timestamp: 1496411298
Build time: May  2 2018 11:07:39
Reboot reason: 0x10
BOOT: (persisted info) 6F 72 72 65 01 03 30 C9 6E E4 FD F8 B0 28 C1 2C 

[Repeat]
beamzer commented 6 years ago

I tried removing the foam and the standoff clips, as mentioned above, and replaced them with spacers so that the LoRa modem board is supported but doesn't get bent anymore. It didn't do me any good: the reboots seem to be gone, but the gateway still doesn't work.

I was Kickstarter backer number 26 back in 2015 when the campaign started, and had hoped to become a part of The Things Network, since I believe in the principles. So I waited for more than two years for the gateway to arrive, only to find out that it didn't work :-( After half a year I have little hope left that this is going to be fixed. A bit more support from TTN would have been nice; I get better support for the stuff I buy from China than from a company only a couple of km away.

serial_log_v.1.0.3.txt

grahamehorner commented 6 years ago

@johanstokking good luck with getting anything back or getting anywhere with this; I personally have given up on the public LoRaWAN vision that was presented via Kickstarter, and I've started seeking legal advice on how to start a claim against TTN for releasing goods that are NOT FIT for purpose. The lack of support is simply a JOKE on all fronts. After, as you quite rightly point out, 2+ years of waiting and many, many months of trying to get some real support, I think it is time the team faced up to this and started issuing a recall/refund to those who have NON-functioning gateways. So much promised, so much hype, and failure to deliver on so many fronts.

smbunn commented 6 years ago

This being a Kickstarter project, I would think there is very little recourse. It is always assumed that there will be some risk with start-up projects, and as a backer you are expected to understand that risk when you invest. There have been a few noises around that they are still working on a fix, which implies that firmware rather than hardware is at fault.

etychon commented 6 years ago

As much as I'd like to get my money back too, I agree with @smbunn: you're not buying a product, you're funding development. Most of the time it works out very well, but some projects never get shipped in the first place. At least we got some (junk) hardware.

The worst part is that no one seems to care: there is no recognition of the problem, and we are all left hanging without a solution, communication, or even acknowledgment. For example, it does not seem like we hear from Wienke Giezeman (@wienke) anymore, which is sad.

DefProc commented 6 years ago

I'd still recommend looking at the LoRa board before throwing everything away as a bad job. I've just photographed the board that I had an intermittent problem with, and the position of the board is quite subtle between it working fine, and just rebooting consistently.

It appears that although the board support has both standoffs, and sticky pad, the positional alignment is not perfectly secure (at least not on my unit). As I mentioned earlier in the thread, just dropping the GW was enough to dislodge it, and cause the reboot loop in a previously working gateway.

[Image: reboot_loop]

mchevroulet commented 6 years ago

Mine is strange. I think that it never worked, but the stats on the Console tell me:

Last Seen: 4 months ago
Received Messages: 1468
Transmitted Messages: 1

So that suggests it connected once?

smbunn commented 6 years ago

I would say that shows it connected at least once. Mine is intermittent: I once had a day where it ran for about 11 hours, but recently I have had 2 hours of operation in the last 14 days. I am going to try removing the LoRa board, removing the plastic standoffs and using proper screw standoffs that I can adjust. I also note that the two black chips on the LoRa board get very hot, something that has been identified before. Maybe some stick-on heatsinks could help? The image below shows every connection for the last 7 days as a vertical bar. image

jurrienjurrien commented 6 years ago

Now running the new firmware 1.0.5 for about 12 hours. So far there have been 4 reboots with reason 0x10, so it looks like fewer reboots, because on average the gateway used to reboot 20 times every 24 hours. The last reboots were all related to an MQTT error.

Reboot at MQTT error

```
[2018-07-18 06:53:50] LGMD:LORA: Accepted packet
[2018-07-18 06:53:50]
[2018-07-18 06:53:50] MQTT: Sending UPLINK OK
[2018-07-18 06:53:51] MON: SYS Stack size: 2851
[2018-07-18 06:53:51] MON: TCPIP Stack size: 3763
[2018-07-18 06:53:51] MON: APP Stack size: 3294
[2018-07-18 06:53:51] MON: LoRa Stack size: 3877
[2018-07-18 06:53:51] MON: heap usage: 278KB (279KB), free: 61KB
[2018-07-18 06:53:52] MQTT: Sending status packet
[2018-07-18 06:53:55] MQTT: Sending status failed
[2018-07-18 06:53:55]
[2018-07-18 06:53:55] MAIN: MQTT error
[2018-07-18 06:53:55]
[2018-07-18 06:53:55] MAIN: Leaving state 5
[2018-07-18 06:54:03] SNTP: State change from 0 to 0
[2018-07-18 06:54:03] SNTP: State change from 0 to 0
[2018-07-18 06:54:03]
[2018-07-18 06:54:03]
[2018-07-18 06:54:03]
[2018-07-18 06:54:03] **************************
[2018-07-18 06:54:03] *   The Things Network   *
[2018-07-18 06:54:03] *      G A T E W A Y     *
[2018-07-18 06:54:03] **************************
[2018-07-18 06:54:03] Firmware name: AmazingAckermann, type: 0, version: 1.0.5, commit: fa89b993, timestamp: 1531815112
[2018-07-18 06:54:03] Bootloader revision: 2, commit: c463e87e, timestamp: 1519396960
[2018-07-18 06:54:04] Build time: Jul 17 2018 08:12:51
[2018-07-18 06:54:04] Reboot reason: 0x10
```

However, it doesn't reboot at every MQTT error. In the event below, the Wi-Fi connection is reinitialised and a new MQTT connection is established.

Recovery after MQTT error without reboot

```
[2018-07-18 04:15:47] MQTT: Sending status packet
[2018-07-18 04:15:50] MQTT: Sending status failed
[2018-07-18 04:15:50]
[2018-07-18 04:15:50] MAIN: MQTT error
[2018-07-18 04:15:50]
[2018-07-18 04:15:50] MAIN: Leaving state 5
[2018-07-18 04:15:51] MAIN: Entering state 6
[2018-07-18 04:15:51] INET: State change to 0
[2018-07-18 04:15:51] WIFI: Disabling modules
[2018-07-18 04:15:51] SNTP: State change from 7 to 8
[2018-07-18 04:15:51] CB: Disconnect
[2018-07-18 04:15:51] Head magic match void: trying to free an already freed block, ignore
[2018-07-18 04:15:51] SNTP: State change from 8 to 1
[2018-07-18 04:15:51] WIFI: Entering state 3
[2018-07-18 04:15:52] WIFI: Enabling modules for client
[2018-07-18 04:15:52] WIFI: Entering state 6
[2018-07-18 04:15:52] WIFI: IP Address: 0.0.0.0
[2018-07-18 04:15:56] MON: SYS Stack size: 2851
[2018-07-18 04:15:56] MON: TCPIP Stack size: 3787
[2018-07-18 04:15:56] MON: APP Stack size: 3294
[2018-07-18 04:15:56] MON: LoRa Stack size: 3877
[2018-07-18 04:15:56] MON: heap usage: 183KB (279KB), free: 156KB
[2018-07-18 04:15:59] CB: INET: Gateway has WiFi
[2018-07-18 04:15:59] INET: State change to 2
[2018-07-18 04:15:59] INET: Connected to a network, waiting for DHCP lease, checking validity with ping
[2018-07-18 04:15:59] SNTP: State change from 1 to 2
[2018-07-18 04:16:02] WIFI: IP Address: 192.168.60.23
[2018-07-18 04:16:02] LGMD:LORA: Accepted packet
[2018-07-18 04:16:02]
[2018-07-18 04:16:03] LGMD:LORA: Accepted packet
[2018-07-18 04:16:03]
[2018-07-18 04:16:04] SNTP: State change from 2 to 3
[2018-07-18 04:16:04] INET: State change to 3
[2018-07-18 04:16:04] INET: Ping probe
[2018-07-18 04:16:04] INET: Error sending probe on Eth
[2018-07-18 04:16:04] INET: Ping response from MRF24WN, set as default
[2018-07-18 04:16:04] INET: State change to 5
[2018-07-18 04:16:04]
[2018-07-18 04:16:04] MAIN: Leaving state 6
[2018-07-18 04:16:04] MAIN: Entering state 5
[2018-07-18 04:16:04] MQTT: GOT IP: 52.169.76.203
[2018-07-18 04:16:04] Connecting to: 52.169.76.203
[2018-07-18 04:16:05] MQTT: Connection Opened: Starting TLS Negotiation
[2018-07-18 04:16:05] MQTT: Wait for SSL Connect
[2018-07-18 04:16:05] SNTP: State change from 3 to 4
[2018-07-18 04:16:05] MQTT: TLS ready: Connect MQTT
[2018-07-18 04:16:05] RQMQTT: Connected
[2018-07-18 04:16:05]
[2018-07-18 04:16:05] *************************
[2018-07-18 04:16:05] MAIN: Gateway bridging
[2018-07-18 04:16:05] *************************
[2018-07-18 04:16:05]
[2018-07-18 04:16:05] MQTT: Sending UPLINK OK
[2018-07-18 04:16:05] MQTT: Sending UPLINK OK
[2018-07-18 04:16:05] MON: SYS Stack size: 2851
[2018-07-18 04:16:05] MON: TCPIP Stack size: 3787
[2018-07-18 04:16:06] MON: APP Stack size: 3294
[2018-07-18 04:16:06] MON: LoRa Stack size: 3877
[2018-07-18 04:16:06] MON: heap usage: 278KB (279KB), free: 61KB
[2018-07-18 04:16:06] SNTP: State change from 4 to 5
[2018-07-18 04:16:06] SNTP: State change from 5 to 6
[2018-07-18 04:16:06] SNTP: State change from 6 to 7
[2018-07-18 04:16:15] MON: SYS Stack size: 2851
[2018-07-18 04:16:16] MON: TCPIP Stack size: 3787
[2018-07-18 04:16:16] MON: APP Stack size: 3294
[2018-07-18 04:16:16] MON: LoRa Stack size: 3877
[2018-07-18 04:16:16] MON: heap usage: 278KB (279KB), free: 61KB
[2018-07-18 04:16:17] MQTT: Sending status packet
[2018-07-18 04:16:17] MQTT: Report config error: 0600000020
[2018-07-18 04:16:17] MQTT: Sending status succeeded: 6
```
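
To check over a longer capture how many MQTT errors end in a reboot and how many recover, a rough sketch along these lines may help. It assumes the log has one entry per line and relies only on three marker strings that appear in the logs above ("MAIN: MQTT error", "Reboot reason:" and "MAIN: Gateway bridging"). The function name `classify_mqtt_errors` is hypothetical, and treating "Gateway bridging" as the sign of recovery is an assumption based on the second log.

```python
def classify_mqtt_errors(lines):
    """For each 'MAIN: MQTT error' entry, report what happened next.

    Returns a list with one item per MQTT error:
    'reboot'    - a 'Reboot reason:' line followed before bridging resumed
    'recovered' - 'MAIN: Gateway bridging' followed without a reboot
    'unknown'   - the log ended, or another MQTT error came first
    """
    outcomes = []
    pending = False  # True while an MQTT error has no outcome yet
    for line in lines:
        if "MAIN: MQTT error" in line:
            if pending:
                outcomes.append("unknown")
            pending = True
        elif pending and "Reboot reason:" in line:
            outcomes.append("reboot")
            pending = False
        elif pending and "MAIN: Gateway bridging" in line:
            outcomes.append("recovered")
            pending = False
    if pending:
        outcomes.append("unknown")
    return outcomes


if __name__ == "__main__":
    sample = [
        "[2018-07-18 06:53:55] MAIN: MQTT error",
        "[2018-07-18 06:54:04] Reboot reason: 0x10",
        "[2018-07-18 04:15:50] MAIN: MQTT error",
        "[2018-07-18 04:16:05] MAIN: Gateway bridging",
    ]
    print(classify_mqtt_errors(sample))
```

Plain substring matching is used so the timestamp prefix does not matter; the same function works on logs captured without timestamps.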
smbunn commented 6 years ago

Just had mine update itself to 1.0.5 and it's running! Long may that last. I will still put in plastic screw spacers to hold the board so I can fix the angle and position in the socket.

smbunn commented 6 years ago

14 hours of continuous operation, a new record. image

jonathanve commented 6 years ago

I upgraded to version 1.0.5. It is still in the reboot loop, but this time with different log messages:

CNFG: Load online user config state change to 7

CNFG: Configuring LoRa module
LORA: Changing state from 2 to 3
LORA: Starting reconfiguration
LGMD:Invalid checksum: 0xD5, calcuLGMD:Invalid checksum: 0xD5, calculated: 0x55
LGMD:Timeout on cmnown dataLGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: Starting reconfiguration
LGMD:Invalid checksum: 0xD5, calcuLGMD:Invalid checksum: 0xD5, calculated: 0x55
LGMD:Timeout on cmnown dataLGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: Starting reconfiguration
LGMD:Invalid checksum: 0xD5, calcuLGMD:Invalid checksum: 0xD5, calculated: 0x55
LGMD:Timeout on cmnown dataLGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: RESET MODULE
LORA: Changing state from 3 to 0
LORA: Initialisation complete
LORA: Changing state from 0 to 1
MON: SYS Stack size: 2853
MON: TCPIP Stack size: 3793
MON: APP Stack size: 3294
MON: LoRa Stack size: 3887
MON: heap usage: 199KB (265KB), free: 139KB
LORA: Wait init complete, waiting for application.
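
One detail in the checksum messages above: the reported value 0xD5 and the calculated value 0x55 differ in a single bit. The short sketch below shows the arithmetic. The helper name `differing_bits` is made up for illustration, and the log alone does not say why the values differ, so this is only an observation, not a diagnosis.

```python
def differing_bits(received, calculated):
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = (received ^ calculated) & 0xFF
    return [bit for bit in range(8) if diff & (1 << bit)]


if __name__ == "__main__":
    # Values taken from the "Invalid checksum: 0xD5, calculated: 0x55" log line.
    print(hex(0xD5 ^ 0x55))            # 0x80
    print(differing_bits(0xD5, 0x55))  # [7]
```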
smbunn commented 6 years ago

Did you try reseating the LoRa board? It's well worth doing. I used needle-nose pliers to squeeze closed the plastic tabs on the top of the spacer posts, got the board completely out, removed the black foam and then pushed the board hard into the socket before putting it back on the spacers.

smbunn commented 6 years ago

My gateway has run continuously for the last 5 days. I think they fixed it!

etychon commented 6 years ago

I have tried reseating the LoRa board many times without success, however upgrading to 1.0.5 has fixed the rebooting problem for me. The gateway managed to connect for the first time ever, and I have one sensor registered and it is also working fine.

It's good to see some light at the end of the tunnel!

smbunn commented 6 years ago

image Seven days continuous operation. The only small gap is me relocating my gateway to use my outdoor aerial. Now it is fixed I don't have to keep it next to my workroom desk so I can push the pink button every hour or so.

jonathanve commented 6 years ago

Hi. I did try the new firmware version 1.0.5, but it did not work. I also adjusted the radio chip socket. I am connecting to the US West server. The log shows a cmd 0x31 timeout and a checksum error. Does this mean the gateway is broken, or is it another software error in the gateway or between the server and the gateway?

*   The Things Network   *
*      G A T E W A Y     *
**************************
Firmware name: AmazingAckermann, type: 0, version: 1.0.5, commit: fa89b993, timestamp: 1531815112
Bootloader revision: 1, commit: 7167873a, timestamp: 1496411298
Build time: Jul 17 2018 08:12:51
Reboot reason: 0x40
BOOT: (persisted info) 6F 72 72 65 01 03 30 CD 2E E4 CD F9 B4 28 C1 2C 

[...]

CNFG: Load online user config state change to 4
HTTP: Close active socket 0
HTTP: Starting connection
HTTPS: Connection Opened: Starting TLS Negotiation
HTTP: Wait for TLS Connect
HTTP: TLS Connection Opened: Starting Clear Text Communication
HTTP: Got 1295 bytes
HTTP: Connection Closed
HTTP: Close active socket 1
CONF: Parsing response token: HTTP/1.1 200 OK
CONF: ROUTER URL: mqtts://bridge.us-west.thethings.network:8883

CNFG: Load online user config state change to 5
FLASH: Lock Activation Data

CNFG: Gateway activation locked

[...]

MON: APP Stack size: 3294
MON: LoRa Stack size: 3865
MON: heap usage: 199KB (265KB), free: 139KB
LGMD:Timeout on cmd: 0x31
LGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: RESET MODULE
LORA: Changing state from 3 to 0
LORA: Initialisation complete
LORA: Changing state from 0 to 1
LORA: Wait init complete, waiting for application.
LORA: Changing state from 1 to 2
LORA: Changing state from 2 to 3
LORA: Starting reconfiguration
LGMD:Invalid checksum: 0xD5, calcuLGMD:Invalid checksum: 0xD5, calculated: 0x55
LGMD:Timeout on cmnown dataLGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: Starting reconfiguration
LGMD:Invalid checksum: 0xD5, calcuLGMD:Invalid checksum: 0xD5, calculated: 0x55
MON: SYS Stack size: 2831
MON: TCPIP Stack size: 3787
MON: APP Stack size: 3294
MON: LoRa Stack size: 3865
MON: heap usage: 216KB (265KB), free: 123KB
LGMD:Timeout on cmnown dataLGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: Starting reconfiguration
LGMD:Invalid checksum: 0xD5, calcuLGMD:Invalid checksum: 0xD5, calculated: 0x55
LGMD:Timeout on cmnown dataLGMD:Receiving unknown data
LORA: Configuration failed, retry
LORA: RESET MODULE
LORA: Changing state from 3 to 0

[...]

CNFG: Configuring LoRa module
LORA: Changing state from 2 to 3
LORA: Starting reconfiguration
LGMD:Command, cmd: 0x31, size: 0
LGMD:Timeout on cmd: 0x31
LORA: Configuration failed, retry
LORA: Starting reconfiguration
jonathanve commented 6 years ago

Hi, I was able to register the gateway using an MPLAB programmer and loading the firmware-with-bootloader hex file. Now it shows that it is connected, but:

Received Messages: 0
Transmitted Messages: 0

I am constantly getting these errors:

LGMD:Rejected packet (0x11)
[...]
MQTT: Report reboot error: 0110
[...]
MQTT: Sending status packet
MQTT: Sending status failed

I connected to the US West router and I am in South America at the moment (915MHz).

Logs

MQTT: GOT IP: 13.66.213.36
Connecting to: 13.66.213.36
MQTT: Connection Opened: Starting TLS Negotiation
MQTT: Wait for SSL Connect
MQTT: TLS ready: Connect MQTT
RQMQTT: Connected

*************************
MAIN: Gateway bridging
*************************

LGMD:Rejected packet (0x11)

MON: SYS Stack size: 2870
MON: TCPIP Stack size: 3793
MON: APP Stack size: 3294
MON: LoRa Stack size: 3877
MON: heap usage: 277KB (278KB), free: 61KB
MON: SYS Stack size: 2870
MON: TCPIP Stack size: 3793
MON: APP Stack size: 3294
MON: LoRa Stack size: 3877
MON: heap usage: 277KB (278KB), free: 61KB
MQTT: Sending status packet
MQTT: Report config error: 0600000020
MQTT: Sending status succeeded: 12

Does anyone know how I should proceed from here, given that I still can't receive or send messages?

KrishnaIyer commented 6 years ago

We haven't had reports of LoRa triggered reboot loops using firmware v1.0.5 and hence this issue is being closed.

Please note:
If you still see this exact behavior (reboot loops) using firmware v1.0.5, please post it here and I'll reopen it if needed. For other issues, please either append your info to an existing issue (if applicable) or create a new one.

holtkamp commented 6 years ago

@KrishnaIyer for the less tech-savvy users / people who do not have or do not want to spend hours on flashing firmware and restarting with fingers crossed: will there be a step-by-step write-up to get the product working?

I also have a gateway lying around, spent some hours on it, encountered this issue and decided to wait until a stable version is available.

Seems like that time has come with 1.0.5 😄

KrishnaIyer commented 6 years ago

@holtkamp: That's nice to hear. Basically, if you just connect your gateway to the internet and if you've enabled the option Automatically update gateway in your console, the newer version will be automatically downloaded.

mbarnig commented 6 years ago

I was in the same situation as @holtkamp. A few days ago I powered up the gateway and reconnected it to the Internet. After some time four LEDs were lit blue, the gateway showed up as connected in The Things Network Console and the local gateway portal displayed firmware v1.0.5-fa89b993. I did some tests with the Things Node. Everything now seems to work as expected. Great!

beamzer commented 6 years ago

> We haven't had reports of LoRa triggered reboot loops using firmware v1.0.5 and hence this issue is being closed.
>
> Please note: If you still see this exact behavior (reboot loops) using firmware v1.0.5, please post it here and I'll reopen it if needed. For other issues, please either append your info to an existing issue (if applicable) or create a new one.

My gateway runs 1.0.5 but still reboots, so please re-open. It seems to start out stable, with 4 blue LEDs on, but after a while the reboots start :(

Attached is the logging from the serial connection to the gateway with the debug output and several reboots. Timeframe is around 24h.

TTN-gateway__20180928_14.52.50.log

KrishnaIyer commented 6 years ago

@beamzer: If you look at the logs, you can find this line, after which there is a reboot: *** assert ../src/app_lora.c:842:w == size:LORA Uart write should be blocking. One other member has also reported this, and it is filed under #52.
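
For anyone who wants to check their own serial log for lines like this, here is a small sketch. It assumes assert lines follow the shape of the single example quoted above ("*** assert", then file, line number and message separated by colons); the firmware may print other variants. The names `ASSERT_RE` and `find_asserts` are hypothetical.

```python
import re

# Assumption: the pattern is derived from the one assert line quoted in this thread.
ASSERT_RE = re.compile(
    r"\*\*\* assert (?P<file>[^:\s]+):(?P<line>\d+):(?P<message>.*)"
)


def find_asserts(log_text):
    """Return (file, line number, message) for every assert line found in the log."""
    return [
        (m.group("file"), int(m.group("line")), m.group("message").strip())
        for m in ASSERT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "MQTT: Sending UPLINK OK\n"
        "*** assert ../src/app_lora.c:842:w == size:LORA Uart write should be blocking\n"
        "Reboot reason: 0x10\n"
    )
    for source_file, line_no, message in find_asserts(sample):
        print(source_file, line_no, message)
```

Posting the file and line number from the output makes it easier to match a report against an existing issue.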

jonathanve commented 5 years ago

Hello, I was able to get another 915 MHz Gateway and it checked in successfully. It is finally working!!

chiflux commented 5 years ago

> I'd still recommend looking at the LoRa board before throwing everything away as a bad job. I've just photographed the board that I had an intermittent problem with, and the position of the board is quite subtle between it working fine, and just rebooting consistently.
>
> It appears that although the board support has both standoffs, and sticky pad, the positional alignment is not perfectly secure (at least not on my unit). As I mentioned earlier in the thread, just dropping the GW was enough to dislodge it, and cause the reboot loop in a previously working gateway.
>
> [Image: reboot_loop]

Pushing the board in the connector fixed the reboot loop for me!

smbunn commented 5 years ago

Mine is working but occasionally goes into reboot loops. I now know that I need to let it cool down, wiggle the LoRa card about in the slot and restart; most times this works. Sometimes I have to go through the whole pink-button re-register route.