ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com

LoRaWAN: Join procedure hangs #10590

Closed teijokinnunen closed 3 years ago

teijokinnunen commented 5 years ago

Description

Target board: DISCO_L072CZ_LRWAN1
MBED version: 5.12.3 (SHA 0f959db)
Toolchain: GCC ARM (gcc version 7.3.1); using Eclipse IDE, built using "develop" config

We have seen this issue only once so far, so it's rather rare.

OTAA join procedure is triggered with connect():

17:54:45: MBE: [DBG ][LSTK]: Initializing MAC layer
17:54:45: MBE: [DBG ][LSTK]: Initiating OTAA
17:54:45: MBE: [DBG ][LSTK]: Sending Join Request ...
17:54:45: MBE: [DBG ][LMAC]: Frame prepared to send at port 0
17:54:45: MBE: [DBG ][LMAC]: TX: Channel=2, TX DR=5, RX1 DR=5
17:54:45: MBE: [DBG ][LSTK]: Transmission completed
17:54:50: MBE: [DBG ][LMAC]: RX1 slot open, Freq = 868500000
17:54:51: MBE: [DBG ][LMAC]: RX2 slot open, Freq = 869525000
17:54:52: MBE: [DBG ][LMAC]: Frame prepared to send at port 0
17:54:52: MBE: [DBG ][LMAC]: TX: Channel=1, TX DR=4, RX1 DR=4
17:54:52: MBE: [DBG ][LSTK]: Transmission completed
17:54:57: MBE: [DBG ][LMAC]: RX1 slot open, Freq = 868300000
17:54:58: MBE: [DBG ][LMAC]: RX2 slot open, Freq = 869525000
17:54:58: MBE: [DBG ][LMAC]: Frame prepared to send at port 0
17:54:58: MBE: [DBG ][LMAC]: DC enforced: Transmitting in 5627 ms
17:55:04: MBE: [DBG ][LMAC]: TX: Channel=1, TX DR=3, RX1 DR=3
17:55:04: MBE: [DBG ][LSTK]: Transmission completed
17:55:09: MBE: [DBG ][LMAC]: RX1 slot open, Freq = 868300000
17:55:10: MBE: [DBG ][LMAC]: RX2 slot open, Freq = 869525000
17:55:11: MBE: [DBG ][LMAC]: Frame prepared to send at port 0
17:55:11: MBE: [DBG ][LMAC]: DC enforced: Transmitting in 14618 ms
17:55:25: MBE: [DBG ][LMAC]: TX: Channel=0, TX DR=2, RX1 DR=2
17:55:26: MBE: [DBG ][LSTK]: Transmission completed
17:55:31: MBE: [DBG ][LMAC]: RX1 slot open, Freq = 868100000
17:55:32: MBE: [DBG ][LMAC]: RX2 slot open, Freq = 869525000

The problem is that the join procedure stops here: the device stayed powered on for another 30 minutes and nothing further happened. The LoRaWAN stack stops trying at DR 2. We would expect either the join to continue and succeed, or the application to receive a JOIN_FAILURE indication; neither happened.
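For anyone comparing their own traces against this one, here is a small host-side sketch that summarises a log of this shape. It assumes the line format shown above; `tx_data_rates` and `ends_after_rx2` are hypothetical helper names, not part of Mbed OS. It extracts the TX data rate of each join transmission and checks whether the log ends on an RX2 slot with no further frame prepared, which is the pattern reported here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the TX data rate of every transmission found
// in a trace log, in order. Assumes lines contain "TX DR=<number>" as in the
// excerpt above; std::stoi throws if the key is not followed by digits.
std::vector<int> tx_data_rates(const std::string &log)
{
    const std::string key = "TX DR=";
    std::vector<int> rates;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type pos = line.find(key);
        if (pos == std::string::npos) {
            continue;  // not a TX line
        }
        rates.push_back(std::stoi(line.substr(pos + key.size())));
    }
    return rates;
}

// Hypothetical helper: true if the last non-empty line of the log is an
// "RX2 slot open" line, i.e. no retry was prepared after the final window.
bool ends_after_rx2(const std::string &log)
{
    std::istringstream in(log);
    std::string line;
    std::string last;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            last = line;
        }
    }
    return last.find("RX2 slot open") != std::string::npos;
}
```

Applied to the trace above, this would report four transmissions stepping down from DR 5 to DR 2 and a log that ends on an RX2 slot.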

Issue request type

[ ] Question
[ ] Enhancement
[ X ] Bug

0xc0170 commented 5 years ago

cc @ARMmbed/mbed-os-wan

mattbrown015 commented 4 years ago

Hi @teijokinnunen,

Did you ever make any progress with this?

Have you seen it again?

Given that it's been 8 months since the last reply, it appears this hasn't come to the attention of anyone in the ARM mbed team.

I've seen the join sequence hang on multiple occasions and believe this to be a serious and real problem.

I opened two forum topics but haven't had any replies: "[LoRaWAN] Join Sequence Breaks if Non-Join Accept Message Received" and "[LoRaWAN] Join Sequence Stalls Occasionally".

Regards, Matt

teijokinnunen commented 4 years ago

Hi @mattbrown015,

As I didn't get any response, there was unfortunately no progress on fixing the root cause. The problem is so rare that it's difficult to investigate. Our "solution" in the application was to use a timer to stop and restart the join procedure if it takes too long (and is potentially hung).
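A minimal sketch of that kind of join watchdog, written as plain C++ so it can be exercised off-target. `JoinWatchdog`, its methods, and the restart callback are hypothetical names for illustration; this is not the code used in the project above, and the timeout value is left to the application.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper, not part of Mbed OS: fires a restart callback when a
// join attempt has been pending for longer than `timeout`.
class JoinWatchdog {
public:
    using Millis = std::chrono::milliseconds;

    JoinWatchdog(Millis timeout, std::function<void()> restart_join)
        : timeout_(timeout), restart_join_(std::move(restart_join)) {}

    // Call when the application starts a join attempt.
    void start(Millis now)
    {
        armed_ = true;
        deadline_ = now + timeout_;
    }

    // Call when the join finishes, whether it succeeded or failed.
    void stop() { armed_ = false; }

    // Call periodically with the current time. Returns true if the join was
    // considered hung and the restart callback was invoked.
    bool poll(Millis now)
    {
        if (!armed_ || now < deadline_) {
            return false;
        }
        ++restarts_;
        deadline_ = now + timeout_;  // re-arm for the restarted attempt
        restart_join_();
        return true;
    }

    unsigned restarts() const { return restarts_; }

private:
    Millis timeout_;
    std::function<void()> restart_join_;
    bool armed_ = false;
    Millis deadline_{0};
    unsigned restarts_ = 0;
};
```

In an Mbed OS application, the restart callback would presumably call `disconnect()` and then `connect()` on the `LoRaWANInterface`, `stop()` would be called from the event handler on `CONNECTED` or `JOIN_FAILURE`, and `poll()` would be driven from a periodic event; that wiring is an assumption and is not shown here. The timeout should be chosen well above the longest duty-cycle back-off the application expects, otherwise the watchdog would cut off joins that are merely waiting.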

BR,

mattbrown015 commented 4 years ago

Hi @teijokinnunen,

Are you willing to share any details of your environment, for example how many end-devices you've got, for the purposes of discussion and perhaps getting someone's attention?

We had 200 end-devices running and saw 26 of them fail to join. That's enough failures to be a problem but not enough to be easily investigated. :-(

I've only ever seen the join sequence hang after it has been running for some time. For example, when the gateway has been off for hours.

I've been thinking about adding a join sequence watchdog but haven't given it serious thought yet. I think I will now.

Have you got your mbed version up-to-date or are you still using 5.12.3? One of my problems is that I'm using an mbed version from Jan 2019.

Thanks, Matt

teijokinnunen commented 4 years ago

Hi @mattbrown015,

At the time the problem was seen, we were in a very early R&D phase, so we didn't have many devices. I haven't been on the project for a while, but I haven't heard about any join hanging issues; apparently the join watchdog timer workaround has been working well in practice.

We're using the 5.13 branch of MBED, but it doesn't really make a difference. If you look at the LoRaWAN MBED integration commit history, it's clear that it has been practically abandoned for over a year now. I wouldn't use it any more in a new project, which is a pity really, as it was very easy to start using it with MBED. Switching to a maintained LoRaWAN stack (LoRaMac-node...) would require considerable integration effort.

BR,

mattbrown015 commented 4 years ago

If you look at the LoRaWAN MBED integration commit history, it's clear that it has been practically abandoned for over a year now. I wouldn't use it any more in a new project, which is a pity really, as it was very easy to start using it with MBED.

Yes, I agree. Shame! :-(

ciarmcom commented 4 years ago

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers. Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2323

ciarmcom commented 3 years ago

We closed this issue because it has been inactive for quite some time and we believe it to be low priority. If you think that the priority should be higher, then please reopen with your justification for increasing the priority.