GrumpyOldPizza / ArduinoCore-stm32l0

Arduino Core for STM32L0
OTAA join not working after session save #40

Closed romansoft closed 6 years ago

romansoft commented 6 years ago

Not sure if this is intentional behavior or not, but after I call setSaveSession(true) and join the network the first time, subsequent joinOTAA() calls have no effect, i.e. no MAC commands are sent to the network server and no onJoin event is triggered. It looks like uplinks continue with the previous frame count value. So, in effect, joinOTAA acts like rejoinOTAA, which is unexpected and undesirable.

The reason I'd like joinOTAA to start a fresh join process with the server is to allow recovery from potential state corruption in either a node or a network server without waiting a very long time for the MAC stack to determine that no gateways are connected. Additionally, on system reboot, I perform a join and wait for the onJoin event before proceeding further. If it never happens, it causes some issues.

GrumpyOldPizza commented 6 years ago

It is intended behavior.

rejoinOTAA() is intended as forced rejoin after a connection loss (by ADR detection or confirmed message loss detection).

So a better way would be to junk "setSaveSession()" and simply set "joined()" to true directly after LoRaWAN.begin() if the last session is still active (OTAA or ABP), i.e. before you force a joinOTAA()/rejoinOTAA() ?

The whole purpose of the EEPROM there is to allow an OTAA session to continue across reset boundaries.

One thing you have to consider is that with "devNonce" being only 16 bit, and each "devNonce" only being usable once, you can only do a joinOTAA() 65536 times per appEUI ...
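To put the 65536-join budget in perspective, here is a rough back-of-the-envelope sketch. It assumes only what the comment above states (a 16-bit devNonce, each value usable once); the function name and the join rates are illustrative, not taken from the library.

```cpp
#include <cassert>
#include <cstdint>

// A 16-bit devNonce gives 2^16 distinct values, each usable once.
constexpr std::uint32_t kDevNonceValues = 1u << 16;  // 65536

// Years until the nonce space is used up at a steady join rate.
// joinsPerDay is an illustrative input, not a measured figure.
double yearsUntilNonceExhausted(double joinsPerDay) {
    assert(joinsPerDay > 0.0);
    return kDevNonceValues / joinsPerDay / 365.0;
}
```

At one join per day the budget lasts roughly 179 years; at one join per minute (for example a development board that rejoins on every reset in a tight loop) it is gone in about 45 days.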

romansoft commented 6 years ago

OK, rejoinOTAA works as I expect: it allows the last session to be reestablished. This is very useful in some scenarios, for example after replacing a battery. However, there are situations where the session shared between the node and the NS may no longer be valid. For example: state corruption on either side (like a frame counter reset due to a glitch), a node being moved under the service of another gateway with a different subband (most public network operators require a node to detect link loss and reacquire the new channel plan), or when DevEUI/AppEUI/AppKey were modified to join a different network operator. These devices may be placed untouched in inaccessible locations over a very long period of time while the network around them evolves, so they need to be robust. In those situations, the only logical option is to force the join process with the NS to establish a new session. Otherwise, the node may become stuck in an unconnected state and be unable to recover.

My expectation was that joinOTAA() does just that, i.e. triggers the node to start a new session, while rejoinOTAA() just restores the current session. (If joinOTAA() behaves like rejoinOTAA(), why even have a rejoinOTAA() function?) So, how do I force a new session, i.e. clear the current session state in EEPROM while keeping the initial credentials (AppKey, AppEUI)?

There are also situations when the node needs to be restored to the factory state, say after load testing or servicing. How do I do a "factory reset", i.e. clear the whole EEPROM state, both session and credentials?

GrumpyOldPizza commented 6 years ago

You can clear the EEPROM via setSaveSession(false).

I see your point, but I am stuck how to do this properly in an API (and actually I was worried about the exact problem you are running into).

The key thing is that you want to be able to code a device, which just continues the last OTAA/ABP session post reset. So ideally your code just works like it is. If you detect a disconnect, you retry via rejoinOTAA(). Perhaps the right thing to do is to add a "restore = true" flag to joinOTAA(), so that you have to have setSaveSession(true) to enable session save/restore, and for joinOTAA()/joinABP(), you need to pass in "restore == true" (default) to restore it, or "restore == false" (non-default) to force a new connect, and clearing out of the EEPROM.

> most public network operators require a node to detect link loss and reacquire the new channel plan

That is what ::rejoinOTAA() is for. Rejoin forces a new join-request with the same session setup; it does not restore the last session.

> DevEUI/AppEUI/AppKey were modified

In that case EEPROM is cleared as well.
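The "restore" flag floated earlier in this comment can be modelled in a few lines. This is only a sketch of the proposed semantics, not the library's code: JoinModel and its fields are hypothetical, the join itself is simulated as always succeeding, and the EEPROM is just a boolean.

```cpp
#include <cassert>

// Hypothetical model of the proposed semantics; NOT the real LoRaWAN class.
struct JoinModel {
    bool saveSession = false;   // mirrors setSaveSession(true/false)
    bool savedSession = false;  // a session is stored in (simulated) EEPROM
    bool joined = false;
    int joinRequestsSent = 0;   // each one would consume a devNonce

    void setSaveSession(bool enable) {
        saveSession = enable;
        if (!enable) savedSession = false;  // the thread says this clears the EEPROM copy
    }

    // restore == true (default): continue a saved session if there is one.
    // restore == false: discard it and force a fresh join.
    void joinOTAA(bool restore = true) {
        assert(!savedSession || saveSession);  // invariant: nothing stored unless saving is on
        if (restore && savedSession) {
            joined = true;  // continue the stored session, no join-request sent
            return;
        }
        savedSession = false;
        ++joinRequestsSent;          // simulated join-request that is accepted
        joined = true;
        savedSession = saveSession;  // persist only when saving is enabled
    }

    void reset() { joined = false; }  // simulated MCU reset; EEPROM contents survive
};
```

With setSaveSession(true), a joinOTAA() after reset() sets joined without sending a join-request, while joinOTAA(false) sends a new one. That is the distinction the flag is meant to expose.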

romansoft commented 6 years ago

Thanks for the clarifications. I had a wrong understanding of rejoinOTAA() behavior and purpose. I thought it restored the last session, same as joinOTAA does. Anyway, I understand your dilemma about the API definition. My thinking is that there are only a few well defined scenarios when you want to restore the existing session (eg. power loss), but many more not well defined scenarios when you want to trigger a new join (eg. unforeseen errors, user pushing a reset, etc.). So, I would make new join trigger as default behavior and the session restore something the application has to request specifically. Issue recovery is more critical to me than power optimization. But, it's my two cents.

I tried setSaveSession(false) yesterday, but it did not seem to clear the session settings. Uplinks continued as before, after joinOTAA() timed out. Is there a long latency to when eeprom clearing takes effect after this command or should eeprom clear immediately? I'll try again later.

GrumpyOldPizza commented 6 years ago

I do beg to differ there. Session save/restore is essential, and continuing an established session should be the default. A gateway, if correctly implemented, will refuse a join attempt if "devNonce" has been seen before. So it is utterly crucial to avoid any OTAA join attempts that are not needed. Thus the API needs to support that in the simplest possible way, with the fewest code changes required to take advantage of it.

The EEPROM writes are deferred, but the in-SRAM copy is current all the time ...

romansoft commented 6 years ago

True, network carriers hate frequent rejoins as they reduce gateway capacity. But I don't see rejoin events being frequent on production devices (perhaps a few times a year), and definitely far fewer than 65536 times over the lifetime of a node (unless it's a defective node or botched firmware). For development devices, it's a different matter. I guess the default behavior you propose is OK to protect uninformed developers from stressing the network. As long as you provide the restore flag, the advanced developer can choose the appropriate behavior for their use case.

> The EEPROM writes are deferred, but the in SRAM copy is current all the time ...

This explains what I saw. I was resetting the device 200ms after setSaveSession(false). SRAM would have been cleared, but EEPROM might not have been. When does the EEPROM write take place?

GrumpyOldPizza commented 6 years ago

EEPROM takes 6.4ms for a 32bit write. So to write back the full 256 bytes (or whatever it is) might take a while.
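Working the numbers from this comment through gives a feel for "a while". The sketch below assumes the 6.4 ms per 32-bit word figure quoted above, the unconfirmed 256-byte size, and back-to-back writes with no other overhead; the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>

constexpr double kMsPerWordWrite = 6.4;  // per 32-bit write, as quoted in the thread
constexpr std::size_t kWordBytes = 4;

// Time to write `bytes` of EEPROM, one 32-bit word at a time.
double eepromWriteTimeMs(std::size_t bytes) {
    assert(bytes % kWordBytes == 0);  // whole words only in this sketch
    return static_cast<double>(bytes / kWordBytes) * kMsPerWordWrite;
}
```

For 256 bytes that is 64 words, or about 410 ms, so a reset issued 200 ms after setSaveSession(false) could plausibly land before the write-back has finished, even before accounting for however long the write is deferred.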

GrumpyOldPizza commented 6 years ago

Ok, my head is spinning. You want an override for not restoring the previous session.

That is trivial, just don't use setSaveSession(true) before you do a joinOTAA().

I agree that the API could be modified to use a different more intuitive scheme. But the functionality is there as is.

romansoft commented 6 years ago

@GrumpyOldPizza, not so trivial. Please consider the scenario where the Murata module is used as a modem and the application firmware is on another MCU. For security reasons, it is desirable to keep the LoRa credentials out of the application code, programmed only once into the Murata module. Correct me if I'm wrong, but since there is only one method of saving the state, which saves both initial credentials and session state, it would not be possible to achieve this objective.

I tested setSaveSession(false) again, waiting more than sufficient time (2 sec) for the EEPROM write to complete, but it still does not seem to work. Issuing an uplink command afterwards performs the uplink transmission and returns without any error. I'd expect that if the session credentials are wiped out, communication should stop working.

GrumpyOldPizza commented 6 years ago

The credentials are not part of the session save process. You call ::setAppEui() and it gets stored in EEPROM. Then you can call ::joinOTAA() without arguments and it uses the commissioning from EEPROM. The set in setSaveSession() is more along the line of session keys, and ADR settings and channels.

GrumpyOldPizza commented 6 years ago

Still toying with that. I think I'll change the scheme there, so that a session is always saved, and restored post reset (at ::begin() time). Hence ::joined() will be true, and you simply can continue your session.

If you want to join another session (gateway), you can simply call joinOTAA() / joinABP().

Not all is clear to me, but that would mean that some parameters would apply only to the next join (like channel mask/frequencies), or rejoin. Thus they would not affect the current session. So the parameters need to be split (into a static and a dynamic set) and saved/restored differently.

romansoft commented 6 years ago

Currently, something doesn't feel right with joinOTAA() usage. Here is how I would like to join the network on startup:

  1. Set up channel plan and other global MAC settings
  2. Join the network - joinOTAA(). If the previous session should be restored (eg. power supply interruption), joinOTAA() should just restore the last session. But, if a new NS join should be made, joinOTAA() should force a new session.
  3. Wait for "joined" status (preferably using onJoin() callback event for lower power, but, I guess, polling joined() could be acceptable)
  4. Repeat 2 & 3 until joined or timed out. (I'd rather do retries under application control than using setRetries(n) with n>0 since I'd like to reset the watchdog on each join iteration.)

When using the onJoin() method, the above procedure works fine on the initial join, but waits indefinitely at step 3 if there was a saved session. It would be best to have a simple startup procedure, regardless of the previous saved state and reset cause.

I like your proposal to automatically restore the last session. In cases where session restore is appropriate, my application code would simply check joined() flag and skip the above startup procedure. In case a new session should be forced, the code would follow the startup join procedure instead.
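The startup flow described above (check joined() after begin(), otherwise run an application-controlled join loop that services the watchdog) might look roughly like the sketch below. Everything here is a stand-in: FakeRadio is a hypothetical synchronous mock whose method names merely echo the thread, whereas the real joinOTAA() completes asynchronously and would need a wait on onJoin() or polling of joined() between attempts.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the radio; NOT the real LoRaWAN class.
struct FakeRadio {
    bool sessionRestored;      // what begin() would find under the proposed scheme
    int failuresBeforeAccept;  // simulated join attempts that time out first
    bool isJoined = false;
    int attempts = 0;

    FakeRadio(bool restored, int failures)
        : sessionRestored(restored), failuresBeforeAccept(failures) {}

    void begin() { isJoined = sessionRestored; }
    bool joined() const { return isJoined; }
    void joinOTAA() { ++attempts; isJoined = attempts > failuresBeforeAccept; }
};

// Returns true once the node has a usable session.
template <typename Radio>
bool startup(Radio& radio, int maxAttempts,
             const std::function<void()>& kickWatchdog) {
    assert(maxAttempts > 0);
    radio.begin();                    // step 1 (channel plan setup omitted here)
    if (radio.joined()) return true;  // session restored: skip the join entirely
    for (int i = 0; i < maxAttempts; ++i) {
        kickWatchdog();               // retries stay under application control
        radio.joinOTAA();             // step 2
        if (radio.joined()) return true;  // step 3, synchronous in this mock
    }
    return false;                     // step 4: gave up after maxAttempts
}
```

The point of the sketch is that the same startup() call behaves sensibly whether or not a session was restored, which is the "simple startup procedure regardless of previous state" asked for above.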

I think you would need to modify begin() method to restore the last session and to set joined flag to true. In joinOTAA() method, you would add the session save into EEPROM upon successful join.

Not sure about the need to separate static and dynamic session parameters. In the above scenario, the static parameters would be restored in step 1 by the application. But, perhaps there are other joining procedures where this would be useful.