arduino / mkrwan1300-fw


ABP Join not functional when device is registered as OTAA #42

Open Bobblybook opened 2 years ago

Bobblybook commented 2 years ago

Hi, I've spent the last few days trying to get to the bottom of this. I think I've narrowed it down to a combination of ABP join with OTAA registration.

I am using this updated firmware by @flhofer, as it fixes issues in 1.2.3 that prevent ABP from working: https://github.com/flhofer/mkrwan1300-fw/tree/fix-hup1.2.3 together with this library: https://github.com/flhofer/MKRWAN/tree/fix-hup1.2.3

I am in Australia (AU915) using TTN, sub-band 2. Registering the device with OTAA activation on TTN and then performing an OTAA join works fine. After a few (3-5) failed packets, the network negotiates correct parameters and then I have about a 100% success rate.

Registering the device using ABP activation on TTN and then performing an ABP join works fine. Again, it takes a couple of failed packets, then parameters are corrected and there is almost a 100% success rate on packet sending.

The issue arises when registering the device with OTAA activation on TTN, saving the devAddr, NwkSKey and AppSKey that were obtained in the OTAA join, and then trying to connect via ABP join using these keys. It fails to send packets reliably. The ABP join appears to work on the modem, and some packets do make it through, but on average probably fewer than 30% do. Also, when packets do make it through, the corresponding downlink never works.

I am testing with the example LoraSendAndReceiveTest sketch to eliminate my own code from the problem. I changed it to send unconfirmed uplinks (modem.endPacket(false);)

Again, just to be clear, the downlinks and packet reliability are perfect when using OTAA registration + OTAA join, or ABP registration + ABP join. However, I need to save session data between power cycles and don't want to perform a new OTAA join for every uplink, so I need to be able to use ABP join with the session credentials returned by the initial OTAA join.
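As a rough illustration of the save/restore step described above, here is a minimal sketch of packing the session into a single record and reading it back. The `LoraSession` struct, its field names, and both functions are hypothetical and are not part of the MKRWAN library or the modem firmware. The storage medium (flash, EEPROM) is left out, and the keys are assumed to be handled as hex strings.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical container for the values the thread talks about carrying
// across a power cycle. Keys and address are kept as hex strings.
struct LoraSession {
    std::string devAddr;   // 8 hex characters
    std::string nwkSKey;   // 32 hex characters
    std::string appSKey;   // 32 hex characters
    uint32_t fCntUp = 0;   // uplink frame counter (FCU)
    uint32_t fCntDown = 0; // downlink frame counter (FCD)
    int dataRate = 0;
    int rx1DelaySec = 1;
    int rx2DelaySec = 2;
};

// Pack the session into one space-separated text record.
std::string serializeSession(const LoraSession& s) {
    assert(s.devAddr.size() == 8);
    assert(s.nwkSKey.size() == 32 && s.appSKey.size() == 32);
    std::ostringstream out;
    out << s.devAddr << ' ' << s.nwkSKey << ' ' << s.appSKey << ' '
        << s.fCntUp << ' ' << s.fCntDown << ' ' << s.dataRate << ' '
        << s.rx1DelaySec << ' ' << s.rx2DelaySec;
    return out.str();
}

// Parse a record; returns false (and leaves `s` untouched) if any field
// is missing or a key has the wrong length.
bool deserializeSession(const std::string& record, LoraSession& s) {
    std::istringstream in(record);
    LoraSession tmp;
    if (!(in >> tmp.devAddr >> tmp.nwkSKey >> tmp.appSKey >> tmp.fCntUp
             >> tmp.fCntDown >> tmp.dataRate >> tmp.rx1DelaySec
             >> tmp.rx2DelaySec)) {
        return false;
    }
    if (tmp.devAddr.size() != 8 || tmp.nwkSKey.size() != 32 ||
        tmp.appSKey.size() != 32) {
        return false;
    }
    s = tmp;
    return true;
}
```

Rejecting incomplete records matters here: a half-written record after a brown-out would otherwise restore default counters, which produces exactly the kind of mismatch that is hard to tell apart from a radio problem.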

I can see detailed packet information in my gateway, and all the failed packets are never actually being sent or arriving at the gateway to begin with, so I'm assuming the issue is with the modem commands.

I am wondering if it's a case of some modem AT command discrepancy (either an additional OK command sitting around not dealt with, or instead one missing that should be arriving but isn't). I'm digging around in the firmware trying to figure it out, but I think I'm a little over my head and scared to mess up the firmware.

Does anyone know how to get this working? It's confusing to me that every 3-10 packets sent, one actually makes it through. I would have thought if there was a problem with the AT commands, then none would succeed.

Edit: Interesting that using a fixed dataRate of 6 (SF8, 500kHz bandwidth) appears to make it to the gateway every time. I ran a loop, cycling through DR values 0 to 6: https://i.imgur.com/Gn8DMUH.png

From the top packet down, these are supposed to be DR=0, DR=1, DR=2.. to DR=6, then DR=0 again. Many of the other DR values never made it to the gateway, but DR6 makes it through every time.

Maybe the 500kHz bandwidth is somehow getting around whatever issue I'm experiencing? All of these DR6 uplinks make it from the gateway to TTN as well, though downlinks still do not function.
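For reference when reading the DR sweep above, the following sketch maps AU915 uplink data rates 0 to 6 to spreading factor and bandwidth. It assumes the table from the LoRaWAN Regional Parameters (1.0.2 rev B onward, as I understand them); the type and function names are made up for illustration and do not come from the MKRWAN library.

```cpp
#include <cassert>

// Spreading factor and bandwidth for one data rate.
struct DataRateInfo {
    int spreadingFactor;
    int bandwidthKHz;
};

// AU915 uplink data rates DR0..DR6 (assumed from the regional parameters).
inline DataRateInfo au915UplinkDataRate(int dr) {
    assert(dr >= 0 && dr <= 6);
    static const DataRateInfo table[7] = {
        {12, 125}, // DR0
        {11, 125}, // DR1
        {10, 125}, // DR2
        {9, 125},  // DR3
        {8, 125},  // DR4
        {7, 125},  // DR5
        {8, 500},  // DR6
    };
    return table[dr];
}

// True when the data rate is sent on a 500 kHz channel rather than on one
// of the 125 kHz channels.
inline bool usesWideChannel(int dr) {
    return au915UplinkDataRate(dr).bandwidthKHz == 500;
}
```

The mapping shows that DR6 is the only one of the swept values that goes out on a 500 kHz channel, so it uses a different set of uplink channels than DR0 to DR5. That makes the enabled 125 kHz channels worth checking against what the gateway actually listens on.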

flhofer commented 2 years ago

@Bobblybook Hi! Just for clarification of your problem: When you say save, do you mean you read the key values after OTAA and store them somewhere, reboot, retrieve the stored values and write them again for ABP? Also, the 10 packets are sent in a row, you don't mean 10 reboot tries, right?

As a starting note, a thing that I noticed in my tests is that LoRaWAN is quite strict on frame counters. Even if you use a relaxed configuration on the server, the devices are not so tolerant. E.g. if the counter does not match, the partner does not respond. This is to say, you have to save FCU and FCD (the uplink and downlink frame counters) as well when restarting and rejoining, or there will be the discrepancy you observed.

Can you give me this feedback then we see, ok?

Bobblybook commented 2 years ago

Yes, by saving I mean saving to non-volatile memory, rebooting/power cycling, and loading the values back in. I have tried saving and carrying over every variable I can: FCU, FCD, DR, txPower, Rx1Delay, Rx2Delay, and a number of others. I have verified that the values are being read correctly and set properly after load. Two things I noticed:

I was doing a lot of testing a few weeks ago and it's possibly related to this: #43.

I can have a perfectly working link, and then saving/restoring parameters appears to break the consistency. After a number of failed packets, it stabilises again.. but it always takes a while to get back to normal. It almost feels like I'm not carrying over some additional data from the session and it's having to renegotiate something again (maybe ADR Ack?). But I've copied as much as I can, even modifying the library to take as many variables from the modem commands as are available to me.

Thanks

flhofer commented 2 years ago

Ok, first off: Rx1 and Rx2 during Join refer to Join delays that default to 5 and 6s. So that's normal. ABP does not require a join process and thus avoids this change. Just keep the Rx windows to default. ADR should not influence at all.

I don't know about the AU gateway configuration, but for EU only the basic 125kHz channels support multi-SF transmissions. The high-throughput channel is fixed in DR and cannot receive on any SF other than the configured one. This is to say that you might be operating on a channel where the gateway does not support multi-SF, and if you manually set an unsupported DR, it may slowly ADR to the default data rate. (Verify this with your gateway specifications.)

Another thing is the channel setup. During OTAA the server sends a channel mask and maybe a channel frequency setup to the device. Depending on which channel range you operate in, it seems that some channels do not support DR 0-7 and some do not support DR 8-12. So you might want to store this. Also, the specification says 125kHz channels have to join at DR2, 500kHz channels at DR8. For the EU band that does not apply to ABP, but maybe for AU it does.

Hope this helps..

Bobblybook commented 2 years ago

Thank you.

I'm a little confused when you say that ABP does not require a join process. I know in technical terms it does not, but the MKR1300 requires me to still call a joinABP() before I'm able to send data using existing ABP connection parameters (either from a previous ABP registration, or an OTAA join). And when doing this, it obtains RX1 and RX2 delays of 1 and 2 seconds respectively, which I have to manually correct back to the delays set in TTN (5 & 6s).

I understand about the channel masking and I have set the correct mask for Australia (915 SB 2, channels 7-15). Also the channel mask is not negotiated automatically on the newest version of TTNv3 - this may be an Australia-only issue. Regardless, I am able to set the channel mask manually without issue (see #18).
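To make the sub-band mask concrete, here is a small sketch that builds an AU915 channel mask as six 16-bit words. It assumes the common zero-based numbering, in which sub-band 2 is 125 kHz channels 8-15 plus 500 kHz channel 65. The function names are hypothetical, and the string format expected by the modem's mask command is not covered here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Six 16-bit words cover the 72 AU915 uplink channels:
// word 0 holds channels 0-15, word 1 holds 16-31, and so on.
using ChannelMask = std::array<uint16_t, 6>;

// Build the mask for one sub-band (1..8): eight 125 kHz channels plus the
// matching 500 kHz channel (64 + subBand - 1).
ChannelMask au915SubBandMask(int subBand) {
    assert(subBand >= 1 && subBand <= 8);
    ChannelMask mask{}; // all channels disabled
    const int first = 8 * (subBand - 1);
    for (int ch = first; ch < first + 8; ++ch) {
        mask[ch / 16] = static_cast<uint16_t>(mask[ch / 16] | (1u << (ch % 16)));
    }
    const int wide = 64 + (subBand - 1);
    mask[wide / 16] = static_cast<uint16_t>(mask[wide / 16] | (1u << (wide % 16)));
    return mask;
}

// Check whether a single channel (0..71) is enabled in the mask.
bool channelEnabled(const ChannelMask& mask, int ch) {
    assert(ch >= 0 && ch < 72);
    return ((mask[ch / 16] >> (ch % 16)) & 1u) != 0;
}
```

Under these assumptions, sub-band 2 yields `0xFF00` in word 0 and `0x0002` in word 4, with every other word zero. In that numbering channel 7 belongs to sub-band 1, so a mask that enables it would occasionally transmit on a frequency a sub-band 2 gateway does not listen on.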

However, the device works perfectly well once it's all up and running, with almost 100% success rate on packets. The issue appears to be upon power cycle, it takes a few transmissions before things are stable again, and I'm unsure why.

flhofer commented 2 years ago

@Bobblybook I never paid attention, but I know that the modem sends off some MAC command at ABP join. I've run some tests where a bunch of MKRs ABP-join constantly and a nearby node ends up not being able to send a confirmed message anymore. I say MAC-level, as server logs do not show any activity on the application level.

I would say you need the join to set the keys for encryption. If your mask is correct and they send on the right frequencies and with the correct data rate there are only two things I can think about right now.

  1. Somehow the downlink counter is messed up (FCD is 1 lower or so) and your device ignores every confirmed response from the server. Or the other way around: your device sends an FCU that is one too low and the server ignores your message. That's something that happened to me. While the latter can be avoided with servers that allow frame counter relaxation, the former is strictly enforced. Once the server sends a downlink for MAC-related purposes, the counters end up up to date again.
  2. The DR you use at startup is too high for the area you are in. The device is set to ADR and settles on a good DR after a bit. During OTAA you may not notice this, for different reasons. When power-cycling, the device resets to max DR and you have to wait for it to get back down.

Have you thought of reading out DR and printing it to serial regularly? What about FCU and FCD? Be aware that internally these might mean the next used/expected or the last used/received value, which might mess up the save/retrieve. I had some similar issues where my device did not receive the downlink the server sent. The restored counter ended up being lower than the downlink counter actually in use.
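The "last used" versus "next to use" ambiguity can be shown with a toy model. This is only a sketch of the off-by-one effect under a strict "counter must increase" rule. It is not the LoRaMAC implementation, it ignores MIC checks and counter gap limits, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy receiver that accepts a frame only if its counter is strictly
// greater than the last one it accepted.
struct CounterCheck {
    bool seenAny = false;
    uint32_t lastAccepted = 0;

    bool accept(uint32_t fcnt) {
        if (seenAny && fcnt <= lastAccepted) {
            return false; // replayed or stale counter: frame is dropped
        }
        seenAny = true;
        lastAccepted = fcnt;
        return true;
    }
};

// What the saved number meant at the moment it was read out.
enum class StoredMeaning { LastUsed, NextToUse };

// Counter to put on the first frame after a restore.
uint32_t nextCounterAfterRestore(uint32_t stored, StoredMeaning meaning) {
    return meaning == StoredMeaning::LastUsed ? stored + 1 : stored;
}
```

If the value read out before the power cycle was the last counter used but is restored as the next one to use, the first frame after the restart repeats a counter and is dropped under this rule. In the opposite case, with relaxed checking on the peer, the link recovers by itself after a frame or two.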

Bobblybook commented 2 years ago

Thanks for your thoughts.

I know it's not point 2, as I'm running a local gateway in my house. The device readily negotiates down to the highest DR possible when it's allowed to, so any of the DRs from best to worst should work fine. As a precaution, I am testing with the "worst"/safest DR.

Your point about the counters being off by one and out of sync is interesting. I'll follow up on this and do some more testing to see if this is indeed the situation!