TheThingsNetwork / lorawan-stack

The Things Stack, an Open Source LoRaWAN Network Server
https://www.thethingsindustries.com/stack/
Apache License 2.0

ABP frame counter reset behaviour #4502

Closed jpmeijers closed 2 years ago

jpmeijers commented 3 years ago

Summary

It's a little confusing how ABP frame counter resets behave when "Frame Counter Resets" is enabled. It seems a device can go from fcnt 0 to 1 and back to 0, but it cannot go from fcnt 0 directly back to fcnt 0, because the second packet is then treated as a duplicate.

It would be nice to allow duplicate fcnt values when fcnt==0 and "Frame Counter Resets" is enabled.

Why do we need this?

This would make quick testing with ABP devices easier. Currently I need to wait for two transmissions before I can reload firmware. With this change I could immediately reset or reload firmware and do another test transmission.

What is already there? What do you see now?

What is missing? What do you want to see?

Environment

The Things Stack 3.14.0

How do you propose to implement this?

Not sure, but definitely related to https://github.com/TheThingsNetwork/lorawan-stack/issues/2434 and https://github.com/TheThingsNetwork/lorawan-stack/issues/2446

How do you propose to test this?

Register an ABP device, power it up, wait for first transmission (fcnt=0), reboot it, wait for another transmission (fcnt=0). Is the second transmission with fcnt=0 dropped or delivered?

Can you do this yourself and submit a Pull Request?

No

johanstokking commented 3 years ago

Hmm, I understand this can be useful for quick testing. It makes message handling less LoRaWAN compliant than it already is though, because duplicate detection and retransmissions would be skipped for FCnt=0. We'll discuss internally and triage here. Thanks for reporting.

johanstokking commented 3 years ago

@jpmeijers we just discussed this. This is a slippery slope: the next request may be that FCnt 1 gets the same behavior because some firmware starts at 1.

Also, we already have behavior in place for handling frames with the same FCnt (deduplication, cooldown window, gathering additional RX metadata for downlink paths, handling retransmissions, etc). Our current position is that we don't want to complicate this further with non-exhaustive edge cases that exist only for testing.

So we park this until there are more upvotes.

jpmeijers commented 3 years ago

Also we already have behavior in place for handling frames with the same FCnt (deduplication, cooldown window, gathering additional RX metadata for downlink paths, handling retransmissions, etc).

Interesting. I was about to ask about that in a different issue. Use case: balloon flights. Some frames are delayed and seen as duplicates, but still contain very valuable metadata.

johanstokking commented 3 years ago

Also we already have behavior in place for handling frames with the same FCnt (deduplication, cooldown window, gathering additional RX metadata for downlink paths, handling retransmissions, etc).

Interesting. I was about to ask about that in a different issue. Use case: balloon flights. Some frames are delayed and seen as duplicates, but still contain very valuable metadata.

The way it is implemented today is that the Network Server uses a deduplication window (default 200 ms), in which it waits for duplicates to arrive from various sources (different gateways and peering). After the deduplication window closes, there's a cooldown window of 1 second, in which all duplicate frames are discarded. During that time, the NS can gracefully process the deduplicated uplink message. When the cooldown window closes, additional duplicate uplink frames are considered retransmissions. So yeah, The Things Stack by default isn't very balloon friendly. But LoRaWAN isn't really designed for this kind of use case...
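To make the three phases described above concrete, here is a minimal Python sketch. It is not the Network Server's code: the names `classify_duplicate` and `Outcome` are hypothetical, the 200 ms and 1 second values come from the comment above, and the exact boundary handling (inclusive limits, cooldown starting when the deduplication window closes) is an assumption.

```python
from enum import Enum

DEDUP_WINDOW_S = 0.2     # default deduplication window mentioned above
COOLDOWN_WINDOW_S = 1.0  # cooldown window mentioned above


class Outcome(Enum):
    MERGED = "merged into the deduplicated uplink"
    DISCARDED = "discarded during the cooldown window"
    RETRANSMISSION = "considered a retransmission"


def classify_duplicate(seconds_since_first_copy: float) -> Outcome:
    """Classify a later copy of a frame by its delay after the first copy.

    Assumption: the cooldown window starts when the deduplication window
    closes, and both limits are treated as inclusive.
    """
    if seconds_since_first_copy < 0:
        raise ValueError("delay must be non-negative")
    if seconds_since_first_copy <= DEDUP_WINDOW_S:
        return Outcome.MERGED
    if seconds_since_first_copy <= DEDUP_WINDOW_S + COOLDOWN_WINDOW_S:
        return Outcome.DISCARDED
    return Outcome.RETRANSMISSION
```

Under these assumptions, a copy relayed 0.5 seconds late (as could happen on a slow backhaul during a balloon flight) falls into the cooldown window and is discarded, metadata included.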

Elfe commented 2 years ago

So an ABP device always needs to transmit 2 messages with increasing frame counters before it can reset the frame counter?

On the v2 network I have/had ABP devices that have no state and send a message when a certain condition is met. Now I would need to send multiple messages or switch to OTA activation (which results in even more messages)?

The devices transmit a timestamp (from GPS) which is checked by the application layer so I do not need any frame counter checks or device state. The devices are small Arduinos that get powered when the car is running.

johanstokking commented 2 years ago

So an ABP device always needs to transmit 2 messages with increasing frame counters before it can reset the frame counter? On the v2 network I have/had ABP devices that have no state [...]

Yes. That ABP device is not LoRaWAN compliant, so that's all we can do.

How else should we differentiate retransmissions from frame counter resets? An arbitrarily long time in between them? That will make it feel random for most users.

The devices transmit a timestamp (from GPS) which is checked by the application layer so I do not need any frame counter checks or device state.

You may not need it but LoRaWAN does.

In this case, can you derive the FCnt from the timestamp, so that it is at least incrementing? I'm quite certain you don't need ADR as it seems like you're using this for mobile devices, so FCnt gaps aren't a big issue.
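One way the suggestion above could look, sketched in Python for readability (the device itself would do the same arithmetic in its firmware). The function name `fcnt_from_timestamp`, the fixed reference epoch, the period and the 32-bit mask are all assumptions for illustration, not something prescribed in this thread.

```python
def fcnt_from_timestamp(unix_seconds: int, epoch_seconds: int, period_seconds: int) -> int:
    """Derive a non-decreasing frame counter from a GPS-based Unix timestamp.

    epoch_seconds is a hypothetical fixed reference time baked into the
    firmware; period_seconds is the time span covered by one counter value.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if unix_seconds < epoch_seconds:
        raise ValueError("timestamp lies before the reference epoch")
    # Assumption: the counter is kept as a 32-bit value.
    return ((unix_seconds - epoch_seconds) // period_seconds) & 0xFFFFFFFF
```

Two uplinks sent within the same period would get the same FCnt, so the period has to be shorter than the minimum interval between uplinks. The counter then jumps ahead by however many periods have passed, which produces the FCnt gaps mentioned above.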

matthijskooijman commented 2 years ago

I also ran into this yesterday, being confused for a while why no packets came in after a reset, until I realized what was going on (and then someone pointed me to this issue).

I can see that implementing the proposed change (processing retransmissions for FCnt=0) is not ideal indeed.

However, to reduce user surprise, I wonder if it would be possible for dropped retransmissions to be displayed in the console? If the console (live data for the device) would have told me it received and dropped a message, I would have understood what was happening directly and saved some time.

This would not solve the original issue of quick testing, nor I guess the "balloon" use case (though I do not understand that case), but it would at least make it more obvious to users what is happening.

johanstokking commented 2 years ago

However, to reduce user surprise, I wonder if it would be possible for dropped retransmissions to be displayed in the console? If the console (live data for the device) would have told me it received and dropped a message, I would have understood what was happening directly and saved some time.

I think we should be able to publish events for dropped retransmissions (and frames received after the deduplication window closes), provided that we can do the device matching first.

That would also be helpful with understanding which gateways or peering routes were too slow to be considered in deduplication.

@adriansmares what do you think?

adriansmares commented 2 years ago

I think we should be able to publish events for dropped retransmissions (and frames received after the deduplication window closes), provided that we can do the device matching first.

For performance reasons, frames received during the cooldown period are silently dropped before they are even matched. In general, caching the matching result in a consistent way is tricky: we want to avoid writes as much as possible, and 'waiting' for a matching result is prone to race conditions.

Retransmissions arriving outside of the cooldown period will be matched, and there will be an ns.up.data.receive event. But there is a catch here: there is a limit on how many times the same retransmission will be matched. The same frame will not match if it is retransmitted more than NbTrans times (with an exception for confirmed uplinks before LoRaWAN 1.0.4, for which the limit is always a static '5' due to lack of spec clarity).

I believe this may be what other users that reuse the frame counters will see - some ns.up.data.receive but no as.up.data.receive (since these are viewed as retransmissions), and then radio silence, as the new 'retransmissions' are not supposed to be there as they exceed NbTrans.
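A toy Python model of the behaviour described above, for frames that arrive outside the cooldown window. It is only one reading of this comment, not the Network Server's implementation: the class `LastFrameTracker` is hypothetical, it tracks only the most recent FCnt, and it assumes the first transmission counts towards NbTrans.

```python
class LastFrameTracker:
    """Tracks how often the most recent FCnt has been seen."""

    def __init__(self, nb_trans: int) -> None:
        if nb_trans < 1:
            raise ValueError("nb_trans must be at least 1")
        self.nb_trans = nb_trans
        self.last_fcnt = None
        self.transmissions = 0

    def observe(self, fcnt: int) -> str:
        """Return 'new', 'retransmission' or 'rejected' for an uplink."""
        if fcnt != self.last_fcnt:
            # A different FCnt starts a fresh frame. With frame counter
            # resets enabled, this is why 0, 1, 0 works.
            self.last_fcnt = fcnt
            self.transmissions = 1
            return "new"
        self.transmissions += 1
        if self.transmissions <= self.nb_trans:
            return "retransmission"  # matched, but not a new uplink
        return "rejected"            # exceeds NbTrans, no longer matched


tracker = LastFrameTracker(nb_trans=3)
results = [tracker.observe(f) for f in (0, 0, 0, 0, 1, 0)]
```

In this model, "new" would correspond to both ns.up.data.receive and as.up.data.receive, "retransmission" to ns.up.data.receive only, and "rejected" to the radio silence described above.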

In general, for security reasons, reusing frame counters outside of a small number of retransmissions is dangerous: attackers can replay the frames in order to force the Network Server to send a downlink and thus affect the duty cycle of the downlink path.

johanstokking commented 2 years ago

I think we all understand the downsides of this. Indeed it would be about an event of rejected retransmissions, so after matching. I think it's very much acceptable not to see messages discarded during the cooldown window.

descartes commented 2 years ago

Of the many complications of storing state, for ABP it comes down to pretty much just holding the f_cnt. Almost all devices have some EEPROM that is not cleared by a firmware update, in which the f_cnt could be kept, restored and updated. That's not ideal for the EEPROM if it's actually flash, but then this is for testing ...
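The restore, use, update cycle suggested above can be sketched as a small simulation. This is Python rather than firmware code, and `FakeEeprom` and `Device` are hypothetical stand-ins; a real device would use its platform's EEPROM or flash API instead.

```python
class FakeEeprom:
    """Stand-in for storage that survives reboots and firmware updates."""

    def __init__(self) -> None:
        self._cells = {}

    def read(self, key: str, default: int) -> int:
        return self._cells.get(key, default)

    def write(self, key: str, value: int) -> None:
        self._cells[key] = value


class Device:
    def __init__(self, eeprom: FakeEeprom) -> None:
        self.eeprom = eeprom
        # Restore the next frame counter on boot; start at 0 if none stored.
        self.fcnt = eeprom.read("fcnt", 0)

    def send_uplink(self) -> int:
        """Return the FCnt used for this uplink and persist the next one."""
        used = self.fcnt
        self.fcnt += 1
        self.eeprom.write("fcnt", self.fcnt)  # one write per uplink
        return used


eeprom = FakeEeprom()
first_boot = Device(eeprom)
sent = [first_boot.send_uplink(), first_boot.send_uplink()]
after_reboot = Device(eeprom)  # simulates a reset or firmware reload
sent.append(after_reboot.send_uplink())
```

Writing on every uplink is the simplest approach but also the one that wears flash-backed storage fastest, which is the trade-off mentioned above.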

@matthijskooijman, High Altitude Balloons can be heard by gateways many hundreds of kilometres away, which could potentially cause issues with latency. But you could just as easily use OTAA. Or, preferably (speaking as a HAB for schools person), not use LoRaWAN at all; the HAB community has plenty of people with receivers.