Open adriansmares opened 4 years ago
At this point, the NS is allowed to already schedule a new downlink, before the AS handled the
nack
:
I think that the problem arises here. NS should not schedule a new downlink if a previous confirmed downlink is not acked. It should retry. The downlink frame is lost, so it can retry with the same FCntDown.
For whatever reason, an ack is not received and as such the NS emits an nack to the AS:
That does not happen - NS does not send a NACK if there's no uplink from device. For class B and C, however, NS goes to next downlink if class-based timeout is exceeded https://github.com/rvolosatovs/lorawan-stack/blob/26dd9be2cd78f9b1cee4fcc4f85102d7b7f657fc/pkg/networkserver/mac/utils.go#L65-L91
I think that the problem arises here. NS should not schedule a new downlink if a previous confirmed downlink is not acked. It should retry. The downlink frame is lost, so it can retry with the same FCntDown.
@johanstokking that is not what current behavior is and I think we have discussed this already offline and came to the conclusion that NS just tries next one and we would add an option to application downlinks, which would make NS wait for the particular confirmed downlink to be acked(or nacked) before proceeding to next application downlink in the queue
@johanstokking that is not what current behavior is and I think we have discussed this already offline and came to the conclusion that NS just tries next one and we would add an option to application downlinks, which would make NS wait for the particular confirmed downlink to be acked(or nacked) before proceeding to next application downlink in the queue
Hmm, what is the benefit then of confirmed downlink, if there's no automatic retry mechanism?
The fact that the application is notified whether it's been delivered or not (via ack/nack)
..Hmm, what is the benefit then of confirmed downlink, if there's no automatic retry mechanism?
I've been seeing this issue I use confirmed downlinks to control a Vicki TRV radiator valves. When I look at the work needed at the app layer to ensure DL is received, I mined that it's too much effort. I'm a low-code scripter not a Dev and as more Actuators appear in the eco system the need for confirmed downlinks will grow. Just for ref even the LNS running inside a Multitech AEP GW re-queues a failed confirmed DL.
I also think that the default behavior should be to retry. It's what we did in V2 and it's what other LNSes are doing, although it's not strictly specified in the LoRaWAN spec.
Besides moving the retry logic to the application-layer, we may also get devices in an invalid state if some downlink messages make it and some not, or, if the application retried, in a different order than originally scheduled.
I also think the NS should retry confirmed downlinks and the end device ending up in an invalid state was exactly my point when we discussed this earlier @johanstokking. Glad that we're finally on the same page.
Summary
The current behavior of the AS may lead to lost queues once a nack arrives, due to the way the downlink queue is recalculated.
Steps to Reproduce
Consider the following situation: 3 confirmed downlinks are sitting in the NS downlink queue, and none of them are scheduled yet:
The NS attempts to schedule the downlink with
FCnt
n
:For whatever reason, an
ack
is not received and as such the NS emits annack
to the AS:At this point, the NS is allowed to already schedule a new downlink, before the AS handled the
nack
:The AS will now handle the
nack
and attempt to recalculate the queue:But since the green downlink has already been scheduled, the recalculate queue will be rejected since the
FCnt
of the first element is too low. As such, we lose the blue and yellow downlinks.What do you see now?
The recalculated queue has an
FCnt
that is too low.What do you want to see instead?
The recalculated queue should have a higher
FCnt
than the currently scheduled element, but how should we go about this ?Environment
v3.10
How do you propose to implement this?
Not sure at the moment.
How do you propose to test this?
Add some delays in the nack validation and check that even after the second downlink was scheduled, the recalculation is successful.
Can you do this yourself and submit a Pull Request?
Yes, but I think we should clear up the behavior first. cc @johanstokking