adriansmares closed this issue 3 years ago.
I am in favor of this, as we have already discussed. The Application Server should always trust the Network Server, because the NS always has the most up-to-date data about the device session.
> - End Device sends FPort=0 uplink - AS won't receive this uplink

Shouldn't we change this so that the NS does send this uplink, but with an empty payload and FPort 0, so that it won't be sent upstream?
It's redundant. Since NS->AS messaging is asynchronous, the queue invalidation can arrive before the FPort==0 uplink is sent to the AS to confirm the session, so we have to handle this anyway. The only reason to send an uplink to the AS in response to an FPort==0 uplink to the NS would be to ensure that the AS is notified of the session change as soon as possible, but we have no such need. Even then, it would make far more sense to introduce a `SessionSwitch` message, which the NS would send to the AS instead of the FPort 0 uplink.
I have upgraded the priority to `prio/high` as it is affecting `v3.11.1` deployments.
The changes should be the following:
- Add `SessionKeyID` to `DownlinkQueueInvalidation`. If the queue is empty, the AS cannot know at this moment which queue was invalidated. The following proto addition (field `3`) should suffice:

  ```proto
  message ApplicationInvalidatedDownlinks {
    repeated ApplicationDownlink downlinks = 1;
    uint32 last_f_cnt_down = 2;
    bytes session_key_id = 3 [(gogoproto.customname) = "SessionKeyID", (validate.rules).bytes.max_len = 2048];
  }
  ```
- `dev.Session`. Instead of always using `dev.Session`, switch on `SessionKeyID` in order to establish which session to use. If required, update the current `dev.Session`.
- Determine which session the `nacked` message is from: it is possible that the `nacked` message is from a pending session (from the AS perspective), and as such the FCnt update should be done on the correct session. As before, switch on the `SessionKeyID` in order to determine which session to use. If required, update the current `dev.Session`.
- (Optional) return error details on `DownlinkQueue{Push|Replace}` with the minimum `FCnt` and always update the `LastAFCntDown` to this value. This would ensure that the system converges if at any point the AS and NS are, for some reason, out of sync. The following proto message should be filled in and added as error details to `errFCntTooLow` in the NS:

  ```proto
  message UpdateDownlinkQueueErrorDetails {
    bytes session_key_id = 1 [(gogoproto.customname) = "SessionKeyID", (validate.rules).bytes.max_len = 2048];
    uint32 last_f_cnt_down = 2;
  }
  ```

  The AS can then take these details and update the current session.
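The session switch described in the changes above can be sketched as follows. This is a minimal Python model, not The Things Stack's Go code: `Session`, `Device`, `resolve_session`, `handle_invalidated`, and `handle_nack` are hypothetical names, and the sketch assumes that both the invalidation (via field `3`) and the nacked downlink carry the session key ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical, simplified AS-side session."""
    key_id: bytes
    last_a_f_cnt_down: int = 0


@dataclass
class Device:
    """Hypothetical, simplified AS view of an end device (not the real EndDevice type)."""
    session: Optional[Session] = None
    pending_session: Optional[Session] = None

    def resolve_session(self, key_id: bytes) -> Optional[Session]:
        """Switch on the session key ID to find the session an NS message refers to.

        If the message refers to the pending session, the NS has already switched,
        so the pending session is promoted to the current one.
        Returns None for an unknown key ID.
        """
        if self.session is not None and self.session.key_id == key_id:
            return self.session
        if self.pending_session is not None and self.pending_session.key_id == key_id:
            self.session, self.pending_session = self.pending_session, None
            return self.session
        return None


def handle_invalidated(dev: Device, session_key_id: bytes, last_f_cnt_down: int) -> bool:
    """Apply an invalidation to the session it names, trusting the NS FCnt."""
    session = dev.resolve_session(session_key_id)
    if session is None:
        return False  # Unknown session: ignore it (or resynchronize with the NS).
    session.last_a_f_cnt_down = last_f_cnt_down
    return True


def handle_nack(dev: Device, session_key_id: bytes, f_cnt: int) -> bool:
    """Record the FCnt of a nacked downlink on the session it was sent in."""
    session = dev.resolve_session(session_key_id)
    if session is None:
        return False
    # Never move the counter backwards on a late nack.
    session.last_a_f_cnt_down = max(session.last_a_f_cnt_down, f_cnt)
    return True
```

In this toy model, applying the invalidation unconditionally to `dev.Session` would have left the newly active session at `last_a_f_cnt_down == 0`, so the AS would keep producing FCnts that the NS already considers used.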
The changes are backwards compatible and hopefully minimal on the NS side. The genie is already out of the bottle: the whole protocol between the AS and NS has slowly become asynchronous, and simply reverting the `FPort=0` change won't be enough. I don't think this transformation was wrong at the end of the day, but we must fix these quirks regarding session bisimulation.
cc @johanstokking, @rvolosatovs
Sounds good to me!
Anything I can do here? If so, please re-assign me and let me know what.
I'm already working on this, with a huge emphasis on the following point:
* (Optional) return error details on `DownlinkQueue{Push|Replace}` with the minimum `FCnt` and always update the `LastAFCntDown` to this value. This would ensure that the system converges if at any point we're for some reason out of sync between AS and NS.
The reason for this is that taking actions based on the events received from the NS, as part of the queue, is fundamentally not enough:
Given these characteristics, there are two options:
- Just trust the NS. What I mean by this is that push/replace operations return the `dev.Session.SessionKeyID`, the `dev.PendingSession.SessionKeyID`, and the `LastAFCntDown` as part of the error details. We then use the error details to rebuild the session in the AS. Fundamentally, this means that when we try to perform a downlink queue operation using outdated data (perhaps an outdated session, perhaps an FCnt that is too low), we eventually converge to the NS state. It may take one, two, or three tries (I'll bound the number of tries to avoid spinning forever), but we're at least operating with information that's significantly newer than the information from the uplink message queue.
I think this is the best option.
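The "just trust the NS" option can be sketched as a bounded retry loop. Everything here is hypothetical and simplified: `ToyNS` stands in for the NS and only remembers the highest FCnt it accepted, and `DownlinkQueueRejected` stands in for an error such as `errFCntTooLow` carrying the proposed details (current and pending session key IDs plus the last downlink FCnt).

```python
from dataclasses import dataclass
from typing import List, Optional


class DownlinkQueueRejected(Exception):
    """Hypothetical NS error carrying the proposed error details."""

    def __init__(self, session_key_id: bytes, pending_session_key_id: Optional[bytes], last_f_cnt_down: int):
        super().__init__("downlink queue operation rejected")
        self.session_key_id = session_key_id
        self.pending_session_key_id = pending_session_key_id
        self.last_f_cnt_down = last_f_cnt_down


@dataclass
class ToyNS:
    """Toy NS owning the authoritative session state."""
    session_key_id: bytes
    last_f_cnt_down: int
    pending_session_key_id: Optional[bytes] = None

    def push(self, session_key_id: bytes, f_cnts: List[int]) -> None:
        # Reject items for another session or with an FCnt that is too low.
        if session_key_id != self.session_key_id or min(f_cnts) <= self.last_f_cnt_down:
            raise DownlinkQueueRejected(self.session_key_id, self.pending_session_key_id, self.last_f_cnt_down)
        # Simplification: remember the highest accepted FCnt.
        self.last_f_cnt_down = max(f_cnts)


@dataclass
class ASSession:
    """Hypothetical AS-side copy of the session state."""
    session_key_id: bytes
    last_a_f_cnt_down: int
    pending_session_key_id: Optional[bytes] = None


def push_with_resync(ns: ToyNS, state: ASSession, n: int, max_attempts: int = 3) -> List[int]:
    """Push n downlinks, rebuilding the AS session from error details on rejection."""
    if n < 1:
        raise ValueError("n must be positive")
    for _ in range(max_attempts):
        f_cnts = [state.last_a_f_cnt_down + 1 + i for i in range(n)]
        try:
            ns.push(state.session_key_id, f_cnts)
        except DownlinkQueueRejected as err:
            # Trust the NS: adopt its session and FCnt, then retry.
            state.session_key_id = err.session_key_id
            state.pending_session_key_id = err.pending_session_key_id
            state.last_a_f_cnt_down = err.last_f_cnt_down
            continue
        state.last_a_f_cnt_down = f_cnts[-1]
        return f_cnts
    # Bounded, so a persistently moving NS state cannot make the AS spin forever.
    raise RuntimeError(f"AS still out of sync with NS after {max_attempts} attempts")
```

Each retry uses the NS view carried by the rejection, which is newer than anything the uplink queue delivered; `max_attempts` keeps the loop finite if that view keeps changing.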
Summary
The Application Server should confirm the end device session (i.e. move `dev.PendingSession` into `dev.Session`) on the following events:

Why do we need this?
Currently, it is possible that the NS 'switches' the session without the AS knowing that this switch occurred. @rvolosatovs reproduced this in `v3.11` using the following sequence:

- `dev.PendingSession`
- `DownlinkQueueInvalidated` event to the AS
- `dev.Session`

At this point, the AS will never be able to schedule downlinks in this session again unless the NS sends another invalidation at some point in the future, because the AS effectively rejected the `FCnt` increase that happened when the NS sent an FPort=0 downlink (and now the `FCnt` is always too low).

What is already there? What do you see now?
The session won't recover unless an invalidation occurs in the future.
What is missing? What do you want to see?
- Add `SessionKeyID` to `DownlinkQueueInvalidation`. If the queue is empty, the AS cannot know at this moment which queue was invalidated.
- `dev.Session`.
- Determine which session the `nacked` message is from: it is possible that the `nacked` message is from a pending session (from the AS perspective), and as such the FCnt update should be done on the correct session.
- `dev.PendingSession` and `dev.Session` when the messages mentioned in Summary occur.

Environment

`v3.11`
How do you propose to implement this?
- `handleUplink` and do it on all of the appropriate uplink types.
- (Optional) return error details on `DownlinkQueue{Push|Replace}` with the minimum `FCnt` and always update the `LastAFCntDown` to this value. This would ensure that the system converges if at any point the AS and NS are, for some reason, out of sync.

How do you propose to test this?
Try to reproduce the sequence mentioned in the reproduction steps.
Can you do this yourself and submit a Pull Request?
Yes, but as this is a non-trivial change, I'm tagging this issue as `discussion` first: do we want to introduce these changes? The downlink queue invalidation change seems to be a requirement, but the other ones are good for consistency.

cc @rvolosatovs