Lora-net / LoRaMac-node

Reference implementation and documentation of a LoRa network node.

Confirmed uplinks eventually break channel plan and data rate (4.4.7) #1211

Closed lancepitka closed 2 years ago

lancepitka commented 2 years ago

I have encountered a problem when using confirmed uplinks: after an unpredictable amount of time (anywhere from 10 to a few hundred uplinks), my node fails to receive MAC commands from my server.

The first symptom is 7 transmission retry errors following a successful uplink: the ACK is not received by my node, so it retransmits 7 more times on the same sub band, since the channel plan was already set. Following this, my node loses its channel plan. I still receive a lot of uplinks since it still retries 8 times, but many uplinks are lost (about 25%) and the occasional retransmission error is seen. I have confirmed with my spectrum analyzer that uplinks are being transmitted outside the sub band used by my server. Once the node begins this behavior, it never corrects itself.

Here is an image from ChirpStack showing the errors, the occasional missed uplinks, and the seemingly random data rate being used. This happens both with ADR enabled and with ADR disabled. [image]

My node is in the same room as my gateway, so I can confirm that signal strength is not the issue. I have reproduced this error with both the ChirpStack and OrbiWan network servers.

Steps to reproduce:

lancepitka commented 2 years ago

Update:

I discovered the underlying problem here. When the node enters this state, it still receives downlink MAC commands from the server, but does not apply them. For example, it receives the channel plan, requested DR and TX power, but ignores them. I am investigating further.

lancepitka commented 2 years ago

Final Update:

I discovered the root cause of the problem. If the node sends 8 confirmed uplinks in a row without receiving an ACK, it resets its channel mask back to the default of all 8 sub bands enabled.
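The reset described above can be sketched as a small state machine. This is only an illustration of the behavior as I understand it, not the actual LoRaMac-node code: `NodeState`, `node_on_confirmed_attempt`, and the one-bit-per-sub-band mask are hypothetical names and simplifications, and the limit of 8 attempts is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONFIRMED_ATTEMPTS 8u /* figure taken from the description above */
#define DEFAULT_SUB_BANDS 0xFFu   /* one bit per sub band, all 8 enabled */

/* Hypothetical node state; names are illustrative, not LoRaMac-node's. */
typedef struct {
    uint8_t sub_bands;       /* enabled sub bands, bit n = sub band n + 1 */
    uint8_t failed_attempts; /* consecutive confirmed attempts without ACK */
} NodeState;

/* Record the outcome of one confirmed uplink attempt.
 * Returns true when the channel mask was reset to the default. */
bool node_on_confirmed_attempt(NodeState *n, bool ack_received)
{
    assert(n != NULL);

    if (ack_received) {
        n->failed_attempts = 0;
        return false;
    }

    n->failed_attempts++;
    if (n->failed_attempts < MAX_CONFIRMED_ATTEMPTS) {
        return false;
    }

    /* All attempts exhausted: whatever mask the server configured is lost. */
    n->sub_bands = DEFAULT_SUB_BANDS;
    n->failed_attempts = 0;
    return true;
}
```

Starting from a node restricted to sub band 2 (`sub_bands == 0x02`), seven unacknowledged attempts leave the mask untouched, and the eighth restores `0xFF`.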

The next time the node successfully transmits to the network (because it happens to transmit on a supported sub band), the network server sends a LinkAdrReq command to set the channel mask on the node. In most cases, there are two options for how the LinkAdrReq command sends the channel mask, as defined in Section 2.3.5 of the Regional Parameters 1.0.3A specification (for the US915 region): [image]

If ChMaskCntl == 5, it can turn each sub band on or off as a whole, so a single downlink can define the entire channel mask at once. The alternative (which ChirpStack uses, for example) is to use ChMaskCntl 0 to 4, in which case it can set the mask for each channel individually, but it only defines two sub bands at a time.

When the latter is used after the node resets its own channel plan due to missed confirmed uplinks, only 2 (or in some cases 4 if multiple ChMaskCntl fields are set in one downlink) sub bands are re-defined, and all remaining sub bands are left unchanged and still enabled. This leaves the node in a permanent state of having more channels enabled than the network supports, leading to data loss, as well as the behavior seen in my original post.
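To make the difference concrete, here is a minimal sketch of how the two styles of LinkAdrReq act on a mask that has just been reset to the default. It is a simplified model, not the LoRaMac-node implementation: `ChannelMask`, `mask_reset_default`, `mask_apply`, and `mask_count_125khz` are hypothetical names, the mask layout (four 16-bit words for channels 0-63 plus one word for the 500 kHz channels 64-71) is an assumption, and only ChMaskCntl values 0 to 5 are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_WORDS 5 /* words 0-3: channels 0-63, word 4: channels 64-71 */

/* Hypothetical model of a US915 channel mask; not the LoRaMac-node API. */
typedef struct {
    uint16_t word[MASK_WORDS];
} ChannelMask;

/* Default state after the reset: every channel enabled. */
void mask_reset_default(ChannelMask *m)
{
    assert(m != NULL);
    for (int i = 0; i < 4; i++) {
        m->word[i] = 0xFFFF;
    }
    m->word[4] = 0x00FF;
}

/* Apply one ChMaskCntl/ChMask pair. Returns false for unhandled values. */
bool mask_apply(ChannelMask *m, uint8_t ch_mask_cntl, uint16_t ch_mask)
{
    assert(m != NULL);

    if (ch_mask_cntl <= 3) {
        /* Only the 16 channels of this block change; the rest are kept. */
        m->word[ch_mask_cntl] = ch_mask;
        return true;
    }
    if (ch_mask_cntl == 4) {
        m->word[4] = ch_mask & 0x00FF;
        return true;
    }
    if (ch_mask_cntl == 5) {
        /* Bit b switches sub band b+1: 8 channels plus one 500 kHz channel. */
        for (int bank = 0; bank < 8; bank++) {
            uint16_t bits = (uint16_t)(0x00FFu << (8 * (bank % 2)));
            uint16_t wide = (uint16_t)(1u << bank);
            if ((ch_mask >> bank) & 1u) {
                m->word[bank / 2] |= bits;
                m->word[4] |= wide;
            } else {
                m->word[bank / 2] &= (uint16_t)~bits;
                m->word[4] &= (uint16_t)~wide;
            }
        }
        return true;
    }
    return false;
}

/* Number of enabled 125 kHz channels (0-63). */
int mask_count_125khz(const ChannelMask *m)
{
    assert(m != NULL);
    int count = 0;
    for (int i = 0; i < 4; i++) {
        for (int bit = 0; bit < 16; bit++) {
            count += (m->word[i] >> bit) & 1;
        }
    }
    return count;
}
```

For a server that listens only on sub band 2 (channels 8-15), `mask_apply(&m, 0, 0xFF00)` on a freshly reset mask leaves 56 of the 64 125 kHz channels enabled, because words 1 to 3 are never touched. `mask_apply(&m, 5, 0x0002)` on the same starting mask leaves exactly 8.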

Is the best solution here to remove the code that resets the node's channel plan after missed confirmed uplinks?