Lora-net / LoRaMac-node

Reference implementation and documentation of a LoRa network node.

SX126x RadioRandom does not disable radio interrupts #841

Closed prouschal closed 4 years ago

prouschal commented 4 years ago

In src/radio/sx126x/radio.c the function RadioRandom is documented to "disable all interrupts". It does not do that, however.

I compared the implementation with the one from the SX1276, which does seem to disable the radio interrupts before entering reception mode.

The reason I am asking is that I observed very strange behaviour while debugging an application-layer bug that caused repeated join attempts even after a successful join.

During a Join, Radio.Random() is used to create a nonce for the join attempt. For some reason attempting to join after having already joined causes a strange chain of events:

The radio is put in reception mode without a timeout for random number generation. Then a radio interrupt is generated and the IRQ status is read; it is IRQ_RX_TX_TIMEOUT (it is not clear why, as the radio should have been started without a timeout). Since the radio is in RX mode, the callback for RX timeout is called.

Now the LoRaWAN stack is not equipped to handle an RX timeout in this state when the RX-Window is "None", as no RX window was opened yet. By default the stack assumes that RX window 2 timed out.

This RX2 timeout immediately terminates the join attempt on the next call to LoRaMacProcess and makes it fail with an RX2 timeout.
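
To make the failure mode concrete, here is a minimal sketch of the classification described above, next to one possible guard. All type and function names are hypothetical and invented for illustration; this is not the LoRaMac-node source, and the guard is only one way a stack could defend itself, not something the stack is stated to do.

```c
/* Hypothetical names throughout; this is not the LoRaMac-node source. */
typedef enum
{
    SKETCH_SLOT_RX1,
    SKETCH_SLOT_RX2,
    SKETCH_SLOT_NONE   /* no RX window has been opened yet */
} SketchRxSlot;

typedef enum
{
    SKETCH_EVT_RX1_TIMEOUT,
    SKETCH_EVT_RX2_TIMEOUT,
    SKETCH_EVT_IGNORED
} SketchTimeoutEvent;

/* Behaviour as described in the report: anything that is not RX1 is
 * treated as an RX2 timeout, including the "no window open" case. */
SketchTimeoutEvent SketchClassifyAsReported( SketchRxSlot slot )
{
    return ( slot == SKETCH_SLOT_RX1 ) ? SKETCH_EVT_RX1_TIMEOUT
                                       : SKETCH_EVT_RX2_TIMEOUT;
}

/* One possible guard (an assumption, not the stack's actual code):
 * drop a timeout that arrives while no RX window is open. */
SketchTimeoutEvent SketchClassifyGuarded( SketchRxSlot slot )
{
    switch( slot )
    {
        case SKETCH_SLOT_RX1:
            return SKETCH_EVT_RX1_TIMEOUT;
        case SKETCH_SLOT_RX2:
            return SKETCH_EVT_RX2_TIMEOUT;
        default:
            return SKETCH_EVT_IGNORED;
    }
}
```

With the first function, a spurious timeout during nonce generation is reported as an RX2 timeout and ends the join attempt; with the guarded variant it would be discarded.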

I do not understand how a radio interrupt with the timeout flags set is even generated at this moment, but the effect is certainly not proper operation. The inconsistency with the documentation is making this even more suspicious.

There seems to be some kind of race condition involved, as adding a large number of debug prints over the serial wire viewer seemed to change the behaviour.

I tried to add the following command to the start of the RadioRandom function:

    SX126xSetDioIrqParams( IRQ_RADIO_NONE,
                           IRQ_RADIO_NONE,
                           IRQ_RADIO_NONE,
                           IRQ_RADIO_NONE );

Initial experiments suggest that this may fix the problem, but without understanding the root cause I cannot be confident about that.
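
The effect of masking the interrupts first can be illustrated off-target with a small mock. The sketch below is self-contained and does not touch the real driver: every `Mock...` function, the `MOCK_...` constants, and `RadioRandomSketch` are hypothetical stand-ins, the RSSI values come from a toy pseudo-random generator rather than a radio, and the spurious timeout is injected on purpose to mimic the observed symptom.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock constants; the values are assumptions for this sketch only. */
#define MOCK_IRQ_RADIO_NONE    0x0000u
#define MOCK_IRQ_RX_TX_TIMEOUT 0x0200u
#define MOCK_IRQ_RADIO_ALL     0xFFFFu

uint16_t mockIrqMask = MOCK_IRQ_RADIO_ALL;  /* which IRQs may reach the host */
int mockTimeoutCallbacks = 0;               /* how often the RX timeout callback ran */
uint32_t mockNoiseState = 0x1234ABCDu;      /* state of the fake RSSI noise source */

/* Stand-in for the driver call used in the report; only the IRQ mask is modelled. */
void MockSetDioIrqParams( uint16_t irqMask, uint16_t dio1Mask,
                          uint16_t dio2Mask, uint16_t dio3Mask )
{
    ( void )dio1Mask;
    ( void )dio2Mask;
    ( void )dio3Mask;
    mockIrqMask = irqMask;
}

/* Simulates the radio raising an IRQ; the callback runs only if it is unmasked. */
void MockRaiseIrq( uint16_t irq )
{
    if( ( mockIrqMask & irq ) != 0u )
    {
        mockTimeoutCallbacks++;
    }
}

/* Fake instantaneous RSSI in dBm, driven by a xorshift generator. */
int16_t MockGetRssiInst( void )
{
    mockNoiseState ^= mockNoiseState << 13;
    mockNoiseState ^= mockNoiseState >> 17;
    mockNoiseState ^= mockNoiseState << 5;
    return ( int16_t )( -90 - ( int )( mockNoiseState & 0x1Fu ) );
}

/* Collects the LSB of 32 RSSI readings, optionally masking all IRQs first. */
uint32_t RadioRandomSketch( bool maskIrqsFirst )
{
    uint32_t rnd = 0;

    if( maskIrqsFirst )
    {
        MockSetDioIrqParams( MOCK_IRQ_RADIO_NONE, MOCK_IRQ_RADIO_NONE,
                             MOCK_IRQ_RADIO_NONE, MOCK_IRQ_RADIO_NONE );
    }

    for( uint8_t i = 0; i < 32; i++ )
    {
        /* Injected spurious timeout, mimicking the symptom in the report. */
        MockRaiseIrq( MOCK_IRQ_RX_TX_TIMEOUT );
        rnd |= ( ( uint32_t )MockGetRssiInst( ) & 0x01u ) << i;
    }
    return rnd;
}
```

In this mock, calling `RadioRandomSketch( false )` lets every injected timeout reach the callback counter, while `RadioRandomSketch( true )` leaves the counter untouched. It says nothing about why the real radio raises the timeout in the first place.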

However, I have a few questions:

Why is a timeout interrupt generated, when the radio is put in reception without timeout?

Why is the radio put in single reception mode in the random function? What if some packet is received (probably not even intended for the device)?

Wouldn't the reception stop at that point, making the RSSI values useless for random number generation? Shouldn't the radio be put in Rx Continuous mode for generating random data? This might even be a security risk if all an attacker has to do to compromise the random number generator is send a LoRa packet at the right time.

The SX1261 also has its own random number register and reading that is implemented in the driver. Why is it not used for the RadioRandom function?
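
On the single versus continuous reception question above: to my understanding of the SX126x SetRx command, the 24-bit timeout argument doubles as a mode selector, with 0x000000 meaning single reception without timeout and 0xFFFFFF meaning continuous reception. This should be checked against the datasheet. The sketch below only encodes that understanding; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names; encodes my reading of the SetRx timeout field. */
typedef enum
{
    SKETCH_RX_SINGLE_NO_TIMEOUT,    /* 0x000000: stays in RX until one packet arrives */
    SKETCH_RX_SINGLE_WITH_TIMEOUT,  /* any other value: single RX, bounded by a timer */
    SKETCH_RX_CONTINUOUS            /* 0xFFFFFF: stays in RX across received packets */
} SketchRxMode;

SketchRxMode SketchClassifyRxTimeout( uint32_t timeout )
{
    timeout &= 0x00FFFFFFu;  /* only the low 24 bits are sent to the radio */

    if( timeout == 0x000000u )
    {
        return SKETCH_RX_SINGLE_NO_TIMEOUT;
    }
    if( timeout == 0xFFFFFFu )
    {
        return SKETCH_RX_CONTINUOUS;
    }
    return SKETCH_RX_SINGLE_WITH_TIMEOUT;
}
```

Under that reading, starting reception with a timeout of 0 selects single mode, so a received packet would end reception before all RSSI samples are taken, whereas 0xFFFFFF would keep the receiver running.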

mluis1 commented 4 years ago

Indeed the Radio interrupts must be disabled.

When we first implemented this functionality, the datasheet didn't provide documentation for the API to get a random number from the radio hardware. This is why we tried to implement something similar to the SX127x radios. The issue is that we didn't think to update it since then.

While implementing the fix for this issue we noticed that the datasheet doesn't really describe how to correctly read the random number from the radio. A datasheet update should take place in the near future to include such a description.