RX window timing depends on LoRaMacProcess call timing

matthijskooijman commented 2 years ago

Since commit 60fb1b3b4c1eb62c8ee197ab9849e9a192195052 the RX window timers are no longer started in the TxDone interrupt handler, but deferred to ProcessRadioTxDone() (called by LoRaMacProcess() from the main loop). This means there is an arbitrary delay (that depends on how often the application calls LoRaMacProcess(), but is necessarily non-zero) between the TxDone interrupt and the timers being started by ProcessRadioTxDone(). However, the timers are started as follows:

https://github.com/Lora-net/LoRaMac-node/blob/2fdce5f28e2d1f03805287614a6604dfcbe7a557/src/mac/LoRaMac.c#L823-L826

So relative to the moment of starting the timer, not relative to the TxDone timing, so the window is always more or less later than intended.

I have seen this happen in practice with the STM32CubeWL stack, which is a port of LoRaMac-node and also contains this same code. Debug output shows that LoRaMacProcess() is called about 20ms after TxDone, and the RX window is late by the same amount of time. I have not verified exact timing with a scope (nor tested on this upstream code - I'm not sure if I have any hardware available for that), but the problem seems clear enough from looking at the code to report it already now.

This is probably easy to fix: Just set the timer relative to TxDoneParams.CurTime (which is set by the TxDone ISR already) instead of relative to now. Since the timer API only seems to support setting timers relative to now (there is no way to pass an absolute time), I guess this means something like:

 uint32_t offset = TimerGetCurrentTime() - TxDoneParams.CurTime;
 TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset ); 
 TimerStart( &MacCtx.RxWindowTimer1 ); 
 TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset); 
 TimerStart( &MacCtx.RxWindowTimer2 );

I think the above code should also be ran atomically (All ISRs disabled) to ensure no significant time passes between the above call to TimerGetCurrentTime() and the moment the timer code makes the timestamp absolute: https://github.com/Lora-net/LoRaMac-node/blob/2fdce5f28e2d1f03805287614a6604dfcbe7a557/src/system/timer.c#L126-L127

This atomicity could be prevented if the timer code gained an interface to start a timer for an absolute instead of relative time.

descartes commented 1 year ago

Would you like me to send you something based on STM32WL?

matthijskooijman commented 1 year ago

@descartes, I'm not sure I follow your offer? I already have code from the STM32CubeWL package working and applied the fix suggested above, which indeed seems to work to fix the problem (I haven't solved atomicity though). I've also confirmed the problem actually exists, by inserting a 100ms delay before each call to LoRaMacProcess()/

descartes commented 1 year ago

I was referring to any hardware you need - boards with STM32WL yes, new oscilloscope not so much.

matthijskooijman commented 1 year ago

Ah, I have 2 Nucleo STM32WL55JC boards, so I'm good :-)

What would be helpful, I think, is for someone to reproduce this issue (and try the fix) with the original LoRaMac-node code (instead of the STM32CubeWL code which is modified by ST). I'm lacking hardware, a test setup and time for that...

mluis1 commented 1 year ago

@matthijskooijman Thanks for the finding and fix proposition.

Following your recommendation I have applied below patch

diff --git a/src/mac/LoRaMac.c b/src/mac/LoRaMac.c
index edb172cc..3a102afe 100644
--- a/src/mac/LoRaMac.c
+++ b/src/mac/LoRaMac.c
@@ -810,11 +810,15 @@ static void ProcessRadioTxDone( void )
     {
         Radio.Sleep( );
     }
+
     // Setup timers
-    TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay );
+    CRITICAL_SECTION_BEGIN( );
+    uint32_t offset = TimerGetCurrentTime( ) - TxDoneParams.CurTime;
+    TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );
     TimerStart( &MacCtx.RxWindowTimer1 );
-    TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay );
+    TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset );
     TimerStart( &MacCtx.RxWindowTimer2 );
+    CRITICAL_SECTION_END( );

     if( MacCtx.NodeAckRequested == true )
     {

I have run a quick test using a NucleoL476+SX1276MB1LAS platform. To perform the quick test I have used the LCTT tool (LoRa-Alliance certication tool) and run the test verifying RXTimingSetupReq MAC command. The test ran successfully. Please refer to rx_timing_setup_req_test_log.zip for the test tool report.

We will push the fix as soon as we have finalized the review process and run further tests.

matthijskooijman commented 1 year ago

Cool, that matches the fix I applied to the STM32CubeWL code as well (except I omitted the critical section for now).

Did you also confirm the problem by adding some artificial delay? Easiest is probably to just add a delay inside LoRaMacProcess at the start, something like 100ms (must not be too small, since there is already a fixed 20ms margin in the RX timing, and at higher SF's the margin is even more, so I mainly tested at SF7).

About the critical section - I haven't measured, but my intuition is that this produces a fairly long criticial section that would add problematic latency for some kinds of interrupts. I would suggest, as an additional improvement, considering to revise the Timer API adding something like TimerStartAbs() or so where you pass an absolute timestamp (and ignore the value passed to TimerSetValue). If this would be available, using it could remove the need for the critical section completely AFAICS. I'm not sure how the timer util API is maintained and developed, maybe changing that is a more complicated process, but it looks like this would be a change in timer.c/h only, without needing changes to the underlying Rtc*() API.

mluis1 commented 1 year ago

Please find in following zip the result for some Rx windows tests. A 100 ms delay has been added just before the CRITICAL_SECTION_BEGIN call. The performed tests verify all supported data rates. rx_window_test_log_with_100_ms_delay.zip

The time spent inside the critical section will depend on the number of timers that have been added to the linked list. At a given time I think that the number of timers in the list will be small. However, this will depend deeply on the application being built.

For the moment it should be quite safe to keep the critical section.

Adding a new API allowing to start a timer with absolute time is indeed a good idea for this kind of operation. As we are getting closer to execute v4.7.0 release we will postpone such API addition to a future release. In order to not forget to add this API we will create an issue under our internal Git server as a reminder.

matthijskooijman commented 1 year ago

For the moment it should be quite safe to keep the critical section.

Adding a new API allowing to start a timer with absolute time is indeed a good idea for this kind of operation. As we are getting closer to execute v4.7.0 release we will postpone such API addition to a future release.

Agreed, and sounds good. Thanks!

slavendam commented 7 months ago

@matthijskooijman Did you have issues similar to this: https://github.com/stm32duino/STM32LoRaWAN/issues/38 Can it be that some value overflows when IRQ handler/timer is set? Or maybe something connected with RTC timer? I did lot of debuging, but don't have an idea what can be the source of issue...

matthijskooijman commented 7 months ago

@slavendam Hm, you make a good point. When ProcessRadioTxDone() (through LoRaMacProcess(), or modem.maintain() in the ST32Duino library) is called very late (i.e. >= 1s after the interrupt), offset could end up being more than MacCtx.RxWindow1Delay, resulting in an underflow in this line:

 TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );

Which would lead to effectively indefinitely postponing the RX1 handler.

@slavendam Could it be that your sketch is indeed not calling modem.maintain() for a long time?

You could confirm that this is the case, by printing the value of MacCtx.RxWindow1Delay and offset here.

@mluis1 I wonder if it would be worthwhile to catch and handle this case (by cancelling the timers and wrapping up the transaction, I guess? Could be done after the critical section)? Even though it is not really useful to support applications that call LoRaMacProcess() so seldomly, if an application accidentally delays calling it on a rare occasion, it would be good if the stack did not freeze up...

matthijskooijman commented 7 months ago

From the followup in the other issue, it turns out that the issue @slavendam is seeing is not caused by offset underflowing. I do still think the underflow issue exists, but it is probably obscure enough to just ignore it. I'll leave that here for consideration.

Lora-net / LoRaMac-node

RX window timing depends on LoRaMacProcess call timing #1349