rp_callback panics with Class B enabled

RobTWTG commented 3 months ago

During our duration tests we observed an issue when Class B is enabled and we send many uplinks. The LBM trace shows this:

INFO: add task user
User LoRaWAN tx on FPort 16

  *************************************
  * Send Payload  for stack_id = 0
  *************************************

 Tx  LoRa at 1144108 ms: freq:867500000, SF7, BW125, len 19 bytes 0 dBm, fcnt_up 85, toa = 52

ping_slot_obj(0) devaddr:5c41a6 START at 1147409 (1405775900.471), freq:869525000, dr:3, PingNb:26

  *************************************
  *  TX DONE
  *************************************

  Open RX1 for Hook Id = 2  RX1 LoRa at 1145155 ms: freq:867500000, SF7, BW125, sync word = 0x34
  Timer will expire in 976 ms

  *************************************
  * RX1 Timeout for stack_id = 0
  *************************************

  Open RX2 for Hook Id = 2  RX2 LoRa at 1146190 ms: freq:869525000, SF12, BW125, sync word = 0x34
  Timer will expire in 998 ms

  *************************************
  * RX2 Timeout for stack_id = 0
  *************************************

ping_slot_obj(0) devaddr:5c41a6 START at 1151249 (1405775904.311), freq:869525000, dr:3, PingNb:25
ping_slot_obj(0) devaddr:5c41a6 START at 1155089 (1405775908.151), freq:869525000, dr:3, PingNb:24
ping_slot_obj(0) devaddr:5c41a6 START at 1158929 (1405775911.991), freq:869525000, dr:3, PingNb:23
ping_slot_obj(0) devaddr:5c41a6 START at 1162769 (1405775915.831), freq:869525000, dr:3, PingNb:22
INFO: add task user
User LoRaWAN tx on FPort 17
  *************************************
  * Send Payload  for stack_id = 0
  *************************************
  Tx  LoRa at 1164303 ms: freq:868100000, SF7, BW125, len 24 bytes 0 dBm, fcnt_up 86, toa = 62
ping_slot_obj(0) devaddr:5c41a6 START at 1166609 (1405775919.671), freq:869525000, dr:3, PingNb:21
  *************************************
  *  TX DONE
  *************************************

  Open RX1 for Hook Id = 2  RX1 LoRa at 1165361 ms: freq:868100000, SF7, BW125, sync word = 0x34
  Timer will expire in 976 ms
  *************************************
  * RX1 Timeout for stack_id = 0
  *************************************

  Open RX2 for Hook Id = 2  RX2 LoRa at 1166396 ms: freq:869525000, SF12, BW125, sync word = 0x34
  Timer will expire in 997 ms
WARN:  RP: Aborted task with hook #5 - not a priority task

Here the smtc_modem_hal_on_panic callback gets called and our application trace shows:

[ERR ][LHAL]: function: rp_callback, line: 443
[ERR ][LHAL]: RP_FAILSAFE - #2

It looks like the RX2 timeout is no longer handled. Does this have something to do with the task priority system? We did modify RP_Margin to a higher value. Hopefully the LBM team can help with our investigation. Let us know if we can provide more information or try something.

opeyrard commented 3 months ago

Hi, could you please specify the RP_Margin you are using? Do you have something similar to the code below in your implementation?

if ( smtc_modem_is_irq_flag_pending( ) == false ) {
    hal_watchdog_reload( );
    hal_mcu_set_sleep_for_ms( MIN( sleep_time_ms, WATCHDOG_RELOAD_PERIOD_MS ) );
}

Many thanks, Best regards

RobTWTG commented 2 months ago

Thank you for replying. It turned out to be an issue on our side, in how we provided time to smtc_modem_hal_get_time_in_ms: in some rare cases the subsecond part was passed incorrectly, giving an error of up to 1000 ms. That caused this assert with Class B enabled.

I will close the issue.
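
For readers who hit the same symptom: the thread does not show the actual faulty code, so the following is only a sketch of one common way a roughly 1000 ms error can appear, namely reading an RTC's seconds and subsecond registers separately while the seconds counter rolls over in between. The accessor type rtc_read_fn, the function consistent_time_ms and the fake_* helpers are hypothetical names for illustration and are not part of LBM. The idea is to re-read the seconds after the subseconds and retry if they changed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor type: each call returns one RTC field. */
typedef uint32_t ( *rtc_read_fn )( void );

/* Combine seconds and subsecond milliseconds into one millisecond value.
 * If the seconds counter changes while the subsecond field is being read,
 * the pair is inconsistent (off by about 1000 ms), so read again. */
uint32_t consistent_time_ms( rtc_read_fn read_seconds, rtc_read_fn read_subsec_ms )
{
    uint32_t sec_before, sec_after, subsec_ms;

    do
    {
        sec_before = read_seconds( );
        subsec_ms  = read_subsec_ms( );
        sec_after  = read_seconds( );
    } while( sec_before != sec_after );

    assert( subsec_ms < 1000u );  /* subsecond part must stay below one second */

    /* Unsigned arithmetic wraps modulo 2^32, like a free-running ms counter. */
    return sec_before * 1000u + subsec_ms;
}

/* Simulated RTC (assumption, for demonstration only): the seconds counter
 * rolls over from 41 to 42 right after the first seconds read. */
static uint32_t fake_reads = 0;

static uint32_t fake_seconds( void )
{
    return ( fake_reads++ == 0 ) ? 41u : 42u;
}

static uint32_t fake_subsec_ms( void )
{
    return 3u;
}
```

With the simulated RTC above, a naive single read would combine second 41 with a subsecond value that already belongs to second 42, which is one second too early. The retry loop discards that pair and returns a value built from a consistent reading. A real smtc_modem_hal_get_time_in_ms implementation could apply the same pattern to whatever time source the platform provides.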

Lora-net / SWL2001

rp_callback panics with Class B enabled #77