STMicroelectronics / STM32CubeWL

STM32Cube MCU Full FW Package for the STM32WL series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on boards provided by ST (Nucleo boards))

Application hangs because MBMUXIF_LoraSendCmd() command stuck #80

Open metaTinker opened 3 months ago

metaTinker commented 3 months ago

Setup

The application sometimes hangs because MBMUXIF_LoraSendCmd() blocks indefinitely on Sem_MbLoRaRespRcv.

My application is built around the LoRaWAN_End_Node_DualCoreFreeRTOS example provided in the firmware package. The application on the CM4 sends telemetry roughly every 4-5 minutes. It runs well for a few days, and then MBMUXIF_LoraSendCmd() suddenly gets stuck waiting on Sem_MbLoRaRespRcv. After reading more about how the dual-core system works, I concluded that if a response is not received through the IPCC channels, the semaphore is never released. This is a serious problem for me because my application must send telemetry continuously at that 4-5 minute rate.

I cannot think of a reason why the CM4 core would fail to receive a Resp for a telemetry send Cmd.

How to reproduce the bug

At this time, I cannot pinpoint how to reproduce this bug. It appears to happen at random: sometimes the system runs for a few days before the bug occurs, and sometimes it happens right away.

Additional context

I have set up an RTOS queue so that I do not bombard the send API with messages. However, when this issue occurs, the queue fills up and no messages are sent.

Code Snippet

void MBMUXIF_LoraSendCmd(void)
{
  /* USER CODE BEGIN MBMUXIF_LoraSendCmd_1 */

  /* USER CODE END MBMUXIF_LoraSendCmd_1 */
  if (MBMUX_CommandSnd(FEAT_INFO_LORAWAN_ID) == 0)
  {
    osSemaphoreAcquire(Sem_MbLoRaRespRcv, osWaitForever);
  }
  else
  {
    Error_Handler();
  }
  /* USER CODE BEGIN MBMUXIF_LoraSendCmd_Last */

  /* USER CODE END MBMUXIF_LoraSendCmd_Last */
}

Additional Info/questions

I think the system waits forever on this semaphore by design. If a response is never received, could there be a retry mechanism, or could the failure be reported through a communication-error callback of some kind?
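To make the request concrete, here is a minimal sketch of a bounded wait in place of osWaitForever. It is not ST's implementation. The function name MBMUXIF_LoraSendCmdTimed and the LoraCmdResult_t enum are hypothetical. The small stubs at the top (including the MBMUX_CommandSnd signature, the placeholder FEAT_INFO_LORAWAN_ID value, and the fake_responses_pending hook) are assumptions that only let the sketch compile on a host; on the target they would be replaced by cmsis_os2.h and the MBMUX headers. The only RTOS behaviour relied on is that the CMSIS-RTOS2 osSemaphoreAcquire() returns osErrorTimeout when the timeout expires.

```c
#include <stdint.h>

/* ---- Host-side stand-ins (assumptions, not the real headers). ----------
 * On the target, include "cmsis_os2.h" and the MBMUX headers instead. */
typedef enum { osOK = 0, osErrorTimeout = -2 } osStatus_t;
typedef void *osSemaphoreId_t;

#define FEAT_INFO_LORAWAN_ID 0U          /* placeholder value for the sketch */

static int dummy_sem;                                   /* stand-in object */
static osSemaphoreId_t Sem_MbLoRaRespRcv = &dummy_sem;
static int fake_responses_pending = 0;   /* hypothetical host-side test hook */

/* Stub: succeeds while a fake response is pending, otherwise times out. */
static osStatus_t osSemaphoreAcquire(osSemaphoreId_t id, uint32_t timeout)
{
  (void)id;
  (void)timeout;
  if (fake_responses_pending > 0)
  {
    fake_responses_pending--;
    return osOK;
  }
  return osErrorTimeout;
}

/* Stub: pretends the command was always handed to the mailbox. */
static int8_t MBMUX_CommandSnd(uint8_t feature_id)
{
  (void)feature_id;
  return 0;
}

/* ---- Sketch of the proposed behaviour. ---------------------------------- */
typedef enum
{
  LORA_CMD_OK = 0,        /* response received from the other core */
  LORA_CMD_SEND_FAILED,   /* MBMUX_CommandSnd() rejected the command */
  LORA_CMD_TIMEOUT        /* no response within timeout_ticks */
} LoraCmdResult_t;

/* Same flow as MBMUXIF_LoraSendCmd(), but the wait is bounded and the
 * outcome is reported to the caller instead of blocking forever. */
LoraCmdResult_t MBMUXIF_LoraSendCmdTimed(uint32_t timeout_ticks)
{
  if (MBMUX_CommandSnd(FEAT_INFO_LORAWAN_ID) != 0)
  {
    return LORA_CMD_SEND_FAILED;
  }
  if (osSemaphoreAcquire(Sem_MbLoRaRespRcv, timeout_ticks) != osOK)
  {
    return LORA_CMD_TIMEOUT;
  }
  return LORA_CMD_OK;
}
```

With something like this, the caller could drop or requeue the pending telemetry on LORA_CMD_TIMEOUT rather than hang. One caveat: if the response arrives after the timeout, the semaphore may be released late and the next command could return early, so whether this is safe depends on how the mailbox state is recovered, which this sketch does not cover.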

RJMSTM commented 3 months ago

ST Internal Reference: 176222

metaTinker commented 3 months ago

@RJMSTM any updates on this?

metaTinker commented 1 month ago

@ALABSTM @RJMSTM is there any resolution to this?

ALABSTM commented 1 month ago

Hi @metaTinker,

We got the point. We will get back to you when we have updates to share. This may take some time. Thank you for your comprehension.

With regards,