STMicroelectronics / STM32CubeWL

STM32Cube MCU Full FW Package for the STM32WL series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on boards provided by ST (Nucleo boards)
Other
99 stars 52 forks source link

Radio IRQ during HAL_SUBGHZ_ExecSetCmd causes infinite Radio IRQ handler re-entering #43

Closed RomanJasmann closed 1 year ago

RomanJasmann commented 1 year ago

Describe the set-up

Describe the bug (skip if none)

If a Radio IRQ occurs during HAL_SUBGHZ_ExecSetCmd, it will cause an infinite re-entering of the Radio IRQ handler.

In LoRaWAN class C, the transceiver operates most of the time in Rx. When the transceiver is in Rx and the application requests an uplink, the LoRaWAN stack will serve the uplink request by first stopping the Rx mode and setting the transceiver in standby mode by calling RadioStandby in RadioSetTxConfig. RadioStandby will call SUBGRF_SetStandby and SUBGRF_SetStandby will call HAL_SUBGHZ_ExecSetCmd. HAL_SUBGHZ_ExecSetCmd will lock the subghz handle by __HAL_LOCK(hsubghz). If a Radio IRQ occurs during HAL_SUBGHZ_ExecSetCmd, e.g. triggered by reception of a packet, then the SUBGHZ_Radio_IRQHandler will be called. The SUBGHZ_Radio_IRQHandler will call HAL_SUBGHZ_IRQHandler. The HAL_SUBGHZ_IRQHandler will try to retrieve the interrupt source from the transceiver by calling HAL_SUBGHZ_ExecGetCmd, but HAL_SUBGHZ_ExecGetCmd will return immediately since it cannot lock the subghz handle. The subghz handle has already been locked by HAL_SUBGHZ_ExecSetCmd. At the end of HAL_SUBGHZ_IRQHandler, the handler will try to clear the interrupt request by calling HAL_SUBGHZ_ExecSetCmd, but HAL_SUBGHZ_ExecSetCmd will also return immediately since it cannot lock the subghz handle. The result is that the radio IRQ won't be cleared, causing a re-entering of SUBGHZ_Radio_IRQHandler immediately after exiting it.

Screenshots

1

2

3

4

Best regards Roman Jasmann

ASELSTM commented 1 year ago

ST Internal Reference: 134090

ASELSTM commented 1 year ago

Hi @RomanJasmann,

For a better analysis of your issue, would you please give us more details about the version of STM32CubeWL package you are using ?

With regards,

RomanJasmann commented 1 year ago

Hi,

I'm using STM32CubeWL V1.0.0.

Best regards Roman Jasmann

ASELSTM commented 1 year ago

Hi @RomanJasmann,

Would you please try with the latest version v1.2.0 of the STM32CubeWL package available on the ST.com.

With regards,

RomanJasmann commented 1 year ago

@ASELSTM

Hi, the implementation of __HAL_LOCK in v1.2.0 is the same as in v1.0.0. So the problem will also occur with v1.2.0. The root cause of the problem is an incorrect implementation of __HAL_LOCK. See this discussion:

https://community.st.com/s/question/0D50X0000C5Tns8SQC/bug-stm32-hal-driver-lock-mechanism-is-not-interrupt-safe

Here is some additional description:

  1. When the device is running in class C mode, it opens a continous Rx window with the following PHY layer downlink configuration (note the inverted IQ polarization):

grafik

  1. As soon as the application of the device requests the LoRaWAN stack to send an uplink, the stack will check the request and serve the request by switching from Rx to Tx in RegionEU868TxConfig. In RegionEU868TxConfig, the stack will primary compute the Tx PHY layer configuration (2.1), set the radio frequency to the next Tx channel frequency (2.2), and apply the computed PHY layer config to the radio (2.3). Note that the LoRaWAN stack does not set the radio in standby mode before setting the radio frequency to the next Tx uplink channel frequency (2.2) and applying the uplink Tx PHY layer configuration (2.3). This means that the radio will keep a downlink PHY layer configuration and stay in Rx until Radio.SetTxConfig (or more precisely, the SPI transaction to set the radio in Tx mode associated with Radio.SetTxConfig) has not finished.

grafik

Note also that Radio.Rx enables all radio interrupts and not just IRQ_RX_DONE and IRQ_RX_TX_TIMEOUT whereas Radio.Send only enables IRQ_TX_DONE and IRQ_RX_TX_TIMEOUT. So even an IRQ_PREAMBLE_DETECTED could be requested from the radio.

grafik grafik

  1. Radio.SetTxConfig will lead to a call of HAL_SUBGHZ_ExecSetCmd. HAL_SUBGHZ_ExecSetCmd will call __HAL_LOCK(hsubghz) and thereby allocate the hsubghz handle. Remember that at this point the radio is still in Rx mode. If a radio interrupt occurs now between __HAL_LOCK(hsubghz) (3.1) and the point where the MCU issues the radio SPI command by driving NSS high (3.2), then it will create a deadlock and cause an infinite radio IRQ handler re-entering.

grafik

  1. The reason for the deadlock is that SUBGHZ_Radio_IRQHandler will call HAL_SUBGHZ_IRQHandler. The HAL_SUBGHZ_IRQHandler will try to retrieve the interrupt source from the radio by calling HAL_SUBGHZ_ExecGetCmd (4.1), but HAL_SUBGHZ_ExecGetCmd will return HAL_BUSY since the subghz handle has already been allocated by __HAL_LOCK in HAL_SUBGHZ_ExecSetCmd (3.1).

grafik

At the end of HAL_SUBGHZ_IRQHandler, the handler will try to clear the interrupt request by calling HAL_SUBGHZ_ExecSetCmd (4.2), but HAL_SUBGHZ_ExecSetCmd will also return HAL_BUSY. The result is that the radio IRQ won't be cleared, causing a re-entering of SUBGHZ_Radio_IRQHandler immediately after exiting it.

grafik

Best regards Roman Jasmann

Cartwrar commented 1 year ago

Hi Roman,

Thank you for providing such a clear explanation. I have encountered what I think is the same issue and it was very helpful being able to follow your logic through.

Have you found a reliable fix for this? My device is operating in Class A but still hits a fault at exactly the same point, as the HAL_SUBGHZ_ExecSetCmd is run to set the SPI NSS high. I'm struggling to find a good way around it!

Annoyingly this wasn't occuring when using STMCUBEIDE v1.10.0, it's only after updating I'm having the issue but I can't see any clear difference in generated files from before and after the update. Thank you for your time.

All the best, Alex

ASELSTM commented 1 year ago

Hi @RomanJasmann,

Would you please try to test with the latest available version of STM32CubeWL and check whether you are still having the same issue or not.

With regards,

Adamzat commented 1 year ago

Hi, We are facing the same issue when using STM32CubeWL 1.3.0. It happens frequently when using ABP mode. @RomanJasmann did you resolve the problem.

Kind regards.

YoannBo commented 1 year ago

Hi,

In Middlewares\Third_Party\SubGHz_Phy\stm32_radio_driver\ radio_driver.c, the HAL_SUBGHZ is always called within Critical Section to forbid IRQs within the Call e.g:

void SUBGRF_WriteRegisters( uint16_t address, uint8_t *buffer, uint16_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_WriteRegisters( &hsubghz, address, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_ReadRegisters( uint16_t address, uint8_t *buffer, uint16_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ReadRegisters( &hsubghz, address, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_WriteBuffer( uint8_t offset, uint8_t *buffer, uint8_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_WriteBuffer( &hsubghz, offset, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_ReadBuffer( uint8_t offset, uint8_t *buffer, uint8_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ReadBuffer( &hsubghz, offset, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_WriteCommand( SUBGHZ_RadioSetCmd_t Command, uint8_t *pBuffer,
                                        uint16_t Size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ExecSetCmd( &hsubghz, Command, pBuffer, Size );
    CRITICAL_SECTION_END();
}

void SUBGRF_ReadCommand( SUBGHZ_RadioGetCmd_t Command, uint8_t *pBuffer,
                                        uint16_t Size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ExecGetCmd( &hsubghz, Command, pBuffer, Size );
    CRITICAL_SECTION_END();
}
RomanJasmann commented 1 year ago

@YoannBo

You are right. Thank you for the hint. They have changed the implementation in v1.3.0. Compare

STM32CubeWL V1.0.0:

https://github.com/STMicroelectronics/STM32CubeWL/blob/v1.0.0/Middlewares/Third_Party/SubGHz_Phy/stm32_radio_driver/radio_driver.c

#define SUBGRF_WriteCommand( x, y, z )  HAL_SUBGHZ_ExecSetCmd( &hsubghz, (x), (y), (z) )
#define SUBGRF_ReadCommand( x, y, z )   HAL_SUBGHZ_ExecGetCmd( &hsubghz, (x), (y), (z) )

void SUBGRF_WriteRegister( uint16_t addr, uint8_t data )
{
    HAL_SUBGHZ_WriteRegisters( &hsubghz, addr, (uint8_t*)&data, 1 );
}

uint8_t SUBGRF_ReadRegister( uint16_t addr )
{
    uint8_t data;
    HAL_SUBGHZ_ReadRegisters( &hsubghz, addr, &data, 1 );
    return data;
}

void SUBGRF_WriteRegisters( uint16_t address, uint8_t *buffer, uint16_t size )
{
    HAL_SUBGHZ_WriteRegisters( &hsubghz, address, buffer, size );
}

void SUBGRF_ReadRegisters( uint16_t address, uint8_t *buffer, uint16_t size )
{
    HAL_SUBGHZ_ReadRegisters( &hsubghz, address, buffer, size );
}

void SUBGRF_WriteBuffer( uint8_t offset, uint8_t *buffer, uint8_t size )
{
    HAL_SUBGHZ_WriteBuffer( &hsubghz, offset, buffer, size );
}

void SUBGRF_ReadBuffer( uint8_t offset, uint8_t *buffer, uint8_t size )
{
    HAL_SUBGHZ_ReadBuffer( &hsubghz, offset, buffer, size );
}

STM32CubeWL V1.3.0:

https://github.com/STMicroelectronics/STM32CubeWL/blob/v1.3.0/Middlewares/Third_Party/SubGHz_Phy/stm32_radio_driver/radio_driver.c

void SUBGRF_WriteRegister( uint16_t addr, uint8_t data )
{
    HAL_SUBGHZ_WriteRegisters( &hsubghz, addr, (uint8_t*)&data, 1 );
}

uint8_t SUBGRF_ReadRegister( uint16_t addr )
{
    uint8_t data;
    HAL_SUBGHZ_ReadRegisters( &hsubghz, addr, &data, 1 );
    return data;
}

void SUBGRF_WriteRegisters( uint16_t address, uint8_t *buffer, uint16_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_WriteRegisters( &hsubghz, address, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_ReadRegisters( uint16_t address, uint8_t *buffer, uint16_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ReadRegisters( &hsubghz, address, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_WriteBuffer( uint8_t offset, uint8_t *buffer, uint8_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_WriteBuffer( &hsubghz, offset, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_ReadBuffer( uint8_t offset, uint8_t *buffer, uint8_t size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ReadBuffer( &hsubghz, offset, buffer, size );
    CRITICAL_SECTION_END();
}

void SUBGRF_WriteCommand( SUBGHZ_RadioSetCmd_t Command, uint8_t *pBuffer,
                                        uint16_t Size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ExecSetCmd( &hsubghz, Command, pBuffer, Size );
    CRITICAL_SECTION_END();
}

void SUBGRF_ReadCommand( SUBGHZ_RadioGetCmd_t Command, uint8_t *pBuffer,
                                        uint16_t Size )
{
    CRITICAL_SECTION_BEGIN();
    HAL_SUBGHZ_ExecGetCmd( &hsubghz, Command, pBuffer, Size );
    CRITICAL_SECTION_END();
}

Of course, the critical sections will eliminate the problem.

@ASELSTM You can close the issue.

RomanJasmann commented 1 year ago

@ASELSTM

HAL_SUBGHZ_WriteRegisters and HAL_SUBGHZ_ReadRegisters in SUBGRF_WriteRegister and SUBGRF_ReadRegister must be protected by a critical section too:

https://github.com/STMicroelectronics/STM32CubeWL/blob/main/Middlewares/Third_Party/SubGHz_Phy/stm32_radio_driver/radio_driver.c

void SUBGRF_WriteRegister( uint16_t addr, uint8_t data )
{
    HAL_SUBGHZ_WriteRegisters( &hsubghz, addr, (uint8_t*)&data, 1 );
}

uint8_t SUBGRF_ReadRegister( uint16_t addr )
{
    uint8_t data;
    HAL_SUBGHZ_ReadRegisters( &hsubghz, addr, &data, 1 );
    return data;
}