HAL SPI a timeout can underflow causing the app to hang on a loop

STMicroelectronics / STM32CubeL4

STM32Cube MCU Full Package for the STM32L4 series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on all boards provided by ST (Nucleo, Evaluation and Discovery Kits))

Other

259 stars 151 forks source link

HAL SPI a timeout can underflow causing the app to hang on a loop #71

Closed SteveGiann closed 1 year ago

SteveGiann commented 2 years ago

In both the functions SPI_WaitFifoStateUntilTimeout and SPI_WaitFlagStateUntilTimeout there is a variable called tmp_timeout which is calculated by subtracting the Timeout with the time difference between the start of the whole process and the current Tick. tmp_timeout = Timeout - (HAL_GetTick() - Tickstart); line 3921 and line 3992 If the time difference is larger than the Timeout then the tmp_timeout will wrap around and take an extremely large value. This will cause the program to get stuck inside the while loop.

For example if we have a timeout of 5ms and the time difference between the start and the current Tick is 10, then tmp_timeout will get the value 0xfffffffB.

This can be caused by an interrupt triggering before the calculation and which takes longer to execute than the Timeout.

A possible fix would be to make a check if a underflow is possible

if (Timeout < (HAL_GetTick() - Tickstart))
{
    tmp_timeout = 0;
}
else
{
    tmp_timeout = Timeout - (HAL_GetTick() - Tickstart);
}

HBOSTM commented 1 year ago

Hello @SteveGiann,

Would you please give us more details about how you got this issue? Did you observed it at runtime ? if so, could you please share the code you have used to reproduce the problem.

With regards,

SteveGiann commented 1 year ago

Hello!

The problem was observed at runtime when calling the HAL_SPI_Transmit function like this: HAL_SPI_Transmit(hspi, &data, 1, 1);

Unfortunately the issue was probably caused by interrupts so there is no standard way to reproduce it.

In our investigation we saw that the issue begins in line 834 where the tickstart is first assigned. From that assignment until line 974 and the subsequent functions that this function calls, there is a possibility that an interrupt can trigger. If this interrupt takes more than 1ms to be handled, then after reaching the functions stated in the issue, the difference between tickstart and the return value of HAL_GetTick() is larger than the timeout provided. This doesn't necessary need to be one interrupt, but several interrupts that trigger between these two spots, which their combined service time is greater than the timeout.

I understand that 1ms could be very small for a timeout, but I wouldn't expect the program to get stuck in a loop because of this.

Regards

HBOSTM commented 1 year ago

Hello @SteveGiann

I correctly reproduce this situation. There is no problem in the HAL driver, This condition managed in the Timeout management and the program return HAL_TIMEOUT Error.

With Regards,

HBOSTM commented 1 year ago

Hello @SteveGiann

First of all, thank you for this contribution. If this problem is solved from your side, please allow me to close it.

Best Regards,

SteveGiann commented 1 year ago

Hello @HBOSTM

Sorry it took me a while to respond.

I believe that there was a miscommunication about the issue.

Our problem is in the functions stated in the issue description: SPI_WaitFifoStateUntilTimeout and SPI_WaitFlagStateUntilTimeout which are called from the SPI_EndRxTxTransaction inside of HAL_SPI_Transmit.

This is a picture which explains what we observed while investigating the issue. The function will not return HAL_TIMEOUT because due to the underflow, the timeout will be very large and will halt the processor for a long time.

st_hal

Unless we are missing something, the line: tmp_timeout = Timeout - (HAL_GetTick() - Tickstart); has a chance to underflow which may lead to an almost "infinite" while loop.

We observed this issue while using the function HAL_SPI_Transmit(hspi, &data, 1, 1); as stated in the previous comment.

HBOSTM commented 1 year ago

Hi @SteveGiann ,

Thank you for this report. We will get back to you as soon as we analyze it further. This may take some time. Thank you for your comprehension.

With regards,

HBOSTM commented 1 year ago

ST Internal Reference: 148961

HBOSTM commented 1 year ago

Hello @SteveGiann,

Thank you for this contribution. You need to properly manage the priority of interrupts to ensure that SPI is not interrupted. Otherwise, you can switch to use IT or DMA processes. You can also secure the SPI data transfer process in polling mode as follows: /* Hide interrupts that block SPI data processing in polling mode:

/* Hide interrupts that block SPI data processing in query mode. */
__disable_irq();
HAL_SPI_TransmitReceive(...);
/* Re-enable interrupts */
__enable_irq();

The described use case cannot occur if priorities are well managed. Also, the proposed solution cannot solve the problem if the interrupt occurs between the "if" assignment and "tmp_timeout".

So, please allow me to close this issue and thank you again for your contribution.

Best Regards,