STMicroelectronics / stm32h7xx_hal_driver

Provides the STM32Cube MCU Component "hal_driver" of the STM32H7 series.
BSD 3-Clause "New" or "Revised" License
96 stars 41 forks source link

HAL_ETH_ReleaseTxPacket() does not test gState and races with itself and TX functions when accessing TX descriptors and counters #72

Open KarstenHohmeier opened 2 weeks ago

KarstenHohmeier commented 2 weeks ago

HAL_ETH_ReleaseTxPacket() does NOT check gState before execution AT ALL.

https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L1400-L1413

It just runs and modifies TX descriptors, decrements buffer counters and reads state from &heth into local variables. This buffered state can become stale/outdated if HAL_ETH_ReleaseTxPacket() gets preempted and other threads issuing calls to HAL_ETH_Transmit_IT() or HAL_ETH_Transmit() get scheduled instead.

Among other things HAL_ETH_ReleaseTxPacket() contains the decrement of dmatxdesclist->BuffersInUse.

https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L1477

for which the increment has been guarded by a critical section

https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L3333-L3340

So somebody suspected there might be race conditions there but failed to follow through ;)

You really, really, really need to lock this properly with atomic Test-and-Set of gState, because even the read-modify-write of TX descriptor OWN bits is potentially racy (see #71 #70 and #69)

This is not a theoretical bug! Under high incoming packet load (~8000 packets per second RX) there are frequent preemptions by the input thread which runs with realtime priority (code generated by CubeIDE).

grafik

This causes the TX calls and TX packet releases in other threads to be preempted and scheduled randomly and thrashes the data in &heth to a point where the system would not send TX packets any more. Under low packet load this can still occur and leads to unstable TX operation where systems will stop transmitting data after multiple days when these race conditions randomly occur but with much lower probability.

ASEHSTM commented 1 week ago

ST Internal Reference: 191088