STMicroelectronics / STM32CubeH7

STM32Cube MCU Full Package for the STM32H7 series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on all boards provided by ST (Nucleo, Evaluation and Discovery Kits))
https://www.st.com/en/embedded-software/stm32cubeh7.html

Ethernet transmit hangs on osSemaphoreAcquire( TxPktSemaphore ) inside low_level_output() #224

Closed. ktrofimov closed this issue 1 year ago.

ktrofimov commented 2 years ago

This is not really a driver issue, but the driver is designed to work together with ethernetif.c. My ethernetif.c is taken from the STM32H7 example (https://github.com/STMicroelectronics/STM32CubeH7/blob/master/Projects/STM32H743I-EVAL/Applications/LwIP/LwIP_HTTP_Server_Netconn_RTOS/Src/ethernetif.c). With a load of 10 ICMP packets + 10 TCP connections per second, Ethernet hangs after some time (anywhere from 5 minutes to 15 hours).

Ethernet transmit hangs forever on while( osSemaphoreAcquire( TxPktSemaphore, TIME_WAITING_FOR_INPUT ) != osOK ):

static err_t low_level_output(struct netif *netif, struct pbuf *p)
{
 . . .
 if( HAL_ETH_Transmit_IT(&heth, &TxConfig ) != HAL_OK )
 {
    printf( "HAL ETH Tx IT Error\n" );
 }
 while( osSemaphoreAcquire( TxPktSemaphore, TIME_WAITING_FOR_INPUT ) != osOK )
    ;
}

This semaphore is supposed to be released in HAL_ETH_TxCpltCallback():

void HAL_ETH_TxCpltCallback(ETH_HandleTypeDef *heth)
{
  osSemaphoreRelease(TxPktSemaphore);
}
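
The loop in low_level_output() above never gives up: if the Tx-complete interrupt is lost, the calling thread blocks forever. A minimal host-side sketch of a bounded wait follows. The names acquire_fn, wait_tx_complete, never_signalled and signalled_after are hypothetical and only model the pattern; on a target, the acquire step would be the osSemaphoreAcquire() call with a timeout, and the caller would need to decide how to recover when the wait fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a semaphore acquire with a timeout.
 * Returns true when the semaphore was taken, false on timeout. */
typedef bool (*acquire_fn)(void *ctx);

/* Wait for the Tx-complete signal, but give up after max_attempts
 * timeouts instead of spinning forever.
 * Returns 0 on success, -1 if the signal never arrived. */
static int wait_tx_complete(acquire_fn acquire, void *ctx, uint32_t max_attempts)
{
    assert(acquire != NULL && max_attempts > 0);
    for (uint32_t i = 0; i < max_attempts; i++) {
        if (acquire(ctx)) {
            return 0;
        }
    }
    return -1;
}

/* Test double: models a lost interrupt, the semaphore is never released. */
static bool never_signalled(void *ctx)
{
    (void)ctx;
    return false;
}

/* Test double: times out *ctx times, then succeeds. */
static bool signalled_after(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

With a bounded wait, low_level_output() could return an error to lwIP instead of hanging, which at least makes the failure visible. It does not explain why the completion signal was lost in the first place.
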
ktrofimov commented 2 years ago

Update 1: It looks like Zephyr had the same issue, where the proposed solution (https://github.com/zephyrproject-rtos/zephyr/pull/29944/commits/8ed12aecb9a2311e183b833e563601f527a1590c) was:

 __DSB();
 if( HAL_ETH_Transmit_IT(&heth, &TxConfig ) != HAL_OK )
 {
    printf( "HAL ETH Tx IT Error\n" );
 }
 __DSB();
 __ISB();

This did not fix the issue, though. Ethernet still hangs (the last time after 22 minutes of uptime / 10 000 pings + 9 500 TCP connections).

ktrofimov commented 2 years ago

Possible solution (still has to be tested): add code to release TxPktSemaphore in case of a DMA Tx error:

void HAL_ETH_ErrorCallback(ETH_HandleTypeDef *heth)
{
  if((HAL_ETH_GetDMAError(heth) & ETH_DMACSR_RBU) == ETH_DMACSR_RBU)
  {
    // ETH DMA Rx Error
    osSemaphoreRelease(RxPktSemaphore);
  }

+  if((HAL_ETH_GetDMAError(heth) & ETH_DMACSR_TBU) == ETH_DMACSR_TBU)
+  {
+    // ETH DMA Tx Error
+    osSemaphoreRelease(TxPktSemaphore);
+  }

}
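
The proposed callback calls HAL_ETH_GetDMAError() once per check. A small host-side model of the same decoding logic follows, reading the error flags once and releasing whichever waiter is affected. Everything prefixed demo_ or DEMO_ is hypothetical: the bit values are placeholders, not the real ETH_DMACSR_RBU / ETH_DMACSR_TBU definitions, and the counters stand in for RxPktSemaphore and TxPktSemaphore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit masks for illustration only. On a real target use the
 * ETH_DMACSR_RBU / ETH_DMACSR_TBU definitions from the device header. */
#define DEMO_DMA_ERR_RBU (1u << 0)
#define DEMO_DMA_ERR_TBU (1u << 1)

/* Hypothetical counters standing in for RxPktSemaphore / TxPktSemaphore:
 * each increment models one osSemaphoreRelease() call. */
typedef struct {
    unsigned rx_released;
    unsigned tx_released;
} demo_sems;

/* Decode one snapshot of the DMA error flags and release the
 * semaphore of every direction that reported "buffer unavailable". */
static void demo_error_callback(uint32_t dma_error, demo_sems *s)
{
    assert(s != NULL);
    if ((dma_error & DEMO_DMA_ERR_RBU) != 0u) {
        s->rx_released++;   /* wake the Rx thread */
    }
    if ((dma_error & DEMO_DMA_ERR_TBU) != 0u) {
        s->tx_released++;   /* wake the thread blocked in low_level_output() */
    }
}
```

Taking a single snapshot keeps both checks consistent with each other. Whether the error flags can change between two reads on the real hardware is not established in this thread.
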
ASELSTM commented 2 years ago

ST Internal Reference: 130930

ktrofimov commented 2 years ago

Just stumbled upon the same issue again, this time in void ethernet_link_thread(void* argument). Steps to reproduce:

1) pull the cable out
2) disconnect or disable the DHCP server
3) plug the cable in
4) pull the cable out again
5) plug it in again
6) pause for about 15..20 seconds, then connect or enable the DHCP server

Somewhere in this sequence ethernet_link_thread hangs on xQueueSemaphoreTake():

[Screenshot: ethernet_link_thread_hung]

Note the difference in the ST code: the link-up path calls HAL_ETH_Start() rather than HAL_ETH_Start_IT(). This was already reported in https://github.com/STMicroelectronics/STM32CubeF4/issues/120 :

if(linkchanged)
{
      /* Get MAC Config MAC */
      HAL_ETH_GetMACConfig(&heth, &MACConf);
      MACConf.DuplexMode = duplex;
      MACConf.Speed = speed;
      HAL_ETH_SetMACConfig(&heth, &MACConf);
      HAL_ETH_Start(&heth);
      netif_set_up(netif);
      netif_set_link_up(netif);
}

but the link-down path calls HAL_ETH_Stop_IT():

if(netif_is_link_up(netif) && (PHYLinkState <= LAN8742_STATUS_LINK_DOWN))
{
    HAL_ETH_Stop_IT(&heth);
    netif_set_down(netif);
    netif_set_link_down(netif);
}

My code uses the xxx_IT variants in both places (I feel this is the correct way), but it doesn't help.
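
To make the Start/Stop mismatch concrete, here is a minimal host-side model of the link handling with a matched pair. All demo_ and DEMO_ names are hypothetical. The model assumes, as this thread suggests, that a Tx-complete interrupt can only arrive when the peripheral was started with the interrupt variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of how the Ethernet peripheral was started. */
typedef enum {
    DEMO_ETH_STOPPED,
    DEMO_ETH_POLLING,    /* assumed result of HAL_ETH_Start()    */
    DEMO_ETH_INTERRUPT   /* assumed result of HAL_ETH_Start_IT() */
} demo_eth_mode;

typedef struct {
    demo_eth_mode mode;
    bool link_up;
} demo_link;

/* Apply one PHY link reading. Both transitions use the interrupt
 * variant, so start and stop always form a matched pair. */
static void demo_link_update(demo_link *l, bool phy_link_up)
{
    assert(l != NULL);
    if (phy_link_up && !l->link_up) {
        l->mode = DEMO_ETH_INTERRUPT;   /* stands in for HAL_ETH_Start_IT() */
        l->link_up = true;
    } else if (!phy_link_up && l->link_up) {
        l->mode = DEMO_ETH_STOPPED;     /* stands in for HAL_ETH_Stop_IT() */
        l->link_up = false;
    }
}

/* In this model a Tx-complete interrupt, and therefore the release of
 * TxPktSemaphore, is only possible in interrupt mode. */
static bool demo_tx_irq_possible(const demo_link *l)
{
    return l->mode == DEMO_ETH_INTERRUPT;
}
```

In this model a link that comes back up in DEMO_ETH_POLLING mode would leave any thread waiting on the Tx semaphore blocked. Matching the calls removes that one cause, but it is not by itself evidence that every hang described here has the same origin.
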

ktrofimov commented 2 years ago

Most interestingly, despite that hang the DHCP client is still able to obtain an address and the rest of the firmware works as expected. This semaphore bug affects ethernet_link_thread() only.

This particular problem was solved by editing lwIP dhcp.c:

dhcp_discover(struct netif *netif)
{
+  if( !netif_is_link_up( netif ) )
+     return ERR_CONN;
  . . .

but I have a feeling this is a multithreading problem, probably caused by calling netif_set_up(netif); and netif_set_link_up(netif); from the wrong thread.
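
One common way to avoid touching shared network state from the wrong thread is to post the work to the single thread that owns that state. lwIP provides tcpip_callback() for this purpose. The sketch below is only a single-threaded host model of that idea: demo_queue, demo_post, demo_drain and demo_set_link_up are hypothetical names, and a real queue shared between threads would also need locking or an RTOS message queue.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_LEN 8

typedef void (*demo_fn)(void *ctx);

/* Fixed-size FIFO of deferred calls (hypothetical, no locking). */
typedef struct {
    demo_fn fn[DEMO_QUEUE_LEN];
    void *ctx[DEMO_QUEUE_LEN];
    size_t head;
    size_t count;
} demo_queue;

/* Models the link thread: request work instead of doing it directly.
 * Returns 0 on success, -1 if the queue is full. */
static int demo_post(demo_queue *q, demo_fn fn, void *ctx)
{
    assert(q != NULL && fn != NULL);
    if (q->count == DEMO_QUEUE_LEN) {
        return -1;
    }
    size_t tail = (q->head + q->count) % DEMO_QUEUE_LEN;
    q->fn[tail] = fn;
    q->ctx[tail] = ctx;
    q->count++;
    return 0;
}

/* Models the owning thread: run all queued calls in FIFO order.
 * Returns the number of calls executed. */
static size_t demo_drain(demo_queue *q)
{
    assert(q != NULL);
    size_t ran = 0;
    while (q->count > 0) {
        demo_fn fn = q->fn[q->head];
        void *ctx = q->ctx[q->head];
        q->head = (q->head + 1) % DEMO_QUEUE_LEN;
        q->count--;
        fn(ctx);
        ran++;
    }
    return ran;
}

/* Example work item: stands in for netif_set_link_up(). */
static void demo_set_link_up(void *ctx)
{
    *(int *)ctx = 1;
}
```

The point of the model is that the state change happens only when the owner drains the queue, never at the moment the link thread requests it.
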

fishjimi commented 2 years ago

I had the same problem today. After debugging, I found that the problem is in the same place. It can be solved with a simple change:

if(linkchanged)
{
  /* Get MAC Config MAC */
  HAL_ETH_GetMACConfig(&heth, &MACConf);
  MACConf.DuplexMode = duplex;
  MACConf.Speed = speed;
  HAL_ETH_SetMACConfig(&heth, &MACConf);
  //HAL_ETH_Start(&heth);
  HAL_ETH_Start_IT(&heth);  //change to xxx_IT
  netif_set_up(netif);
  netif_set_link_up(netif);
}
a1292999652 commented 1 year ago

The same problem still exists today. When will it be fixed? [Screenshots: 微信截图_20220921213429, 微信截图_20220921213455]

a1292999652 commented 1 year ago

I ran into the same problem today. After debugging, I also found the problem here. It can be solved simply:

if(linkchanged)
{
  /* Get MAC Config MAC */
  HAL_ETH_GetMACConfig(&heth, &MACConf);
  MACConf.DuplexMode = duplex;
  MACConf.Speed = speed;
  HAL_ETH_SetMACConfig(&heth, &MACConf);
  //HAL_ETH_Start(&heth);
  HAL_ETH_Start_IT(&heth);  //change to xxx_IT
  netif_set_up(netif);
  netif_set_link_up(netif);
}

But every time the code is regenerated, this change is reverted to the original.

pavel-a commented 1 year ago

As suggested on the forum, the reason is incorrect use of LwIP API (or use of obsolete examples).

The affected examples under Projects/ should be fixed.

ASELSTM commented 1 year ago

Hi @ktrofimov,

Thank you for your contribution. This issue has been fixed in version v1.11.0 of STM32CubeH7. Please allow me to close this thread.

With regards,