STMicroelectronics / STM32CubeH7

STM32Cube MCU Full Package for the STM32H7 series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on all boards provided by ST (Nucleo, Evaluation and Discovery Kits))
https://www.st.com/en/embedded-software/stm32cubeh7.html

[SOLVED] New HAL Ethernet driver stops operating with ETH_DMACSR_RBU Ethernet DMA error #222

Closed ktrofimov closed 2 years ago

ktrofimov commented 2 years ago

Custom board with STM32H734IIT6 MCU, STM32CubeIDE Version: 1.9.0, Build: 12015_20220302_0855 Compiler: GNU Tools for STM32 (9-2020-q2-update)

After reading the documentation on running Ethernet on the H7 series, I got Ethernet working, but now I am facing an issue I can't solve. The stack works for some time, but then the board suddenly stops sending data and the error callback fires:

void HAL_ETH_ErrorCallback(ETH_HandleTypeDef *heth)
{
  if((HAL_ETH_GetDMAError(heth) & ETH_DMACSR_RBU) == ETH_DMACSR_RBU)
  {
    printf( "ETH DMA Rx Error\n" );
    osSemaphoreRelease(RxPktSemaphore);
    // Clear RBUS ETHERNET DMA flag
    heth->Instance->DMACSR = ETH_DMACSR_RBU;
    // Resume DMA reception. The register doesn’t care what you write to it.
    heth->Instance->DMACRDTPR = 0;
  }
. . .
}

This happens on a custom project board where everything usually works fine for weeks, but sometimes all Ethernet activity just stops. I was unable to find the real source of the trouble and decided to upgrade the HAL Ethernet driver to the latest version from the ST sources on github.com. I had to add one line of code: in low_level_output, right before pbuf_ref(p), I have to call SCB_CleanInvalidateDCache(). Otherwise there was no Ethernet activity. With the new driver my project runs as it did before with the old driver.
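SCB_CleanInvalidateDCache() flushes the whole data cache on every transmit. A narrower alternative (not what was done in this post) is to clean only the cache lines that cover each outgoing buffer. The sketch below shows just the address arithmetic, assuming the 32-byte D-cache line of the Cortex-M7; cache_range_t and cache_align_range are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 D-cache line size in bytes */

/* Hypothetical helper type: a buffer widened to whole cache lines. */
typedef struct {
    uintptr_t start;  /* rounded down to a cache-line boundary */
    uint32_t  size;   /* multiple of DCACHE_LINE_SIZE covering the buffer */
} cache_range_t;

/* Hypothetical helper: widen [addr, addr + len) to whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, uint32_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    cache_range_t r;
    uintptr_t end;

    assert(len > 0u);
    r.start = addr & mask;                                /* round down */
    end = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;    /* round up   */
    r.size = (uint32_t)(end - r.start);
    return r;
}

/*
 * On target, each pbuf segment q in low_level_output() could then be
 * cleaned with the CMSIS call (assumption: D-cache is enabled):
 *
 *   cache_range_t r = cache_align_range((uintptr_t)q->payload, q->len);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
 */
```

Cleaning by address leaves unrelated cache lines untouched, which may matter under load; whether it is sufficient depends on where the descriptors and buffers are placed and on the MPU configuration.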

After making sure that all of this works, I began a stress test. The board has some background activity (a WireGuard link with keep-alive data, an HTTPS (Mbed TLS) client connection storing data to the central database, and a Chrome browser tab reloading JSON data from the MCU once every 5 seconds). Additional load is:

With this load (not a big one) the MCU works as expected for about 5 minutes. Wireshark shows no problems or missed packets at all. But suddenly all transmission stops. See the attached screenshot:

capture

192.168.2.1 is my notebook; 192.168.2.9 is the controller. The last packet from the MCU is #24948. Capture file: Failure at packet 24948.pcapng.zip

I have found similar problems on ST forum (https://community.st.com/s/question/0D53W00001PyCiISAV/ethdmacsrrbu-error-occurs-and-stalls-the-ethernet-receive-on-stm32h7is-there-a-way-around-this-issue-with-the-dma?t=1655188070920 and https://community.st.com/s/question/0D53W00001EjVIqSAN/im-working-on-h743-sometimes-i-find-ethdmacsrrbu-errorhow-to-solve-ethdmacsrrbu-issue) but there is no solution.

Some people say that heth->Instance->DMACSR = ETH_DMACSR_RBU; and heth->Instance->DMACRDTPR = 0; help them, but in my case they don't.
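The status bits in DMACSR are write-1-to-clear, which is why the flag is cleared with a plain assignment rather than a read-modify-write. The sketch below models that behaviour on the host to show the difference; w1c_write is a hypothetical name, and the bit values are illustrative stand-ins for the definitions in the device header.

```c
#include <stdint.h>

/* Illustrative bit positions; on target use ETH_DMACSR_TBU and
 * ETH_DMACSR_RBU from the device header instead. */
#define DEMO_TBU (1u << 2)
#define DEMO_RBU (1u << 7)

/* Hypothetical model of a write-1-to-clear status register:
 * every bit written as 1 is cleared, bits written as 0 are kept. */
static uint32_t w1c_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}

/* Plain assignment, as in "DMACSR = RBU": clears only RBU. */
static uint32_t clear_rbu_by_assignment(uint32_t status)
{
    return w1c_write(status, DEMO_RBU);
}

/* Read-modify-write, as in "DMACSR |= RBU": writes back every bit
 * that is currently set, so all pending flags are cleared at once. */
static uint32_t clear_rbu_by_or(uint32_t status)
{
    return w1c_write(status, status | DEMO_RBU);
}
```

With both TBU and RBU pending, the assignment form leaves TBU set for later handling, while the OR form silently discards it.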

Is it possible to restart the DMA, even at the cost of losing the data currently referenced in the descriptors?

ktrofimov commented 2 years ago

Not recovering from the DMA RBU error under debug could be a separate issue: https://lists.gnu.org/archive/html/lwip-users/2012-09/msg00045.html The solution provided there (for the old STM32F4 driver) reads continuously in a loop, while(p=low_level_input()) (same as in the new driver), and uses the same DMA reset code, ETH->DMASR = ETH_DMASR_RBUS; and ETH->DMARPDR = 0;:

/**
 * Should allocate a pbuf and transfer the bytes of the incoming
 * packet from the interface into the pbuf.
 *
 * @param netif the lwip network interface structure for this ethernetif
 * @return a pbuf filled with the received packet (including MAC header)
 *         NULL on memory error
 */
static struct pbuf * low_level_input(struct netif *netif)
{
  struct pbuf *p, *q;
  u16_t len;
  uint32_t l=0,i =0;
  FrameTypeDef frame;
  u8 *buffer;
  __IO ETH_DMADESCTypeDef *DMARxNextDesc;

  p = NULL;

  /* Get received frame */
  frame = ETH_Get_Received_Frame_interrupt();

  if (frame.descriptor && frame.buffer) {
      /* check that frame has no error */
      if ((frame.descriptor->Status & ETH_DMARxDesc_ES) == (uint32_t)RESET)
      {

        /* Obtain the size of the packet and put it into the "len" variable. */
        len = frame.length;
        buffer = (u8 *)frame.buffer;

        /* We allocate a pbuf chain of pbufs from the pool. */
        p = pbuf_alloc(PBUF_RAW, len, PBUF_POOL);

        /* Copy received frame from ethernet driver buffer to stack buffer */
        if (p != NULL)
        {
          for (q = p; q != NULL; q = q->next)
          {
            memcpy((u8_t*)q->payload, (u8_t*)&buffer[l], q->len);
            l = l + q->len;
          }
        }
      }

      /* Release descriptors to DMA */
      /* Check if received frame with multiple DMA buffer segments */
      if (DMA_RX_FRAME_infos->Seg_Count > 1)
      {
        DMARxNextDesc = DMA_RX_FRAME_infos->FS_Rx_Desc;
      }
      else
      {
        DMARxNextDesc = frame.descriptor;
      }

      /* Set Own bit in Rx descriptors: gives the buffers back to DMA */
      for (i=0; i<DMA_RX_FRAME_infos->Seg_Count; i++)
      { 
        DMARxNextDesc->Status = ETH_DMARxDesc_OWN;
        DMARxNextDesc = (ETH_DMADESCTypeDef *)(DMARxNextDesc->Buffer2NextDescAddr);
      }

      /* Clear Segment_Count */
      DMA_RX_FRAME_infos->Seg_Count =0;
  }
  return p;
}

static void ethernet_watchdog(void) {
    /* When Rx Buffer unavailable flag is set: clear it and resume reception */
    if ((ETH->DMASR & ETH_DMASR_RBUS) != (u32)RESET) 
    {
        /* Clear RBUS ETHERNET DMA flag */
        ETH->DMASR = ETH_DMASR_RBUS;  

        /* Resume DMA reception. The register doesn't care what you write to it. */
        ETH->DMARPDR = 0;
    }
}

void ethernetif_input( void * pvParameters )
{
  struct pbuf *p;

  for( ;; )
  {
    if (xSemaphoreTake( s_xSemaphore, emacBLOCK_TIME_WAITING_FOR_INPUT)==pdTRUE)
    {
        while ((p = low_level_input( s_pxNetIf )) != 0) {
          if (p != 0) {
              if (ERR_OK != s_pxNetIf->input( p, s_pxNetIf))
              {
                pbuf_free(p);
                p=NULL;
              }
          }
        }
    }
   ethernet_watchdog();
  }
}

ktrofimov commented 2 years ago

Update 2: Compared the ping reply time with the old driver (0.5 ms) and the new one (1..3 ms). The cause was the debug printf output in lwIP (ICMP and ETHARP). Disabling debug brought the ping reply time back down to 0.5 ms, as with the old driver.

ktrofimov commented 2 years ago

Update 3: Found a mistake of mine in lwipopts.h: LWIP_RAM_HEAP_POINTER was not set to .LwIP_HeapSection. With that fixed, Ethernet still fails after a while:

64 bytes from 192.168.2.9: icmp_seq=3005 ttl=255 time=0.401 ms
64 bytes from 192.168.2.9: icmp_seq=3006 ttl=255 time=0.499 ms
64 bytes from 192.168.2.9: icmp_seq=3007 ttl=255 time=0.370 ms
64 bytes from 192.168.2.9: icmp_seq=3008 ttl=255 time=0.409 ms
64 bytes from 192.168.2.9: icmp_seq=3009 ttl=255 time=100.731 ms
Request timeout for icmp_seq 3010
Request timeout for icmp_seq 3011
Request timeout for icmp_seq 3012
Request timeout for icmp_seq 3013

but now not completely: periodically some ICMP replies (as well as other packets such as HTTP) get through:

Request timeout for icmp_seq 271
Request timeout for icmp_seq 272
Request timeout for icmp_seq 273
64 bytes from 192.168.2.9: icmp_seq=256 ttl=255 time=18728.199 ms
64 bytes from 192.168.2.9: icmp_seq=257 ttl=255 time=17822.948 ms
64 bytes from 192.168.2.9: icmp_seq=265 ttl=255 time=9901.259 ms
Request timeout for icmp_seq 277
Request timeout for icmp_seq 278
Request timeout for icmp_seq 279
Request timeout for icmp_seq 280
Request timeout for icmp_seq 281
Request timeout for icmp_seq 282
Request timeout for icmp_seq 283
64 bytes from 192.168.2.9: icmp_seq=275 ttl=255 time=9769.960 ms
Request timeout for icmp_seq 285
Request timeout for icmp_seq 286
Request timeout for icmp_seq 287

ktrofimov commented 2 years ago

Update 4: The code

// Clear RBUS ETHERNET DMA flag
heth->Instance->DMACSR = ETH_DMACSR_RBU;
// Resume DMA reception. The register doesn’t care what you write to it.
heth->Instance->DMACRDTPR = 0;

should NOT be called from within an interrupt (HAL_ETH_ErrorCallback IS called from within an interrupt). Moving this code to ethernetif_input makes debugging work normally, as it is supposed to.
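One way to picture the split between interrupt and task context is a deferred-recovery flag: the interrupt side only records that RBU occurred, and the receive task does the register writes. This is a host-side sketch with a mock register block, not the driver's code; mock_eth_regs_t, on_eth_error_isr, eth_recover_if_needed and MOCK_DMACSR_RBU are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two registers involved; on target these would be
 * heth->Instance->DMACSR and heth->Instance->DMACRDTPR. */
typedef struct {
    volatile uint32_t DMACSR;
    volatile uint32_t DMACRDTPR;
} mock_eth_regs_t;

/* Illustrative value; on target use ETH_DMACSR_RBU from the device header. */
#define MOCK_DMACSR_RBU (1u << 7)

static volatile bool rx_recovery_pending = false;

/* Interrupt context: record the event only, touch no DMA registers. */
static void on_eth_error_isr(uint32_t dma_error)
{
    if ((dma_error & MOCK_DMACSR_RBU) != 0u) {
        rx_recovery_pending = true;
    }
}

/* Task context (e.g. the ethernetif_input loop): perform the recovery.
 * Returns true if a recovery was carried out. */
static bool eth_recover_if_needed(mock_eth_regs_t *eth)
{
    assert(eth != NULL);
    if (!rx_recovery_pending) {
        return false;
    }
    rx_recovery_pending = false;
    eth->DMACSR = MOCK_DMACSR_RBU;  /* on hardware: write 1 to clear RBU */
    eth->DMACRDTPR = 0u;            /* write the tail pointer to resume Rx */
    return true;
}
```

Polling the status register directly from the task, without a flag, achieves the same separation of contexts.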

ktrofimov commented 2 years ago

Update 5: Driver is working. Closing issue.

dima-kapustin commented 2 years ago

Hi Kirill! Could you please share all changes/tweaks you made?

ktrofimov commented 2 years ago

Summary: Removed all my extra code from HAL_ETH_ErrorCallback, leaving it as it is in the driver:

void HAL_ETH_ErrorCallback(ETH_HandleTypeDef *heth)
{
  if((HAL_ETH_GetDMAError(heth) & ETH_DMACSR_RBU) == ETH_DMACSR_RBU)
  {
    osSemaphoreRelease(RxPktSemaphore);
  }
  if((HAL_ETH_GetDMAError(heth) & ETH_DMACSR_TBU) == ETH_DMACSR_TBU)
  {
    osSemaphoreRelease(TxPktSemaphore);
  }
}

Add a DMA error watchdog function:

static void ethernet_watchdog(void) {
    if ((ETH->DMACSR & ETH_DMACSR_RBU) == ETH_DMACSR_RBU ) 
    {
        // Clear RBUS ETHERNET DMA flag
        ETH->DMACSR = ETH_DMACSR_RBU;  
        // Resume DMA reception
        ETH->DMACRDTPR = 0;
    }
    if ((ETH->DMACSR & ETH_DMACSR_TBU) == ETH_DMACSR_TBU ) 
    {
        // Clear TBU ETHERNET DMA flag
        ETH->DMACSR = ETH_DMACSR_TBU;  
        // Resume DMA transmission
        ETH->DMACTDTPR = 0;
    }
}

Run this watchdog from ethernetif_input():

void ethernetif_input( void * pvParameters )
{
  struct pbuf *p;

  for( ;; )
  {
    . . .
   ethernet_watchdog();
  }
}

dima-kapustin commented 2 years ago

Thanks a lot!

ASELSTM commented 2 years ago

Hi @ktrofimov,

Great that you were able to solve the problem. Please allow me then to close this thread.

With regards,