dmitrystu / libusb_stm32

Lightweight USB device Stack for STM32 microcontrollers
Apache License 2.0

STM32F411: Sending corrupted data #101

Closed yaqwsx closed 2 years ago

yaqwsx commented 2 years ago

Hello, I am trying to use your awesome library on an STM32F411. I use the latest master and the usbd_stm32f429_otgfs.c driver. I have ported the basic CDC loopback demo to my application using interrupts (basically copy-paste). I also use the ST HAL, so I created the following main loop:

int i = 0;
while ( true ) {
    i++;
    HAL_Delay( 500 );
    char buff[ 64 ];
    snprintf( buff, 64, "%d: Hello world!\n", i );
    // Queue the message on the CDC bulk IN endpoint directly from the main loop
    int written = usbd_ep_write( &udev, CDC_TXD_EP, buff, strlen( buff ) );
    Dbg::trace( "Written %d, %s", written, buff );
}

With this code, I observe strange behavior in the data received over USB: in roughly the first 10 iterations, an extra 4 characters from the previous message are sometimes sent. However, the written variable does not include these characters, nor are they present in the source buffer (verified by tracing to UART).

When I connect to the device, I get the following output on /dev/ttyACM0:

1: Hello world!
2: Hello world!
2: H3: Hello world!
3: H4: Hello world!
5: Hello world!
5: H6: Hello world!
7: Hello world!
7: H8: Hello world!
9: Hello world!
10: Hello world!
11: Hello world!
12: Hello world!

And this is the output on my debug UART:

Written 16, 1: Hello world!

Written 16, 2: Hello world!

Written 16, 3: Hello world!

Written 16, 4: Hello world!

Written 16, 5: Hello world!

Written 16, 6: Hello world!

Written 16, 7: Hello world!

Written 16, 8: Hello world!

Written 16, 9: Hello world!

Written 17, 10: Hello world!

Written 17, 11: Hello world!

Written 17, 12: Hello world!

Do I use the usbd_ep_write function correctly? Is it possible there is a bug related to the double-buffer functionality on the STM32F411? Could you point me to what I should examine in order to debug this issue?

dmitrystu commented 2 years ago

Hi. What happens if you try to use a single-buffered endpoint? What is on the URB level (using Wireshark or USBlyzer)?
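
(For reference: in the CDC demo the double buffering is selected by the USB_EPTYPE_DBLBUF flag passed to usbd_ep_config in the configuration callback. A minimal sketch of the single-buffered variant, assuming the demo's endpoint and size macros:)

// Double-buffered bulk IN endpoint, as used in the demo:
// usbd_ep_config( dev, CDC_TXD_EP, USB_EPTYPE_BULK | USB_EPTYPE_DBLBUF, CDC_DATA_SZ );

// Single-buffered variant for comparison:
usbd_ep_config( dev, CDC_TXD_EP, USB_EPTYPE_BULK, CDC_DATA_SZ );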

dmitrystu commented 2 years ago

I will try to reproduce this (a non-event/interrupt-based endpoint write flow) on an OTGFS-based device.
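
(For contrast, the demo drives TX from the endpoint event callback rather than from the main loop; roughly along these lines. This is a sketch assuming a demo-style callback registration, with next_buf/next_len as hypothetical placeholders:)

// Registered once in the configuration callback, e.g.:
//   usbd_reg_endpoint( dev, CDC_TXD_EP, cdc_tx_handler );
static void cdc_tx_handler( usbd_device *dev, uint8_t event, uint8_t ep ) {
    if ( event == usbd_evt_eptx ) {
        // Previous IN transfer has completed; it is now safe to queue the next one.
        usbd_ep_write( dev, CDC_TXD_EP, next_buf, next_len );  // next_buf/next_len: hypothetical
    }
}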

yaqwsx commented 2 years ago

I captured the traffic with Wireshark; I attach the log. The STM32 address in the log is 1.116, and I can see that the data length is 20 in the corrupted packets. The problem happens regardless of whether I use the USB_EPTYPE_DBLBUF flag.

capture.zip

yaqwsx commented 2 years ago

I am not sure if it is related, but I also observe that the GET_DESCRIPTOR response randomly fails due to corrupted data. The first few bytes of the response seem to come from a previous message.

dmitrystu commented 2 years ago

I am not sure if it is related, but I also observe that the GET_DESCRIPTOR response randomly fails due to corrupted data. The first few bytes of the response seem to come from a previous message.

Looks like this is a problem with the hardware FIFO. For some reason, the first word (the FIFO is 32 bits wide) is sometimes not pulled out of the FIFO completely. I have no F411 in my lab; I tried to reproduce this on an F405 and an F429 and see no problems. I will try to find an F411.
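
(The numbers already in the thread appear consistent with exactly one stale FIFO word:)

captured packet length:   20 bytes (from the Wireshark capture)
reported write length:    16 bytes (from usbd_ep_write)
difference:               20 - 16 = 4 bytes = one 32-bit FIFO word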

yaqwsx commented 2 years ago

I just noticed there are some errata for F411 regarding USB_OTG_FS that are not present on F405: https://www.st.com/resource/en/errata_sheet/dm00137034-stm32f411xc-and-stm32f411xe-device-limitations-stmicroelectronics.pdf

However, my knowledge of the driver implementation is currently too limited to judge whether they can cause this issue.

dmitrystu commented 2 years ago

Got an STM32F411E-DISCO. Will check this problem ASAP.

dmitrystu commented 2 years ago

Tried with the basic demo code. I see no problems with a corrupted FIFO for either poll-based or interrupt-based builds. Board: DK32F411E$AU1, MCU: STM32F411VET6U Rev.1

yaqwsx commented 2 years ago

Thank you for your time and for sharing. Did you also try an example similar to mine, i.e., sending data independently of reception? If yes, could you share the exact code? I would try it on my hardware; that could help me track down my problem.

dmitrystu commented 2 years ago

Here. It's a bit dirty, but it uses the same TX flow.

dmitrystu commented 2 years ago

You probably start writing to an unconfigured FIFO. This causes unpredictable behavior.
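
(A minimal sketch of one way to avoid that, assuming a demo-style configuration callback; cdc_configured is a hypothetical flag added here for illustration, and only the TX endpoint is shown:)

#include <stdbool.h>

// Hypothetical flag: set once the host selects configuration 1
static volatile bool cdc_configured = false;

static usbd_respond cdc_setconf( usbd_device *dev, uint8_t cfg ) {
    switch ( cfg ) {
    case 0: // deconfiguration: the endpoint and its FIFO go away
        usbd_ep_deconfig( dev, CDC_TXD_EP );
        cdc_configured = false;
        return usbd_ack;
    case 1: // configuration: the FIFO is allocated, writing is safe from now on
        usbd_ep_config( dev, CDC_TXD_EP, USB_EPTYPE_BULK, CDC_DATA_SZ );
        cdc_configured = true;
        return usbd_ack;
    default:
        return usbd_fail;
    }
}

// In the main loop: skip the write until the endpoint FIFO exists
if ( cdc_configured ) {
    usbd_ep_write( &udev, CDC_TXD_EP, buff, strlen( buff ) );
}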

yaqwsx commented 2 years ago

Thank you for sharing your code and for your overall patience. I really appreciate it! It has probably led me to solving the problem; however, I will only be able to test it in the next few days.

Here's what I noticed and what I did (for future reference, if anyone stumbles upon the same problem). I am using an HSE on my board, while your demo uses the HSI. My clock configuration code is the following:

LL_FLASH_SetLatency( LL_FLASH_LATENCY_0 );
while( LL_FLASH_GetLatency() != LL_FLASH_LATENCY_0 );
LL_PWR_SetRegulVoltageScaling( LL_PWR_REGU_VOLTAGE_SCALE1 );
LL_RCC_HSE_Enable();

// Wait till HSE is ready
while( LL_RCC_HSE_IsReady() != 1 );

// Wait till HSI is ready
while( LL_RCC_HSI_IsReady() != 1 );
LL_RCC_PLL_ConfigDomain_48M( LL_RCC_PLLSOURCE_HSE, LL_RCC_PLLM_DIV_12, 72, LL_RCC_PLLQ_DIV_3 );
LL_RCC_PLL_Enable();

// Wait till PLL is ready
while( LL_RCC_PLL_IsReady() != 1 );
LL_RCC_SetAHBPrescaler( LL_RCC_SYSCLK_DIV_1 );
LL_RCC_SetAPB1Prescaler( LL_RCC_APB1_DIV_1 );
LL_RCC_SetAPB2Prescaler( LL_RCC_APB2_DIV_1 );
LL_RCC_SetSysClkSource( LL_RCC_SYS_CLKSOURCE_HSE );

// Wait till System clock is ready
while( LL_RCC_GetSysClkSource() != LL_RCC_SYS_CLKSOURCE_STATUS_HSE );
LL_SetSystemCoreClock( 24000000 );

// Update the time base
if ( HAL_InitTick( TICK_INT_PRIORITY ) != HAL_OK ) {
    assert( false && "Incorrect tick configuration " );
}
LL_RCC_SetTIMPrescaler( LL_RCC_TIM_PRESCALER_TWICE );

With this clock configuration, I observe the corrupted FIFO. If I use your HSI clock configuration, my code starts working. I have now changed the core clock to 96 MHz and it seems to solve the problem. However, I still don't fully understand what exactly is wrong with my old clock configuration.
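
(An illustrative sketch of one possible 24 MHz HSE configuration giving a 96 MHz core and a 48 MHz USB clock; the divider values are assumptions, not necessarily the exact configuration that was used:)

// Illustrative only: 24 MHz HSE -> VCO 192 MHz -> 96 MHz SYSCLK, 48 MHz USB
LL_FLASH_SetLatency( LL_FLASH_LATENCY_3 );                // 3 WS needed around 96 MHz at 3.3 V
while( LL_FLASH_GetLatency() != LL_FLASH_LATENCY_3 );
LL_PWR_SetRegulVoltageScaling( LL_PWR_REGU_VOLTAGE_SCALE1 );

LL_RCC_HSE_Enable();
while( LL_RCC_HSE_IsReady() != 1 );

// VCO = 24 MHz / 12 * 96 = 192 MHz (within the 100-432 MHz range)
LL_RCC_PLL_ConfigDomain_SYS( LL_RCC_PLLSOURCE_HSE, LL_RCC_PLLM_DIV_12, 96, LL_RCC_PLLP_DIV_2 ); // 96 MHz SYSCLK
LL_RCC_PLL_ConfigDomain_48M( LL_RCC_PLLSOURCE_HSE, LL_RCC_PLLM_DIV_12, 96, LL_RCC_PLLQ_DIV_4 ); // 48 MHz USB
LL_RCC_PLL_Enable();
while( LL_RCC_PLL_IsReady() != 1 );

LL_RCC_SetAHBPrescaler( LL_RCC_SYSCLK_DIV_1 );
LL_RCC_SetAPB1Prescaler( LL_RCC_APB1_DIV_2 );             // keep APB1 <= 50 MHz on the F411
LL_RCC_SetAPB2Prescaler( LL_RCC_APB2_DIV_1 );
LL_RCC_SetSysClkSource( LL_RCC_SYS_CLKSOURCE_PLL );
while( LL_RCC_GetSysClkSource() != LL_RCC_SYS_CLKSOURCE_STATUS_PLL );
LL_SetSystemCoreClock( 96000000 );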

dmitrystu commented 2 years ago

What was the expected PLL VCO frequency with your XTAL? According to section 6.3.2 of the datasheet, it must be between 100 and 432 MHz. Otherwise, you will have an incorrect, unstable, and jittery clock for all peripherals and the core. AFAIK Fvco = Fxtal * PLLN / PLLM.

dmitrystu commented 2 years ago

As I see it, you used a 24 MHz HSE with M=12, N=72, Q=3 to get a 48 MHz USB clock, and the 24 MHz HSE directly as AHB. Fvco is about 144 MHz and fits the requirements. Perhaps you need to adjust the TRDT value in the OTG_FS_GUSBCFG register.
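
(Spelling the arithmetic out for that configuration:)

Fvco = Fxtal * PLLN / PLLM = 24 MHz * 72 / 12 = 144 MHz   (within the 100-432 MHz VCO range)
Fusb = Fvco / PLLQ         = 144 MHz / 3      = 48 MHz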

yaqwsx commented 2 years ago

For anyone finding this issue in the future: I wasn't successful with adjusting TRDT; it seems to be more of a silicon problem. I needed to move on with development, so I just settled on a clock configuration that worked.

Thank you again, @dmitrystu for your help!

GrantMTG commented 2 years ago

One thing I have found but have not had time to investigate yet (and this is on an F103): in polling mode my USB device fails the USB compliance check suite, while in interrupt mode it passes. So maybe look for something seemingly unrelated like that.

I do have a Black Pill board, so at some point I will port my app to that and can test your issue then.

dmitrystu commented 2 years ago

Which test from the suite failed? I will try with USB3CV v2.2.2.0.

GrantMTG commented 2 years ago

I could not find the report, so I will keep looking or try to recreate it.

To create the fault, perform the Ch. 9 Tests. This is the general USB framework test for any device. It works all the way through in IRQ mode, but I seem to recall it failed at Set Configuration in polled mode. Beware that if your descriptor calls for Remote Wakeup, then your firmware needs to support it.