STMicroelectronics / stm32h7xx_hal_driver

Provides the STM32Cube MCU Component "hal_driver" of the STM32H7 series.
BSD 3-Clause "New" or "Revised" License
86 stars 36 forks source link

Upgrading the hal library from 1.11.0 to 1.11.1 will cause the SPI master transmission to time out for no reason. #47

Closed huoxingdawang closed 2 months ago

huoxingdawang commented 4 months ago

In our project, we need to read a W25Q64 flash using SPI bus. Our code used to work very well, until recently my colleague reported that there were many problems with the flash operation. After investigation, we found that HAL_SPI_Transmit and HAL_SPI_Receive often reported HAL_TIMEOUT for no reason. If we downgrade our hal_spi.c and hal_spi.h to v1.11.0, there will be no problem. If we replace these two files with v1.11.1 and v1.11.2, there will be a timeout problem.

We noticed that the SPI of v1.11.0 is very different from the SPI of v1.11.1. We currently don’t know what specifically caused this phenomenon. Our team has spent two weeks troubleshooting this strange problem, which has had a great impact on our development progress.

huoxingdawang commented 4 months ago

In more detail, our code first need to write num bytes from val to spi, and then read num2 bytes from spi to val2. Our latest debugging results show that there will be problems if we use the following code.

HAL_SPI_Transmit(&hspi1,val,num,2000);
HAL_SPI_Receive(&hspi1,val2,num2,2000);

And the following way it won't.

HAL_SPI_Transmit(&hspi1,val,num,2000);
HAL_SPI_TransmitReceive(&hspi1,val2,val2,num2,2000);

We noticed that in the 1.11.0 version of the hal library, under our configuration (full duplex master) HAL_SPI_Receive was actually calling HAL_SPI_TransmitReceive. https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/8e7eb8dba8bee5e4c260c6fc724437502e98fc76/Src/stm32h7xx_hal_spi.c#L1013C1-L1030C1

  if ((hspi->Init.Mode == SPI_MODE_MASTER) && (hspi->Init.Direction == SPI_DIRECTION_2LINES))
  {
    hspi->State = HAL_SPI_STATE_BUSY_RX;
    /* Call transmit-receive function to send Dummy data on Tx line and generate clock on CLK line */
    return HAL_SPI_TransmitReceive(hspi, pData, pData, Size, Timeout);
  }

While in 1.11.1 this implementation was removed. We'd like to know what the motivation was for removing this and how to fix it, like a PR or something.

huoxingdawang commented 4 months ago

To add, we have tried modifying the compiler's optimization level and reducing the SPI clock frequency, and we found that this does not affect this problem.

KRASTM commented 4 months ago

Hello @huoxingdawang,

Thank you for your report.

In fact, there was an update on the driver in the version V1.11.1 following a new Hardware implementation. So, under the config (full duplex Master) only the RX Line is active for HAL_SPI_Receiv() function, if you want to use the TX Line during Receiving, you must use HAL_SPI_TransmitReceive() function.

I hope the answer will be helpful.

With regards

huoxingdawang commented 4 months ago

Hello @huoxingdawang,

Thank you for your report.

In fact, there was an update on the driver in the version V1.11.1 following a new Hardware implementation. So, under the config (full duplex Master) only the RX Line is active for HAL_SPI_Receiv() function, if you want to use the TX Line during Receiving, you must use HAL_SPI_TransmitReceive() function.

I hope the answer will be helpful.

With regards

nonononononono, I only want to use RX line, I don't need to transmit any any data to the device during receving. But it is the hal lib always TIMEOUT when I am using HAL_SPI_Receive, which forces me to use HAL_SPI_TransmitReceive instead.

KRASTM commented 4 months ago

Hello @huoxingdawang,

Can you share more details (for ex: Screenshots) about your Config & project, so we can reproduce it. I tested the HAL_SPI_Receive() in a project (different from yours) and I did not get any kind of error (especially a timeout).

With regards.

huoxingdawang commented 4 months ago

image image image No user constants, NVIC and DMA for spi1 is configured. Both ICache and DCache are enabled, compile using AC6 in MDK, -ofast and link time optimization was enabled. The core of the code is something like the following:

#define SPICS(x)                do{SPI1_CS1_GPIO_Port->BSRR=(x)?(SPI1_CS1_Pin):(SPI1_CS1_Pin<<16);}while(0)
void spiRead(uint32 addr,uint8*data,uint32 size)
{
    const uint8 cmd[]={
        0x03,
        (uint8)(addr>>16),
        (uint8)(addr>>8),
        (uint8)(addr),
    };
    SPICS(0);
    HAL_SPI_Transmit(&hspi1,cmd,sizeof(cmd),2000);
    HAL_SPI_Receive(&hspi1,data,size,2000);
    SPICS(1);
}

We also noticed that sometimes a timeout is returned, and sometimes no timeout is returned but the data is wrong. By using HAL_SPI_TransmitReceive instead of HAL_SPI_Receive there will be no problems, nor will rolling back to 1.11.0.

KRASTM commented 3 months ago

Hello @huoxingdawang,

Can you try enabling the parameter "Master Keep Io State" in the SPI Configuration?

With regards.

huoxingdawang commented 3 months ago

Hello @KRASTM, Thank you for your reply. Unfortunately, this doesn't seem to work. image Besides, we found out earlier today that it seemed to have something to do with the interrupt. If we turn off interrupts in certain steps, the bug seems to change. We are conducting further testing. With regards.

huoxingdawang commented 3 months ago

We guess it is because an interrupt is entered during the receive process, but SPI does not pause in receive-only mode, which results in data loss, making while (hspi->RxXferCount > 0UL) never exit, resulting in a timeout.

We have also tested Master Receive auto susp before. According to our understanding, this will block SPI rx when the rxFIFO is full, but for some reason, turning this option on or off does not seem to have any impact on us.

This seems to explain why it is normal to use HAL_SPI_TransmitReceive, because after entering the interrupt, the CPU will not send data to the SPI. If the SPI does not have new data, it will naturally stop TX and RX, without causing data loss.

TOUNSTM commented 2 months ago

Fixed in db7e4a987cc58023a53a255f8617f6e3fb76731b