espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Multi SPI configuration causes occasional corrupted data transfer (IDFGH-11187) #12354

Open lilalaunestift opened 12 months ago

lilalaunestift commented 12 months ago

IDF version.

v5.1.1-1-gd3c99ed3b8

Espressif SoC revision.

ESP32-D0WD-V3 (revision v3.0)

Operating System used.

Linux

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

Custom Board

Power Supply used.

External 3.3V

What is the expected behavior?

Two SPI buses are used:

  1. The HSPI is configured as Master. A dm9051 controller is the slave. The driver for the dm9051 from the esp-idf is used.
  2. The VSPI is configured as Slave and communicates with another controller.

Reliable data transfer on both configured SPI buses is expected.

What is the actual behavior?

The communication of HSPI master is unstable. Approx. 10% of the messages are corrupted somehow. This can be observed for both incoming and outgoing data: For incoming data over the MISO line, it can be observed that data on the SPI bus sent by dm9051 is correct (via Logic Analyzer), but partly faulty data can be found in the receive buffer. For outgoing data (MOSI), there is correct data in the send buffer, but partly faulty data can be observed on the SPI bus.

Steps to reproduce.

  1. Step Configuration of the HSPI Master:
#define IN_MODULE_nINT_PIN          GPIO_NUM_4
#define SPI_MODULE_MISO_PIN         GPIO_NUM_12
#define SPI_MODULE_MOSI_PIN         GPIO_NUM_13
#define SPI_MODULE_CLK_PIN          GPIO_NUM_14
#define SPI_MODULE_nCS_PIN          GPIO_NUM_15

#define ETHERNET_SPI_HOST           HSPI_HOST
#define ETHERNET_SPI_CLK            5000000
#define ETHERNET_DMA_CHAN           1

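esp_err_t Ethernet_init(void)
{
    esp_err_t esp_err;
    esp_eth_mac_t *poEthMac = NULL;
    esp_eth_phy_t *poEthPhy = NULL;

    // bring up the SPI bus first, then the MAC/PHY on top of it
    esp_err = Ethernet_initSpi();
    if (esp_err != ESP_OK)
        return esp_err;

    esp_err = Ethernet_initMacPhyController(&poEthMac, &poEthPhy);
    return esp_err;
}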

esp_err_t Ethernet_initSpi(void)
{
    oEthernet.hSpiHandle = NULL;

    spi_bus_config_t oBusConfig = {
            .miso_io_num = SPI_MODULE_MISO_PIN,
            .mosi_io_num = SPI_MODULE_MOSI_PIN,
            .sclk_io_num = SPI_MODULE_CLK_PIN,
            .quadwp_io_num = -1,
            .quadhd_io_num = -1,
    };
    ESP_ERROR_CHECK(spi_bus_initialize(ETHERNET_SPI_HOST, &oBusConfig, ETHERNET_DMA_CHAN));
    return ESP_OK;
}

esp_err_t Ethernet_initMacPhyController(esp_eth_mac_t **poOutMac ,esp_eth_phy_t **poOutPhy)
{
    eth_mac_config_t oMacConfig = ETH_MAC_DEFAULT_CONFIG();
    eth_phy_config_t oPhyConfig = ETH_PHY_DEFAULT_CONFIG();
    oPhyConfig.autonego_timeout_ms = 0;
    oPhyConfig.phy_addr = 1;
    oPhyConfig.reset_gpio_num = -1;

    spi_device_interface_config_t oSpiDevConfig = {
            .command_bits = 1,
            .address_bits = 7,
            .mode = 0,
            .clock_speed_hz = ETHERNET_SPI_CLK,
            .spics_io_num = SPI_MODULE_nCS_PIN,
            .queue_size = 20
    };

    eth_dm9051_config_t oDm9051Config = ETH_DM9051_DEFAULT_CONFIG(ETHERNET_SPI_HOST, &oSpiDevConfig);
    oDm9051Config.int_gpio_num = IN_MODULE_nINT_PIN;

    *poOutMac = esp_eth_mac_new_dm9051(&oDm9051Config, &oMacConfig);
    *poOutPhy = esp_eth_phy_new_dm9051(&oPhyConfig);

    return ESP_OK;
}
  2. Step Configuration of the SPI Slave:
#define OUT_MSP_nINT_PIN            GPIO_NUM_16
#define OUT_MSP_nRDY_PIN            GPIO_NUM_32
#define SPI_MSP_MISO_PIN            GPIO_NUM_19
#define SPI_MSP_MOSI_PIN            GPIO_NUM_23
#define SPI_MSP_CLK_PIN             GPIO_NUM_18
#define SPI_MSP_nCS_PIN             GPIO_NUM_5

#define MSP_HOST                    VSPI_HOST
#define MSP_DMA_CHAN                2

esp_err_t Slave_init(uint32_t nMaxLen)
{
    esp_err_t esp_err;

    // Configuration for the SPI bus
    spi_bus_config_t buscfg =
    {
        .mosi_io_num        = SPI_MSP_MOSI_PIN,
        .miso_io_num        = SPI_MSP_MISO_PIN,
        .sclk_io_num        = SPI_MSP_CLK_PIN,
        .quadwp_io_num      = -1,
        .quadhd_io_num      = -1,
        .max_transfer_sz    = nMaxLen,
        .flags              = SPICOMMON_BUSFLAG_SLAVE,
        .intr_flags         = ESP_INTR_FLAG_LOWMED
    };

    // Configuration for the SPI slave interface
    spi_slave_interface_config_t slvcfg =
    {
        .spics_io_num   = SPI_MSP_nCS_PIN,
        .flags          = 0,
        .queue_size     = 2,    // at least 2
        .mode           = 1,
        .post_setup_cb  = &Msp_postSetupCb,
        .post_trans_cb  = &Msp_postTransCb
    };

    // Initialize SPI slave interface
    esp_err = spi_slave_initialize(MSP_HOST, &buscfg, &slvcfg, MSP_DMA_CHAN);

    return esp_err;
}


sdkconfig file:
[sdkconfig.txt](https://github.com/espressif/esp-idf/files/12802931/sdkconfig.txt)

Debug Logs.

Here is an example of the observed data corruption:

Data in the buffer as passed to the dm9051 in (emac_dm9051_transmit()):

28 6b 35 b2 71 f9 a8 03 2a ee c4 67 08 00 45 00
00 54 9e ba 40 00 ff 01 f7 66 c0 a8 b2 1b c0 a8
b2 1a 00 00 00 c7 00 01 2f 5c 59 65 15 65 00 00
00 00 9b 3e 07 00 00 00 00 00 10 11 12 13 14 15
16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25
26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35
36 37

Data observed on the SPI bus via logic analyzer:

28 6B 35 B2 71 F9 A8 03 2A EE C4 67 08 00 45 00 
00 54 9E BA 40 00 FF 01 F7 66 C0 A8 B2 1B C0 A8 
B2 1A 00 00 00 C7 00 01 2F 5C 59 65 15 65 00 00 
00 00 9B 00 00 9B 3E 07 00 00 00 00 00 10 11 12 
13 00 00 00 00 9B 3E 07 00 00 00 00 00 10 11 12 
13 00 00 00 00 9B 3E 07 00 00 00 00 00 10 11 12 
13 00

More Information.

Is it possible that there is an issue with balancing the DMA usage? It seems that the data is somehow corrupted between the SPI bus and the dm9051 send/receive buffer. At the same time, I would not suspect an SPI issue, since the data is correct in most parts and the faulty parts are not arbitrary data (see log above).
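To pin down where a captured frame starts to deviate from the buffer handed to the driver, a small host-side comparison helps. The following is a minimal sketch, not part of the original report: the helper `first_mismatch` is a hypothetical name, and the two arrays are simply the fourth 16-byte row (frame offsets 48 to 63) of the two dumps above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first differing byte,
 * or len if both buffers are identical. */
static size_t first_mismatch(const uint8_t *expected,
                             const uint8_t *observed, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (expected[i] != observed[i]) {
            return i;
        }
    }
    return len;
}

/* Frame offsets 48..63 as passed to emac_dm9051_transmit(). */
static const uint8_t expected_row[16] = {
    0x00, 0x00, 0x9b, 0x3e, 0x07, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15
};

/* The same offsets as seen on the bus by the logic analyzer. */
static const uint8_t observed_row[16] = {
    0x00, 0x00, 0x9b, 0x00, 0x00, 0x9b, 0x3e, 0x07,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x11, 0x12
};
```

Calling `first_mismatch(expected_row, observed_row, 16)` on these rows yields 3, i.e. the frames diverge at byte offset 51 of the frame; everything before that matches.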

kostaond commented 11 months ago

@lilalaunestift thanks for the nice, detailed report. Could you please try to lower the SPI CLK for the DM9051 to 20 MHz?

Edit: I see, it's actually 5 MHz... What is SPI CLK of slave?

lilalaunestift commented 11 months ago

Hi, the clock of the slave SPI is running with 4.096MHz.

kostaond commented 11 months ago

One more question...

> I moved the shown initialization above to the ethernet example and it is still working fine

Do you mean just solely Ethernet or Ethernet and the SPI slave?

lilalaunestift commented 11 months ago

I moved only the Ethernet to the dm9051 example.

lilalaunestift commented 11 months ago

Hey, I did some more measurements with the logic analyzer. I captured both SPI buses this time. The following two pictures show the two constellations I found where the SPI Master is transmitting wrong data on the bus. The point in time where the corruption starts is shortly after the SPI Slave is done with receiving (the red and purple markers placed in the capture show approximately where the corruption starts). Again I can observe that some part of the data is sent repeatedly.

[Images: BrokenFrame, BrokenFrame2]

On the other hand, I found transmissions where also both buses are active, but there is no corruption on the SPI master:

[Image: OkFrame]

Here it seems that the SPI slave is considerate of the SPI master.

Hope this helps somehow. Greetings

kostaond commented 11 months ago

@lilalaunestift thank you for the report, we will try to reproduce it and let you know.

lilalaunestift commented 11 months ago

Hey @kostaond, is there any news on this issue so far? Are you able to reproduce this behavior? Greetings

kostaond commented 11 months ago

Hi @lilalaunestift, my colleague is trying to reproduce it. However, he hasn't been able to reproduce it yet...

kostaond commented 11 months ago

@lilalaunestift how long are expected transactions at SPI slave? If they are less than or equal to 32B, could you please try to disable DMA at the slave interface?

kostaond commented 11 months ago

We haven't been able to reproduce. We've based our setup on SPI Slave example. We have two ESP32's - one as master and one slave with DM9051. Could you please provide more information about your SPI slave code?

KaeLL commented 11 months ago

@kostaond

> If they are less than or equal to 32B, could you please try to disable DMA at the slave interface?

Care to elaborate on that?

kostaond commented 11 months ago

> Care to elaborate on that?

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_slave.html?highlight=spi_slave_transmit#driver-usage

lilalaunestift commented 11 months ago

> @lilalaunestift how long are expected transactions at SPI slave? If they are less than or equal to 32B, could you please try to disable DMA at the slave interface?

The transactions are longer than 32B. Disabling the DMA is unfortunately not an option here.

> We haven't been able to reproduce. We've based our setup on SPI Slave example. We have two ESP32's - one as master and one slave with DM9051. Could you please provide more information about your SPI slave code?

Just for clarification: are you also using a setup where the ESP32 is master and slave at the same time (see picture)?

[Image: 2023-11-03 07_49_17-Window]

kostaond commented 11 months ago

> Just for clarification: are you also using a setup where the ESP32 is master and slave at the same time (see picture)?

Yes, we added the DM9051 to the SPI Slave example.

Could you please provide more details about your setup? For example, slave code, what is the traffic (size, period), etc.

lilalaunestift commented 10 months ago

Hey, sorry for the delay.

Regarding our slave code: 99.9% of the data has a length of 201B and is transmitted periodically every 4ms from master to slave. Clock frequency of the slave is 4.096MHz. Besides the four SPI lines, there are two additional lines for the communication. Both of them are controlled by the ESP:

  1. MSP_READY: this line signals SPI Master that the ESP is ready for a SPI transaction
  2. MSP_INT: this line signals the SPI Master that the ESP wants to transmit something

Sending and receiving are done sequentially. So while the ESP slave is sending, the master is only receiving and vice versa. Here is the part where the slave code is interacting with the SPI driver.


#include <assert.h>
#include <stdint.h>
#include <string.h>

#include "../inc/Clock.h"
#include "../inc/Crc16.h"

#include "driver/gpio.h"
#include "driver/spi_slave.h"
#include "esp_intr_alloc.h"
#include "freertos/FreeRTOS.h"

static void Msp_postSetupCb(spi_slave_transaction_t* pTrans);
static void Msp_postTransCb(spi_slave_transaction_t* pTrans);

uint8_t acSnd[268];

typedef struct SMsp
{
    spi_slave_transaction_t t0;
    uint32_t nCpuTick;
}
Msp_t;

Msp_t oMsp;

esp_err_t Msp_init(uint32_t nMaxLen)
{
    esp_err_t esp_err;

    memset(&acSnd, 0xff, sizeof(acSnd));
    memset(&oMsp, 0x00, sizeof(Msp_t));

    // Configuration for the SPI bus
    spi_bus_config_t buscfg =
    {
        .mosi_io_num        = SPI_MSP_MOSI_PIN,
        .miso_io_num        = SPI_MSP_MISO_PIN,
        .sclk_io_num        = SPI_MSP_CLK_PIN,
        .quadwp_io_num      = -1,
        .quadhd_io_num      = -1,
        .max_transfer_sz    = nMaxLen,
        .flags              = SPICOMMON_BUSFLAG_SLAVE,
        .intr_flags         = ESP_INTR_FLAG_LOWMED
    };

    // Configuration for the SPI slave interface
    spi_slave_interface_config_t slvcfg =
    {
        .spics_io_num   = SPI_MSP_nCS_PIN,
        .flags          = 0,
        .queue_size     = 2,    // at least 2
        .mode           = 1,
        .post_setup_cb  = &Msp_postSetupCb,
        .post_trans_cb  = &Msp_postTransCb
    };

    //Initialize SPI slave interface
    esp_err = spi_slave_initialize(MSP_HOST, &buscfg, &slvcfg, MSP_DMA_CHAN);
    assert(esp_err == ESP_OK);

     return esp_err;
}

void Msp_transReady(void)
{
    // ready to transmit
    gpio_set_level(OUT_MSP_nINT_PIN, 0);
}

void Msp_writeBlock(uint8_t* acSndData, uint8_t* acRcvData, uint32_t nMaxLen)
{
    esp_err_t esp_err;

    oMsp.t0.length      = nMaxLen << 3;
    oMsp.t0.rx_buffer   = acRcvData;
    oMsp.t0.trans_len   = 0;
    oMsp.t0.tx_buffer   = acSndData;
    oMsp.t0.user        = (void*)1;

    esp_err = spi_slave_queue_trans(MSP_HOST, &oMsp.t0, 0);
    assert(esp_err == ESP_OK);
}

void Msp_readBlock(uint8_t* acRcvData, uint32_t nMaxLen)
{
    esp_err_t esp_err;

    oMsp.t0.length      = nMaxLen << 3;
    oMsp.t0.rx_buffer   = acRcvData;
    oMsp.t0.trans_len   = 0;
    oMsp.t0.tx_buffer   = acSnd;
    oMsp.t0.user        = (void*)0;

    esp_err = spi_slave_queue_trans(MSP_HOST, &oMsp.t0, 0);
    assert(esp_err == ESP_OK);
}

void Msp_getTransResult(void)
{
    esp_err_t esp_err;
    spi_slave_transaction_t * pTrans = NULL;

    esp_err = spi_slave_get_trans_result(MSP_HOST, &pTrans, 0);
    // no finished transaction available: pTrans is not valid, so bail out
    if (esp_err != ESP_OK)
        return;

    if ((uint32_t)pTrans->user != 0)
    {
        // not ready to transmit
        gpio_set_level(OUT_MSP_nINT_PIN, 1);
    }
    if (pTrans->tx_buffer == acSnd)
        pTrans->tx_buffer = NULL;

    SioMsp_onEvTransComplete(   pTrans->tx_buffer,
                                pTrans->rx_buffer,
                                pTrans->trans_len   );
}

// called after a transaction is queued and ready for pickup by master.
static void IRAM_ATTR Msp_postSetupCb(spi_slave_transaction_t* pTrans)
{
    // wait 1µs if necessary! MSP must see this negative edge!
    while (Clock_getCpuTicks() - oMsp.nCpuTick < eTick_tm1us);

    gpio_set_level(OUT_MSP_nRDY_PIN, 0);
}

// called after transaction is sent/received.
static void IRAM_ATTR Msp_postTransCb(spi_slave_transaction_t* pTrans)
{
    BaseType_t xHigherPriorityTaskWoken;

    // not ready to receive or transmit
    gpio_set_level(OUT_MSP_nRDY_PIN, 1);

    oMsp.nCpuTick = Clock_getCpuTicks();

    // call spi_slave_get_trans_result ...
    xHigherPriorityTaskWoken = pdFALSE;

    SioMsp_onPostTransFromISR(&xHigherPriorityTaskWoken);

    if (xHigherPriorityTaskWoken == pdTRUE)
        portYIELD_FROM_ISR();
}

Thanks and Greetings

lilalaunestift commented 10 months ago

Hey @kostaond , did you find any useful information in the shared code that could help?

Assuming the problem is related to the DMA, is there anything I can do to track the issue down or provide additional debug information? The documentation does not give much information about the topic, so I don't really know where to start. Greetings

kostaond commented 10 months ago

@lilalaunestift sorry for not replying, I was busy with other tasks. However, the provided code still leaves room for uncertainty. We invested quite some time in the previous attempt using the modified SPI Slave example. Therefore I would much appreciate it if you could provide a fully functioning minimal project with which you are able to demonstrate the issue. We need to reproduce it on our side to move forward. I tried to discuss it with the team responsible for SPI and they indicated that the issue could be on the HW design side (PCB)...

lilalaunestift commented 10 months ago

Ok, I will try to create a minimal project. I guess this will take some days till I find the time. I will let you know. Regarding the HW design: The reason why we so far have not investigated an issue on the pcb side is, that everything was working fine with idf4.3 and earlier versions. Is there any specific assumption what could cause the issue on the pcb? I could ask our HW guy to have a look at it then. Thanks and greetings

lilalaunestift commented 10 months ago

Hey, it took some time but I managed to create a minimal example for the esp32 with which I am able to reproduce the issue: SPI_Issue_min_example.zip. Some explanations:

  1. The example can receive messages from an external SPI master device and does nothing with them. In my setup the attached SPI master transmits 204B of data every 4ms with a bus frequency of 4.096MHz.
  2. I made the espressif basic ethernet example part of the project (with some smaller changes ) and use it to drive the dm9051.

The mentioned additional pins for the SPI bus are not used in this minimum project (they are set to a fixed state and do not participate in the communication).

If I now use 'ping' to send ICMP packets to the esp32, roughly 7-10% of the messages are lost or damaged. When deactivating the Slave_task, 100% are received.
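For orientation, the share of time the slave bus is busy can be estimated from the numbers given in this thread. This is a minimal sketch with hypothetical helper names; it assumes the 4.096 MHz slave clock stated in the earlier comments, one bit per clock, and no inter-byte gaps.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds needed to clock `bytes` bytes at `clock_hz`,
 * assuming one bit per clock cycle and no gaps between bytes. */
static double spi_transfer_us(uint32_t bytes, double clock_hz)
{
    return (double)bytes * 8.0 * 1e6 / clock_hz;
}

/* Fraction of each period during which the bus is busy. */
static double spi_bus_duty(uint32_t bytes, double clock_hz, double period_us)
{
    return spi_transfer_us(bytes, clock_hz) / period_us;
}
```

With 204 bytes at 4.096 MHz every 4000 µs, `spi_transfer_us` gives roughly 398 µs per transfer and `spi_bus_duty` roughly 0.10, so the slave is receiving for about a tenth of the time. This is arithmetic on the stated figures only, not a confirmed explanation for the loss rate.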

I'm still using the same setup regarding IDF and HW as mentioned in the beginning.

Greetings.

lilalaunestift commented 9 months ago

Hey, have you already found the time to take a look into the example? Greetings

kostaond commented 9 months ago

Hi @lilalaunestift, yes, we've given it a try but we've had some trouble. I'll get back to you once there is something to share. Please be patient.

lilalaunestift commented 9 months ago

Ok, great. Thank you very much for the update.

lilalaunestift commented 8 months ago

Hi @kostaond, may i ask for a small update on this topic? Is it possible to reproduce the issue with the provided example? Greetings

kostaond commented 8 months ago

Hi @lilalaunestift, we had issues with the SPI master... In the end, I needed to implement it on a bare-metal SAM3S MCU to achieve the 4 ms period. Therefore it took some time to find appropriate hardware and prepare all the infrastructure and the test setup. Anyway, I was able to reproduce the issue with the minimal code example you provided.

The good thing is that I probably found the root cause of the issue. Your Rx buffer is not 32-bit aligned:

typedef struct SData
{
    uint8_t acData[258];  // !!!
}
Data_t;

The memory alignment is required by the DMA engine; otherwise the DMA may write incorrectly or not in a boundary-aligned manner. When I changed the Rx buffer size to 256B and transmitted the SPI message with the same size, there were no lost ping packets (I tried with ping 10.10.10.104 -i 0.5).

The problem is that the driver didn't report an error, as it should have, when incorrect alignment was used. I've already reported this issue to SPI colleagues.
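Until the driver reports this itself, a check along the following lines could be placed in front of each queued transaction. It is a minimal sketch under the assumption that "32-bit aligned" means both the buffer address and the length in bytes are multiples of 4; `is_word_aligned` and `demo_words` are hypothetical names, not ESP-IDF API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if both the start address and the length
 * in bytes are multiples of 4. */
static bool is_word_aligned(const void *buf, size_t len)
{
    return (((uintptr_t)buf & 0x3u) == 0) && ((len & 0x3u) == 0);
}

/* Backing storage declared as uint32_t, so its start address is
 * 4-byte aligned on common platforms. */
static uint32_t demo_words[65];
```

With `demo_words` viewed as a byte buffer, a length of 256 passes the check, a length of 258 fails it, and so does any start address that is offset by one byte.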

KaeLL commented 8 months ago

@kostaond Do the restrictions described on the linked page also apply to spi_master?

kostaond commented 8 months ago

> @kostaond Do the restrictions described on the linked page also apply to spi_master?

Very good question, they apply. I'm not sure if the check is correctly implemented in the code though. I asked the SPI team to double check.

lilalaunestift commented 8 months ago

Hi @kostaond, that's good news! Thank you very much for the effort! I will look into this topic tomorrow and then provide some feedback.

lilalaunestift commented 8 months ago

Hi @kostaond, I did some tests with your suggested change but it seems that the issue still persists.

The non-word-aligned buffer you mentioned is something I introduced while creating the minimal example. Sorry for that. In our actual code the Data_t struct is only part of the bigger struct Frame_t, which acts as the receive buffer. For simplification I removed the other parts and only Data_t was left. Actually there are asserts that make sure the buffer is word aligned and has the correct length:

#pragma pack(push, 1)

typedef struct SFrame
{
    union
    {
        Data_t      Data;
        Packet_t    Packet;
    };
    uint16_t    nLen;
    struct
    {
        ESioAddr_t  eSioDstPortAddr;
        ESioAddr_t  eSioSrcPortAddr;
    };
}
Frame_t;

#pragma pack(pop)

// make sure that some properties hold:
_Static_assert(sizeof(Frame_t) == 268, "wrong Frame_t Size");
_Static_assert(sizeof(Frame_t) %  4 == 0, "Frame_t Array must be word aligned");
_Static_assert(sizeof(Data_t) == 258, "wrong Data_t Size");
_Static_assert(sizeof(Packet_t) < 258, "wrong Packet_t Size");
_Static_assert(OFFSET(Frame_t, Data) % 4 == 0, "wrong Data Offset");
_Static_assert(OFFSET(Frame_t, Packet) % 4 == 0, "wrong Packet Offset");

The actual call to Msp_readBlock looks like this:

...
static void SioMsp_rcvBuffer(Frame_t* pRcvFrame)
{
    Msp_readBlock(&pRcvFrame->Data.acData[0], sizeof(Frame_t));
}
...

where sizeof(Frame_t) is applied as length to the spi_slave_transaction_t struct.
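One detail worth separating out: static asserts on `sizeof` and field offsets describe the type, not where a particular instance ends up in memory. The sketch below illustrates this with a hypothetical stand-in struct (`PackedFrame_t`, with two `uint32_t` fields in place of the address fields, whose real types are not shown in this thread). Under `#pragma pack(1)`, GCC and Clang give the type an alignment requirement of 1, so a 4-byte-aligned address has to be requested on the instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
/* Hypothetical stand-in for Frame_t: same total size (268 bytes). */
typedef struct {
    uint8_t  acData[258];
    uint16_t nLen;
    uint32_t nDstAddr;   /* assumed 4-byte field, illustration only */
    uint32_t nSrcAddr;   /* assumed 4-byte field, illustration only */
} PackedFrame_t;
#pragma pack(pop)

/* Properties of the type: size is a multiple of 4 ... */
_Static_assert(sizeof(PackedFrame_t) == 268, "unexpected size");
/* ... but the packed type itself only requires 1-byte alignment. */
_Static_assert(_Alignof(PackedFrame_t) == 1, "packed type has alignment 1");

/* Request 4-byte alignment explicitly on the instance. */
static _Alignas(4) PackedFrame_t g_frame;
```

Whether this matters for the code discussed here is not known from the thread. As far as I know, ESP-IDF offers `WORD_ALIGNED_ATTR` and `DMA_ATTR` in `esp_attr.h`, and `heap_caps_malloc()` with `MALLOC_CAP_DMA`, for placing DMA buffers.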

Anyways, I tested your suggested changes with the provided minimal example and I got the following results:

| receive buffer size | lost ping packets |
| --- | --- |
| 258B | 7-10% |
| 256B | 7-10% |
| 260B | 7-10% |
| 204B | 1-2% |
| 208B | 7-10% |

The data transmitted by our SPI master is (in 98% of the cases) 204B in length. If I make the receive buffer fit this length, I get a much better result (but still too many packets are lost). If I just increase the buffer by 4B to 208B, I'm back to the huge packet loss of more than 7%...

Can you confirm this behavior with your setup?

By the way, I did the tests with:

ping <ip> -c 100 -i 0.5

Greetings

kostaond commented 8 months ago

@lilalaunestift if I set acData buffer to size greater than actual transmitted data from master, I am able to reproduce the issue. In other words:

This could be your workaround. However, something is probably wrong somewhere. I'll pass it to SPI team. My work is done here since it is beyond my specialization... I'm responsible for Ethernet...

lilalaunestift commented 7 months ago

Hey @kostaond,

I assume that the cause for the 1-2% losses I still observe in the case where the buffer is configured to 204B in length is, that some of our messages transmitted by the SPI master are shorter than 204B. So there are still some occasions where the size of the buffer and the message don't match.

Anyways, thank you very much for your effort so far. I will then wait for some information from the SPI team. Greetings

wanckl commented 7 months ago

IDF SPI team hide in corner and scare a lot :rofl:

wanckl commented 7 months ago

@lilalaunestift As far as I remember, due to the DMA hardware architecture, on the ESP32 the actual transfer length in the RX direction needs to be word aligned, no matter whether it is master or slave.

That means if you use the ESP32 as a slave with DMA, you need to configure the slave RX buffer address and length to be word aligned; meanwhile, the master side should also send an actual length that is aligned to a word.

That said, it can't explain this: "If acData[256] and SPI frames transmitted from master are 204B, I observe ping loss".

However, the chips after the ESP32 (S2, C3, ...) don't have this limitation. If you have one of them, it should work without issue.

:star:

lilalaunestift commented 7 months ago

Hey @wanckl, yes, the buffers must be word aligned and they all are. After kostaond mentioned the issue he found in the minimal example I checked twice in our actual code base (See code snippet from last week). Also I want to point out again, that everything was working fine for two years with older versions of the IDF where the WORD alignment restriction was the same. The issue started with the update to IDFv5.1.

Changing to another type of the esp32 is not an option since the product is already in the market for two years with the esp32.

wanckl commented 7 months ago

@lilalaunestift yes,

but you mentioned that "some of our messages transmitted by the SPI master are shorter than 204B", so I want to point out that the master transfer length also needs to be aligned to 4 bytes; otherwise the ESP32 slave will also receive broken packets.

By the way, do you mean that even 5.0 works fine?

lilalaunestift commented 7 months ago

Ok, sorry. I didn't get that you are talking about the length of the transmitted data. But this is also always Word aligned. This line in the master code ensures that the transmitted length is always a multiple of 4:

oSioSpi.nSndCnt     = (pPacket->Header.cTotLen + 3) & 0xfffc;
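The rounding in the line above can be checked in isolation. This is a minimal sketch with a hypothetical function name; it assumes the length fits in 16 bits, which is what the `0xfffc` mask implies.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 4.
 * The 0xfffc mask limits the result to 16 bits, so lengths
 * above 65532 wrap around; a mask of ~3u avoids that limit. */
static uint16_t round_up_to_word(uint16_t len)
{
    return (uint16_t)((len + 3u) & 0xfffcu);
}
```

For example, a 201-byte payload is sent as 204 bytes, while 204 stays 204, which matches the lengths mentioned earlier in the thread.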

IDF5.0 is not tested. The issue occurred while updating the code from IDFv4.3 to IDFv5.1 (the latest version at that time).

wanckl commented 7 months ago

@lilalaunestift

So now the issue is on the slave side, i.e. the slave sometimes can't receive correct data, right? And the slave send direction and the master side are OK?
Then, do you know at what point the slave transaction breaks, and can you share the details of the broken transaction?

Besides, I think the IDF SPI team is also off for the Spring Festival, so there may be no update for several days.

kostaond commented 7 months ago

> So now the issue is on the slave side, i.e. the slave sometimes can't receive correct data, right? And the slave send direction and the master side are OK? Then, do you know at what point the slave transaction breaks, and can you share the details of the broken transaction?

It's even worse. It seems the slave transactions are OK but they somehow affect transmit side of SPI master (SPI Ethernet DM9051) which is connected to the other SPI interface.

lilalaunestift commented 7 months ago

@wanckl

Details of broken transaction

Regarding the data corruption (see also one of the first comments):

Here is an example of the observed data corruption:

Data in the buffer as passed to the dm9051 driver in (emac_dm9051_transmit()):

28 6b 35 b2 71 f9 a8 03 2a ee c4 67 08 00 45 00
00 54 9e ba 40 00 ff 01 f7 66 c0 a8 b2 1b c0 a8
b2 1a 00 00 00 c7 00 01 2f 5c 59 65 15 65 00 00
00 00 9b 3e 07 00 00 00 00 00 10 11 12 13 14 15
16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25
26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35
36 37

Data observed on the SPI bus via logic analyzer:

28 6B 35 B2 71 F9 A8 03 2A EE C4 67 08 00 45 00 
00 54 9E BA 40 00 FF 01 F7 66 C0 A8 B2 1B C0 A8 
B2 1A 00 00 00 C7 00 01 2F 5C 59 65 15 65 00 00 
00 00 9B 00 00 9B 3E 07 00 00 00 00 00 10 11 12 
13 00 00 00 00 9B 3E 07 00 00 00 00 00 10 11 12 
13 00 00 00 00 9B 3E 07 00 00 00 00 00 10 11 12 
13 00

In this example data should be transmitted to the ethernet controller. Correct data is passed to the driver, but on the SPI bus I can observe that some part of the data suddenly gets repeated.

The same can be observed when data is received from the ethernet controller: Data on the SPI bus looks fine, but data in the receive buffer is corrupted as shown above.

When is the transaction broken

As Kostaond says, the SPI slave communication is fine. It seems that the SPI Slave causes issues on the SPI master. If I deactivate the SPI slave communication, the SPI master works fine and communicates without problems with the ethernet board. If I activate the SPI slave again, I get the described issues on the SPI master.

In the attached pictures here you can see that the issue occurs, when SPI master transmits/receives data and SPI slave receives data at the same time. https://github.com/espressif/esp-idf/issues/12354#issuecomment-1759899550

The first part of the received data is correct. But when the transaction on the SPI slave is finished, the data corruption on the SPI master starts and I can observe the repeating pattern as shown above.

And it seems that the issue on the SPI master only occurs, when the received data on the SPI slave is shorter than the specified receive buffer size.
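To correlate the corruption with this condition, the completed slave transactions could be classified by comparing the queued length with the length the master actually clocked. The sketch below uses a stand-in struct with only the two length fields; in `spi_slave_transaction_t` both `length` and `trans_len` are given in bits. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the two length fields of spi_slave_transaction_t. */
typedef struct {
    size_t length;     /* bits queued by the slave (buffer size * 8) */
    size_t trans_len;  /* bits actually transferred by the master    */
} slave_lengths_t;

/* True if the master clocked fewer bits than the slave had queued. */
static bool is_short_transfer(const slave_lengths_t *t)
{
    return t->trans_len < t->length;
}

/* Number of queued bytes that were never transferred (0 if none). */
static size_t missing_bytes(const slave_lengths_t *t)
{
    return is_short_transfer(t) ? (t->length - t->trans_len) / 8 : 0;
}
```

A 268-byte receive buffer with a 204-byte frame from the master counts as a short transfer with 64 bytes outstanding, whereas a 204-byte buffer with a 204-byte frame does not.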

lilalaunestift commented 6 months ago

Hi again, is there any news on this issue you can share? Greetings

wanckl commented 6 months ago

:cry: A bit busy recently