Open lilalaunestift opened 12 months ago
@lilalaunestift thanks for the nice, detailed report. Could you please try to lower the SPI CLK for the DM9051 to 20 MHz?
Edit: I see, it's actually 5 MHz... What is SPI CLK of slave?
Hi, the clock of the slave SPI is running with 4.096MHz.
One more question...
I moved the shown initialization above to the ethernet example and it is still working fine
Do you mean just solely Ethernet or Ethernet and the SPI slave?
I moved only the Ethernet to the dm9051 example.
Hey, I did some more measurements with the logic analyzer. I captured both SPI buses this time. The following two pictures show the two constellations I found, where the SPI Master is transmitting wrong data on the bus. The point in time where the corruption starts is shortly after the SPI Slave is done with receiving (the red and purple markers placed in the capture show where the corruption approximately starts). Again I can observe that some part of the data is sent repeatedly.
On the other hand, I found transmissions where also both buses are active, but there is no corruption on the SPI master:
Here it seems that the SPI slave is considerate of the SPI master.
Hope this helps somehow. Greetings
@lilalaunestift thank you for the report, we try to reproduce and let you know.
Hey @kostaond, are there any news on this issue so far? Are you able to reproduce this behavior? Greetings
Hi @lilalaunestift, my colleague is trying to reproduce it. However, he hasn't been able to reproduce it yet...
@lilalaunestift how long are expected transactions at SPI slave? If they are less than or equal to 32B, could you please try to disable DMA at the slave interface?
We haven't been able to reproduce. We've based our setup on SPI Slave example. We have two ESP32's - one as master and one slave with DM9051. Could you please provide more information about your SPI slave code?
@kostaond
If they are less than or equal to 32B, could you please try to disable DMA at the slave interface?
Care to elaborate on that?
@lilalaunestift how long are expected transactions at SPI slave? If they are less than or equal to 32B, could you please try to disable DMA at the slave interface?
The transactions are longer than 32B. Disabling the DMA is unfortunately not an option here.
We haven't been able to reproduce. We've based our setup on SPI Slave example. We have two ESP32's - one as master and one slave with DM9051. Could you please provide more information about your SPI slave code?
Just for clarification: are you also using a setup where the ESP32 is master and slave at the same time (see picture)?
Yes, we added the DM9051 to the SPI Slave example.
Could you please provide more details about your setup? For example, slave code, what is the traffic (size, period), etc.
Hey, sorry for the delay.
Regarding our slave code: 99.9% of the data has a length of 201B and is transmitted periodically every 4 ms from master to slave. Clock frequency of the slave is 4.096 MHz. Besides the four SPI lines, there are two additional lines for the communication. Both of them are controlled by the ESP:
Sending and receiving are done sequentially. So while the ESP slave is sending, the master is only receiving and vice versa. Here is the part where the slave code is interacting with the SPI driver.
#include "../inc/Clock.h"
#include "../inc/Crc16.h"
#include "driver/gpio.h"
#include "driver/spi_slave.h"
#include "esp_intr_alloc.h"
static void Msp_postSetupCb(spi_slave_transaction_t* pTrans);
static void Msp_postTransCb(spi_slave_transaction_t* pTrans);
uint8_t acSnd[268];
typedef struct SMsp
{
    spi_slave_transaction_t t0;
    uint32_t nCpuTick;
}
Msp_t;

Msp_t oMsp;

esp_err_t Msp_init(uint32_t nMaxLen)
{
    esp_err_t esp_err;
    memset(&acSnd, 0xff, sizeof(acSnd));
    memset(&oMsp, 0x00, sizeof(Msp_t));
    // Configuration for the SPI bus
    spi_bus_config_t buscfg =
    {
        .mosi_io_num = SPI_MSP_MOSI_PIN,
        .miso_io_num = SPI_MSP_MISO_PIN,
        .sclk_io_num = SPI_MSP_CLK_PIN,
        .quadwp_io_num = -1,
        .quadhd_io_num = -1,
        .max_transfer_sz = nMaxLen,
        .flags = SPICOMMON_BUSFLAG_SLAVE,
        .intr_flags = ESP_INTR_FLAG_LOWMED
    };
    // Configuration for the SPI slave interface
    spi_slave_interface_config_t slvcfg =
    {
        .spics_io_num = SPI_MSP_nCS_PIN,
        .flags = 0,
        .queue_size = 2, // at least 2
        .mode = 1,
        .post_setup_cb = &Msp_postSetupCb,
        .post_trans_cb = &Msp_postTransCb
    };
    // Initialize SPI slave interface
    esp_err = spi_slave_initialize(MSP_HOST, &buscfg, &slvcfg, MSP_DMA_CHAN);
    assert(esp_err == ESP_OK);
    return esp_err;
}

void Msp_transReady(void)
{
    // ready to transmit
    gpio_set_level(OUT_MSP_nINT_PIN, 0);
}

void Msp_writeBlock(uint8_t* acSndData, uint8_t* acRcvData, uint32_t nMaxLen)
{
    esp_err_t esp_err;
    oMsp.t0.length = nMaxLen << 3;
    oMsp.t0.rx_buffer = acRcvData;
    oMsp.t0.trans_len = 0;
    oMsp.t0.tx_buffer = acSndData;
    oMsp.t0.user = (void*)1;
    esp_err = spi_slave_queue_trans(MSP_HOST, &oMsp.t0, 0);
    assert(esp_err == ESP_OK);
}

void Msp_readBlock(uint8_t* acRcvData, uint32_t nMaxLen)
{
    esp_err_t esp_err;
    oMsp.t0.length = nMaxLen << 3;
    oMsp.t0.rx_buffer = acRcvData;
    oMsp.t0.trans_len = 0;
    oMsp.t0.tx_buffer = acSnd;
    oMsp.t0.user = (void*)0;
    esp_err = spi_slave_queue_trans(MSP_HOST, &oMsp.t0, 0);
    assert(esp_err == ESP_OK);
}
void Msp_getTransResult(void)
{
    esp_err_t esp_err;
    spi_slave_transaction_t * pTrans = NULL;
    esp_err = spi_slave_get_trans_result(MSP_HOST, &pTrans, 0);
    // NOTE: this overwrites the result of the call above, so the check below never triggers
    esp_err = ESP_OK;
    if (esp_err != ESP_OK)
        return;
    if ((uint32_t)pTrans->user != 0)
    {
        // not ready to transmit
        gpio_set_level(OUT_MSP_nINT_PIN, 1);
    }
    if (pTrans->tx_buffer == acSnd)
        pTrans->tx_buffer = NULL;
    SioMsp_onEvTransComplete( pTrans->tx_buffer,
                              pTrans->rx_buffer,
                              pTrans->trans_len );
}
// called after a transaction is queued and ready for pickup by master.
static void IRAM_ATTR Msp_postSetupCb(spi_slave_transaction_t* pTrans)
{
    // wait 1 µs if necessary! MSP must see this negative edge!
    while (Clock_getCpuTicks() - oMsp.nCpuTick < eTick_tm1us);
    gpio_set_level(OUT_MSP_nRDY_PIN, 0);
}

// called after transaction is sent/received.
static void IRAM_ATTR Msp_postTransCb(spi_slave_transaction_t* pTrans)
{
    BaseType_t xHigherPriorityTaskWoken;
    // not ready to receive or transmit
    gpio_set_level(OUT_MSP_nRDY_PIN, 1);
    oMsp.nCpuTick = Clock_getCpuTicks();
    // call spi_slave_get_trans_result ...
    xHigherPriorityTaskWoken = pdFALSE;
    SioMsp_onPostTransFromISR(&xHigherPriorityTaskWoken);
    if (xHigherPriorityTaskWoken == pdTRUE)
        portYIELD_FROM_ISR();
}
Thanks and Greetings
Hey @kostaond , did you find any useful information in the shared code that could help?
Assuming the problem is related to the DMA, is there anything I can do track the issue down or provide additional debug information? The documentation does not give much information about the topic. So I don't really know where to start. Greetings
@lilalaunestift sorry for not replying, I was busy with other tasks. However, provided code still has room for uncertainty. We invested quite some time with the previous attempt using modified SPI Slave example. Therefore I would much appreciate, if you could provide fully functioning minimum project under which you are able to demonstrate the issue. We need to reproduce it at our side to move forward. I tried to discuss with team responsible for SPI and they indicated that the issue could be at HW design side (PCB)...
Ok, I will try to create a minimal project. I guess this will take some days till I find the time. I will let you know. Regarding the HW design: The reason why we so far have not investigated an issue on the pcb side is, that everything was working fine with idf4.3 and earlier versions. Is there any specific assumption what could cause the issue on the pcb? I could ask our HW guy to have a look at it then. Thanks and greetings
Hey, it took some time but I managed to create a minimal example for the esp32 with which I am able to reproduce the issue. SPI_Issue_min_example.zip Some explanations:
The mentioned additional pins for the SPI bus are not used in this minimum project (they are set to a fixed state and do not participate in the communication).
If I now use 'ping' to send ICMP packets to the esp32, roughly 7-10% of the messages are lost or damaged. When deactivating the Slave_task, 100% are received.
I'm still using the same setup regarding IDF and HW as mentioned in the beginning.
Greetings.
Hey, have you already found the time to take a look into the example? Greetings
Hi @lilalaunestift, yes, we've given it a try but we have some troubles. I'll get back to you once there is something to share. Please be patient.
Ok, great. Thank you very much for the update.
Hi @kostaond, may I ask for a small update on this topic? Is it possible to reproduce the issue with the provided example? Greetings
Hi @lilalaunestift, we had issues with SPI master... In the end, I needed to implement it on a bare-metal SAM3S MCU to achieve the 4 ms period. Therefore it took some time to find appropriate hardware, prepare all the infrastructure and the test setup. Anyway, I was able to reproduce the issue with the minimum code example you provided.
The good thing is that I probably found the root cause of the issue. Your Rx buffer is not 32-bit aligned:
typedef struct SData
{
    uint8_t acData[258]; // !!!
}
Data_t;
The memory alignment is required by the DMA engine; otherwise the DMA may write incorrectly or not in a boundary-aligned manner. When I changed the Rx buffer size to 256B and transmitted the SPI message with the same size, there were no lost ping packets (I tried with ping 10.10.10.104 -i 0.5).
The problem is that the driver didn't report an error, as it should have when incorrect alignment was used. I've already reported this issue to SPI colleagues.
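To illustrate the alignment requirement described above, here is a minimal sketch of a pre-flight check an application could run before handing an Rx buffer to a DMA-backed driver. The names `DMA_WORD_SIZE`, `dma_addr_is_aligned`, `dma_len_is_aligned` and `dma_rx_buffer_ok` are hypothetical and not part of ESP-IDF; the sketch only assumes a 32-bit word size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Word size assumed for the DMA engine (hypothetical constant name). */
#define DMA_WORD_SIZE 4u

static_assert(sizeof(uint32_t) == DMA_WORD_SIZE, "expected 32-bit words");

/* Returns true if the buffer start address is 32-bit aligned. */
static bool dma_addr_is_aligned(const void *buf)
{
    return ((uintptr_t)buf % DMA_WORD_SIZE) == 0;
}

/* Returns true if the length in bytes is a whole number of 32-bit words. */
static bool dma_len_is_aligned(size_t len_bytes)
{
    return (len_bytes % DMA_WORD_SIZE) == 0;
}

/* Hypothetical pre-flight check before queueing an Rx buffer. */
static bool dma_rx_buffer_ok(const void *buf, size_t len_bytes)
{
    return buf != NULL
        && dma_addr_is_aligned(buf)
        && dma_len_is_aligned(len_bytes);
}
```

With this check, a length of 258 bytes is rejected and 256 bytes is accepted, which matches the two buffer sizes compared above.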
@kostaond Do the restrictions described on the linked page also apply to spi_master?
Very good question, they apply. I'm not sure if the check is correctly implemented in code though. I asked the SPI team to double check.
Hi @kostaond, that's good news! Thank you very much for the effort! I will look into this topic tomorrow and then provide some feedback.
Hi @kostaond, I did some tests with your suggested change but it seems that the issue still persists.
The buffer that is not word aligned, which you mentioned, is something I introduced while creating the minimal example. Sorry for that. In our actual code the Data struct is only part of the bigger struct Frame_t which acts as receive buffer. But for simplification I removed the other part and only Data_t was left. Actually there are asserts that make sure the buffer is word aligned and has the correct length:
#pragma pack(push, 1)
typedef struct SFrame
{
    union
    {
        Data_t Data;
        Packet_t Packet;
    };
    uint16_t nLen;
    struct
    {
        ESioAddr_t eSioDstPortAddr;
        ESioAddr_t eSioSrcPortAddr;
    };
}
Frame_t;
#pragma pack(pop)
// make sure that some properties hold:
_Static_assert(sizeof(Frame_t) == 268, "wrong Frame_t Size");
_Static_assert(sizeof(Frame_t) % 4 == 0, "Frame_t Array must be word aligned");
_Static_assert(sizeof(Data_t) == 258, "wrong Data_t Size");
_Static_assert(sizeof(Packet_t) < 258, "wrong Packet_t Size");
_Static_assert(OFFSET(Frame_t, Data) % 4 == 0, "wrong Data Offset");
_Static_assert(OFFSET(Frame_t, Packet) % 4 == 0, "wrong Packet Offset");
The actual call to Msp_readBlock looks like this:
...
static void SioMsp_rcvBuffer(Frame_t* pRcvFrame)
{
    Msp_readBlock(&pRcvFrame->Data.acData[0], sizeof(Frame_t));
}
...
where sizeof(Frame_t) is applied as length to the spi_slave_transaction_t struct.
Anyways, I tested your suggested changes with the provided minimal example and I got the following results:

| receive buffer size | lost ping packets |
|---|---|
| 258B | 7-10% |
| 256B | 7-10% |
| 260B | 7-10% |
| 204B | 1-2% |
| 208B | 7-10% |
The data transmitted by our SPI master is (in 98% of the cases) 204B in length. If I make the receive buffer fit this length, I get a much better result (but still too many packets are lost). If I just increase the buffer by 4B to 208B, I'm back to the huge packet loss of more than 7%...
Can you confirm this behavior with your setup?
By the way, I did the tests with:
ping <ip> -c 100 -i 0.5
Greetings
@lilalaunestift if I set acData buffer to size greater than actual transmitted data from master, I am able to reproduce the issue. In other words:
- If acData[256] and SPI frames transmitted from master are 204B, I observe ping loss.
- If acData[204] and SPI frames transmitted from master are 204B, I do NOT observe ping loss.
- If acData[256] and SPI frames transmitted from master are 256B, I do NOT observe ping loss.

This could be your workaround. However, something is probably wrong somewhere. I'll pass it to SPI team. My work is done here since it is beyond my specialization... I'm responsible for Ethernet...
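The three cases above differ only in whether the master clocks as many bytes as the slave has queued. A minimal sketch of how such a short transaction could be detected after completion follows. It assumes the two bit-count fields `length` and `trans_len` that the slave code earlier in this thread sets on `spi_slave_transaction_t`; the struct `trans_info_t` and both function names are hypothetical stand-ins, not ESP-IDF types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the two spi_slave_transaction_t fields used here.
 * Both are expressed in bits. This is not the ESP-IDF type itself. */
typedef struct {
    size_t length;     /* bits the slave was prepared to transfer */
    size_t trans_len;  /* bits the master actually clocked */
} trans_info_t;

/* True if the master clocked fewer bits than the slave had queued. */
static bool trans_was_short(const trans_info_t *t)
{
    assert(t != NULL);
    return t->trans_len < t->length;
}

/* Number of queued bytes that the master never clocked. */
static size_t trans_unused_bytes(const trans_info_t *t)
{
    return trans_was_short(t) ? (t->length - t->trans_len) / 8 : 0;
}
```

Logging the result of such a check in the task that collects finished transactions would show whether the ping losses line up with short transactions.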
Hey @kostaond,
I assume that the cause for the 1-2% losses I still observe in the case where the buffer is configured to 204B in length is, that some of our messages transmitted by the SPI master are shorter than 204B. So there are still some occasions where the size of the buffer and the message don't match.
Anyways, thank you very much for your effort so far. I will then wait for some information from the SPI team. Greetings
IDF SPI team hide in corner and scare a lot :rofl:
@lilalaunestift I remember that, due to the DMA HW architecture, for the esp32 rx direction, no matter whether master or slave, the actual transfer length needs to be WORD aligned.
That means, if you use the esp32 as slave and use DMA, you need to configure the slave rx buffer address and length to be WORD aligned; meanwhile, the master side should also write an actual length aligned to WORD.
Though it can't explain:
If acData[256] and SPI frames transmitted from master are 204B, I observe ping loss
However, other chips after the esp32 don't have this limitation (S2, C3, ...). If you have one, it should work without issue...
:star:
Hey @wanckl, yes, the buffers must be word aligned and they all are. After kostaond mentioned the issue he found in the minimal example I checked twice in our actual code base (See code snippet from last week). Also I want to point out again, that everything was working fine for two years with older versions of the IDF where the WORD alignment restriction was the same. The issue started with the update to IDFv5.1.
Changing to another type of the esp32 is not an option since the product is already in the market for two years with the esp32.
@lilalaunestift yes, but you mentioned that "some of our messages transmitted by the SPI master are shorter than 204B". So I note that the master transfer length also needs to be aligned to 4 bytes, otherwise it will also lead the esp32 slave to receive broken packets.
By the way, do you mean that even 5.0 works fine?
Ok, sorry. I didn't get that you are talking about the length of the transmitted data. But this is also always Word aligned. This line in the master code ensures that the transmitted length is always a multiple of 4:
oSioSpi.nSndCnt = (pPacket->Header.cTotLen + 3) & 0xfffc;
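The rounding in the line above can be sketched in isolation. This is a minimal sketch; the function name `round_up_to_word` is hypothetical, and the 16-bit type is an assumption based on the `0xfffc` mask.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a byte count up to the next multiple of 4.
 * The 0xfffc mask clears the two lowest bits and also limits the
 * result to 16 bits, so inputs above 0xfffc are not supported. */
static uint16_t round_up_to_word(uint16_t len_bytes)
{
    assert(len_bytes <= 0xfffcu);
    return (uint16_t)((len_bytes + 3u) & 0xfffcu);
}
```

For example, a length of 201 bytes rounds up to 204 bytes, and a length that is already a multiple of 4 is left unchanged.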
IDF5.0 is not tested. The issue occurred while updating the code from IDFv4.3 to IDFv5.1 (the latest version at that time).
@lilalaunestift
So now the issue is on the slave side, that the slave can't receive correct data sometimes, right? The slave send direction and the master side are OK.
Then, do you know at what time the slave transaction breaks, and the details of this broken transaction?
Besides, I think the IDF SPI team is also going on Spring Festival holiday, so there may be no update for several days.
So now issue is on slave side that slave can't receive correct data some time right ? slave send direction and master side is OK. Then, could you know what time the slave transaction broken, and the detail of this broken transaction.
It's even worse. It seems the slave transactions are OK but they somehow affect transmit side of SPI master (SPI Ethernet DM9051) which is connected to the other SPI interface.
@wanckl
Regarding the data corruption (see also one of the first comments):
Here is an example of the observed data corruption:
Data in the buffer as passed to the dm9051 driver in (emac_dm9051_transmit()):
28 6b 35 b2 71 f9 a8 03 2a ee c4 67 08 00 45 00
00 54 9e ba 40 00 ff 01 f7 66 c0 a8 b2 1b c0 a8
b2 1a 00 00 00 c7 00 01 2f 5c 59 65 15 65 00 00
00 00 9b 3e 07 00 00 00 00 00 10 11 12 13 14 15
16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25
26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35
36 37
Data observed on the SPI bus via logic analyzer:
28 6B 35 B2 71 F9 A8 03 2A EE C4 67 08 00 45 00
00 54 9E BA 40 00 FF 01 F7 66 C0 A8 B2 1B C0 A8
B2 1A 00 00 00 C7 00 01 2F 5C 59 65 15 65 00 00
00 00 9B 00 00 9B 3E 07 00 00 00 00 00 10 11 12
13 00 00 00 00 9B 3E 07 00 00 00 00 00 10 11 12
13 00 00 00 00 9B 3E 07 00 00 00 00 00 10 11 12
13 00
In this example data should be transmitted to the ethernet controller. Correct data is passed to the driver, but on the SPI bus I can observe that some part of the data suddenly gets repeated.
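To pin down where a captured stream starts to deviate from the buffer, a small comparison helper can be used. This is a minimal sketch with a hypothetical function name, `first_mismatch`; it is not part of the dm9051 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first byte where the two buffers differ,
 * or len if they are identical over len bytes. */
static size_t first_mismatch(const uint8_t *expected,
                             const uint8_t *observed,
                             size_t len)
{
    assert(expected != NULL && observed != NULL);
    for (size_t i = 0; i < len; i++) {
        if (expected[i] != observed[i]) {
            return i;
        }
    }
    return len;
}
```

Applied to the two dumps above, the first differing byte is at offset 51, where the buffer holds 3e and the bus shows 00.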
The same can be observed when data is received from the ethernet controller: Data on the SPI bus looks fine, but data in the receive buffer is corrupted as shown above.
As Kostaond says, the SPI slave communication is fine. It seems that the SPI Slave causes issues on the SPI master. If I deactivate the SPI slave communication, the SPI master works fine and communicates without problems with the ethernet board. If I activate the SPI slave again, I get the described issues on the SPI master.
In the attached pictures here you can see that the issue occurs, when SPI master transmits/receives data and SPI slave receives data at the same time. https://github.com/espressif/esp-idf/issues/12354#issuecomment-1759899550
The first part of the received data is correct. But when the transaction on the SPI slave is finished, the data corruption on the SPI master starts and I can observe the repeating pattern as shown above.
And it seems that the issue on the SPI master only occurs, when the received data on the SPI slave is shorter than the specified receive buffer size.
Hi again, are there any news on this issue you can share? Greetings
:cry: A bit busy recently
Answers checklist.
IDF version.
v5.1.1-1-gd3c99ed3b8
Espressif SoC revision.
ESP32-D0WD-V3 (revision v3.0)
Operating System used.
Linux
How did you build your project?
VS Code IDE
If you are using Windows, please specify command line type.
None
Development Kit.
Custom Board
Power Supply used.
External 3.3V
What is the expected behavior?
Two SPI buses are used:
What is the actual behavior?
The communication of HSPI master is unstable. Approx. 10% of the messages are corrupted somehow. This can be observed for both incoming and outgoing data: For incoming data over the MISO line, it can be observed that data on the SPI bus sent by dm9051 is correct (via Logic Analyzer), but partly faulty data can be found in the receive buffer. For outgoing data (MOSI), there is correct data in the send buffer, but partly faulty data can be observed on the SPI bus.
Steps to reproduce.
#define MSP_HOST VSPI_HOST
#define MSP_DMA_CHAN 2
esp_err_t Slave_init(uint32_t nMaxLen) { esp_err_t esp_err;
}
More Information.
Is it possible that there is an issue on balancing the DMA usage? It seems that somehow the data is corrupted between the SPI bus and dm9051 send/receive buffer. At the same time I would not suspect a SPI issue, since the data is correct in most parts and the faulty parts are not arbitrary data (see log above).