Closed Evlers closed 9 months ago
Something is very clearly wrong with the SPI timing of the transactions between host and slave. Unless this is corrected, you will keep running into packet loss.
Some checks:
We typically suggest getting the SPI timings correct between your host and slave first. These issues can usually be understood clearly with a logic analyzer.
Also, if you are using something like an STM32 as host, you might want to double-check that NSS behaves correctly. When we last checked (some time ago), we found that NSS was not working correctly.
So instead of using NSS, we relied on a normal GPIO as CS. To do so, we disabled the SPI NSS function on that pin and drove the same GPIO manually as CS.
Check SPI version 2 IOC: GPIO is normal GPIO and not NSS.
GPIO is used as CS: https://github.com/espressif/esp-hosted/blob/504210623ec632efae1d39ad2636e235485ab3d4/esp_hosted_fg/host/stm32/driver/transport/spi/spi_drv.c#L553-L557
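The manual-CS approach described above can be sketched roughly as follows. This is only an illustration under assumptions: `spi_port_t`, `spi_transact_manual_cs()` and the loopback mock are hypothetical names, not part of ESP-Hosted or the STM32 HAL. On an STM32 host the two hooks would typically wrap `HAL_GPIO_WritePin()` and `HAL_SPI_TransmitReceive()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical board hooks (assumption, not an ESP-Hosted API). */
typedef struct {
    void (*cs_write)(int level);                                  /* drive the CS GPIO */
    int  (*transfer)(const uint8_t *tx, uint8_t *rx, size_t len); /* 0 on success */
} spi_port_t;

/* One full-duplex transaction with CS driven as a plain GPIO, not as NSS. */
int spi_transact_manual_cs(const spi_port_t *port,
                           const uint8_t *tx, uint8_t *rx, size_t len)
{
    assert(port && port->cs_write && port->transfer);
    port->cs_write(0);                     /* assert CS (active low) */
    int ret = port->transfer(tx, rx, len);
    port->cs_write(1);                     /* release CS even if the transfer failed */
    return ret;
}

/* Loopback mock for desk testing: tracks the CS level and echoes tx into rx. */
int mock_cs_level = 1;
int mock_cs_low_during_transfer = 0;

void mock_cs_write(int level) { mock_cs_level = level; }

int mock_transfer(const uint8_t *tx, uint8_t *rx, size_t len)
{
    mock_cs_low_during_transfer = (mock_cs_level == 0);
    memcpy(rx, tx, len);
    return 0;
}

const spi_port_t mock_port = { mock_cs_write, mock_transfer };
```

With the pin configured as a plain output in the IOC, the hardware NSS logic is out of the picture and the CS edges are fully under software control, which also makes them easy to line up on a logic analyzer.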
Thanks for your guidance!
On the host side, I use a manual GPIO instead of SPI NSS. I also specifically checked the ESP32-side code: it is consistent with the SPI mode used by the host, which is SPI_MODE_2. I also tried the three other modes:
In addition, I used the default GPIOs of the master branch for the ESP32-C3 and did not move them to other pins, so there should be no GPIO matrix problem. Our PCB traces are within 3cm; they are not length-matched, but the lengths do not differ by much.
Our company's logic analyzer is broken. I will borrow one tomorrow and analyze this again!
Mode 0 should not be used on the slave side anyway, as pointed out in the porting guide.
Although the next link is for the ESP32 (not the C6), the timing concepts are similar. https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_slave.html#restrictions-and-known-issues
The logic analyser will help show which transitions are not landing correctly. The second link above also explains the timing.
Okay, I'll take a look at the link you sent me. Thank you very much!
while( hspi1.State == HAL_SPI_STATE_BUSY );
in host?
You can additionally use DMA for this, but either way you need to make sure that an SPI transaction is only initiated once the prior transaction is complete.
This will also avoid the rare condition explained in: https://github.com/espressif/esp-hosted/issues/308#issuecomment-1873944557 and https://github.com/espressif/esp-hosted/issues/308#issuecomment-1873957008
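One way to enforce "start only after the prior transaction completes" without blocking in a busy loop is a small busy guard. The sketch below uses C11 atomics; `spi_guard_t`, `spi_try_begin()` and `spi_mark_complete()` are hypothetical names, and it is an assumption that the completion function would be called from the SPI/DMA transfer-complete callback of the host port.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard around one SPI peripheral (assumption). */
typedef struct {
    atomic_flag busy;   /* set while a transaction is in flight */
} spi_guard_t;

/* Try to claim the bus. Returns false if the previous transaction
 * has not completed yet, in which case the caller must not start SPI. */
bool spi_try_begin(spi_guard_t *g)
{
    assert(g);
    return !atomic_flag_test_and_set(&g->busy);
}

/* Call once the transfer has fully completed, for example from the
 * DMA transfer-complete callback. */
void spi_mark_complete(spi_guard_t *g)
{
    assert(g);
    atomic_flag_clear(&g->busy);
}
```

A caller that gets `false` from `spi_try_begin()` would simply retry on the next trigger instead of starting an overlapping transfer.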
Please note the spi_bus_lock mutex and the spi_trans_ready_sem semaphore, which are used for synchronisation.
I remember there was an OOOSeq (out-of-order sequence packets) problem in the master code, because Tx / SPI transactions could be started from multiple threads.
In the new branch we made sure that a transaction can only be triggered from one place; any other place that needs a transaction simply gives the semaphore.
We also allowed a few extra dummy transactions at the end from either side, just to make sure that all valid buffers on both sides are drained completely.
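The "signal from anywhere, transact in one place" idea can be illustrated with a single-threaded stand-in. This is a sketch under assumptions, not the code from the branch: `spi_sched_t`, `spi_request_transaction()`, `spi_task_service()` and the `extra_dummy` parameter are hypothetical, and the plain counter stands in for what would be a counting semaphore in an RTOS port.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded stand-in for the transport state (assumption).
 * In a real port `pending` would be a counting semaphore. */
typedef struct {
    unsigned pending;      /* outstanding transaction requests */
    unsigned real_count;   /* transactions run for real requests */
    unsigned dummy_count;  /* extra drain transactions */
} spi_sched_t;

/* Any context that wants a transaction only signals; it never touches SPI. */
void spi_request_transaction(spi_sched_t *s)
{
    assert(s);
    s->pending++;
}

/* Body of the one task that owns the bus. */
void spi_task_service(spi_sched_t *s, unsigned extra_dummy)
{
    assert(s);
    bool did_work = s->pending > 0;

    while (s->pending > 0) {
        s->pending--;
        s->real_count++;       /* real port: exchange one buffer here */
    }

    /* After real traffic, run a few dummy exchanges so that buffers
     * still queued on either side get drained. */
    for (unsigned i = 0; did_work && i < extra_dummy; i++) {
        s->dummy_count++;      /* real port: exchange an empty buffer here */
    }
}
```

Because only `spi_task_service()` performs exchanges, packets leave in the order the requests were serviced, which is the property the out-of-order problem was violating.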
I think the code quality of https://github.com/espressif/esp-hosted/blob/feature/esp_as_mcu_host/host/drivers/transport/spi/spi_drv.c is better, as it has evolved through fixing issues and race conditions. It is worth checking the logical diff of the slave and host SPI files from master to feature/esp_as_mcu_host to understand the changes.
From experience, I think the issue you face is definitely due to a timing mismatch between host and slave. At the same time, I am also pointing out some code changes and fixes we added in the recent branch, in case they solve the issue.
First of all, as can be seen from the spi_drv.c file, the SPI transfer is carried out in only one place and is triggered by a semaphore, so there is no mutual exclusion problem.
The code I ported is in rt-thread_esp-hosted. Could you please help me check whether there is any problem with the logic?
I am using a GD32F42, and the SPI transfer waits for completion:
/* Wait for transmission to complete */
while(!dma_flag_get(spi_device->dma.rx.periph, spi_device->dma.rx.channel, DMA_FLAG_FTF));
while(!dma_flag_get(spi_device->dma.tx.periph, spi_device->dma.tx.channel, DMA_FLAG_FTF));
External interrupts are also timely:
static void gpio_interrupt(void *args)
{
    /* Post semaphore to notify SPI slave is ready for next transaction */
    if (osSemaphore != NULL) {
        rt_sem_release(osSemaphore);
    }
}
I am now trying rt-thread_esp-hosted on the ART-PI (STM32H750) development board.
That should rule out a GD32 driver issue!
@Evlers We can definitely have a look, but studying it in detail will take some time.
To keep things simple and have only the transport in the picture for testing, we can focus on the 'Raw throughput Tx and Rx test', where dummy buffers are simply passed through to measure the transport link throughput.
This way we avoid networking and concentrate on basic transport correctness first.
The master doc is : https://github.com/espressif/esp-hosted/blob/master/esp_hosted_fg/docs/Linux_based_host/Raw_TP_Testing.md
For feature/esp_as_mcu_host, slave: https://github.com/espressif/esp-hosted/blob/feature/esp_as_mcu_host/slave/main/Kconfig.projbuild#L327-L350
Code:
slave: https://github.com/espressif/esp-hosted/blob/feature/esp_as_mcu_host/slave/main/stats.c
host: https://github.com/espressif/esp-hosted/blob/feature/esp_as_mcu_host/host/utils/stats.c
check for symbol TEST_RAW_TP
I honestly tried not to complicate things and had not yet given you the intermediate branch patch, because it introduces three branches (1. master, 2. the intermediate branch, 3. the feature/esp_as_mcu_host branch) and a lot of confusion along with them.
I will provide the intermediate branch as commits. Please note the following:
Base Hosted master should point to: 4840528810457f393e0e65fe2bb1442dcb6dbc10
On top, apply git patches, using git am ./patches/*.patch
patches_over_4840528810457f393e0e65fe2bb1442dcb6dbc10.tgz
However, this branch has been tested on STM32 and can easily import fixes from feature/esp_as_mcu_host (if need be). That is why I shared it.
However, I still think you should get a logic analyzer reading first, to find anything obvious that you can fix. I still do not want you to get confused by three branches.
Yeah, too many branches are very confusing for me!
The master branch has already been ported, so we can improve on that. Of course, I will refer to the patches you sent while fixing it, thank you!!
After that, I will try the other branches.
In addition, I don't think I need to remove my lwIP code, because the network layer code has already been tested with Infineon's WiFi.
This error is very clearly an SPI transfer problem. I will try to solve that first; wait for my good news!
Calling check_and_execute_spi_transaction() from two places is not good (your branch, similar to master). Better to refer to feature/esp_as_mcu_host -> spi_drv.c to see how it was removed.
Found the bug!
The mutex needs to be adjusted as shown in the figure below:
My guess is that the handshake pin was read too early. While the pin is being read, the ESP32 can still accept data (there is a free slot in its queue), so the pin is low (ready). The task then waits on the mutex only after it has read the pin state.
But at that point, the task holding the mutex is sending data!! After that data is sent, the ESP32 queue is full and the handshake pin goes high. The sending task then releases the mutex. Because the waiting task read the pin before it started waiting (ready state), it wrongly assumes the ESP32 is still ready and starts a data transfer!!!
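The ordering problem described here can be reproduced in a small simulation. The sketch below is an illustration under assumptions, not the actual driver code: `spi_ops_t`, `spi_send_racy()`, `spi_send_fixed()` and all `sim_*` names are hypothetical. The simulated `sim_lock()` models the situation where another task fills the slave queue while the caller is waiting for the mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver hooks (assumption). */
typedef struct {
    void (*lock)(void);          /* take the SPI bus mutex */
    void (*unlock)(void);        /* give the SPI bus mutex */
    bool (*slave_ready)(void);   /* sample the handshake pin */
    void (*transfer)(void);      /* run one SPI transaction */
} spi_ops_t;

/* Racy order: the pin is sampled before waiting on the mutex, so the
 * value may be stale by the time the mutex is obtained. */
bool spi_send_racy(const spi_ops_t *ops)
{
    assert(ops);
    bool ready = ops->slave_ready();
    ops->lock();
    if (ready) ops->transfer();
    ops->unlock();
    return ready;
}

/* Fixed order: the pin is sampled only while holding the mutex. */
bool spi_send_fixed(const spi_ops_t *ops)
{
    assert(ops);
    ops->lock();
    bool ready = ops->slave_ready();
    if (ready) ops->transfer();
    ops->unlock();
    return ready;
}

/* --- Simulation of the scenario --- */
bool sim_ready = true;       /* slave has a free queue slot */
int  sim_bad_transfers = 0;  /* transfers started while the slave was full */

void sim_reset(void)       { sim_ready = true; sim_bad_transfers = 0; }
void sim_lock(void)        { sim_ready = false; } /* other task filled the queue meanwhile */
void sim_unlock(void)      { }
bool sim_slave_ready(void) { return sim_ready; }
void sim_transfer(void)    { if (!sim_ready) sim_bad_transfers++; }

const spi_ops_t sim_ops = { sim_lock, sim_unlock, sim_slave_ready, sim_transfer };
```

In the fixed variant a `false` return means no transfer was started; the caller would wait for the next handshake interrupt and try again.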
May be..
feature/esp_as_mcu_host has https://github.com/espressif/esp-hosted/blob/835a6670fdab619eb31ad44e1e29f8893e041437/host/drivers/transport/spi/spi_drv.c#L300-L380
Yes, it looks like the master branch is way behind the feature/esp_as_mcu_host branch.
I have now tested the rate with iperf and it is only about 7.5Mbps. Is this the maximum throughput of the master branch? My host SPI runs at 25MHz.
The data rate depends on
I actually measured the clock signal with the oscilloscope:
The interval is 655us, sometimes even longer, and there is no falling edge on the data ready pin that fails to trigger a host transfer.
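To see how much an idle interval between transactions costs, the effective rate can be estimated from the clock and the gap. This is a rough sketch under assumptions: `spi_effective_mbps()` is a hypothetical helper, the payload size per transaction is not stated in this thread and must be supplied by the reader, and protocol overhead (headers, dummy transactions) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Effective payload rate in Mbps for repeated transactions of
 * payload_bytes each, clocked at spi_hz, with idle_us of idle time
 * between consecutive transactions. */
double spi_effective_mbps(uint32_t payload_bytes, uint32_t spi_hz, double idle_us)
{
    assert(spi_hz > 0 && idle_us >= 0.0);
    if (payload_bytes == 0) {
        return 0.0;                                  /* nothing transferred */
    }
    double bits = (double)payload_bytes * 8.0;
    double xfer_us = bits * 1e6 / (double)spi_hz;    /* time on the wire */
    return bits / (xfer_us + idle_us);               /* bits per us equals Mbps */
}
```

With zero idle time the result equals the clock rate (25 Mbps at 25MHz), and an idle gap as long as the transfer itself halves it, so gaps of several hundred microseconds weigh heavily on the achievable iperf figure.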
Anyway, now the test has been stable, thank you very much for your help! I will continue to test stability and try to port new branches! Thank you again for your patient response!!!
@Evlers, does your STM32 port still exist? Also, may I refer him/her to your solution? They seem to be using STM32 as well.
That solution uses the master branch, so it still has stability issues. I'm waiting for the portability of your new branch to mature before we port again. For now, we have chosen Infineon's solution. The code is tied to a company project, so it is not open source. Sorry!
About CS pin control in SPI mode
I am using the master branch as the driver and have finished porting the SPI interface. However, when the device runs iperf -c (that is, TCP TX), the ESP chip prints the following error log:
I think there's something wrong with the SPI transmission!
However, if I pull the CS pin low and then delay 1ms before starting the SPI transfer, things get much better. If I also delay another 1ms after the transfer before pulling the CS pin high, it works completely fine!
Obviously, this is not practical, since the network speed drops a lot.
Does the CS pin need some wake-up wait time? And why can't the CS pin be released immediately after the data transfer is complete?