espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

reinserting driver without host-restart results in communication loss #334

Closed Philipansari closed 7 months ago

Philipansari commented 7 months ago

I am using esp-hosted NG version 1.0.3 (latest commit) on an esp32c3 chip together with an ARM-based Linux host (OpenWrt v21, kernel 5.4.154), connected via the SPI bus.

When the kernel module is loaded for the first time, everything works as expected. As soon as the kernel module is removed (without any error messages) and reinserted (using modprobe or insmod), the driver is unable to communicate with the esp32c3.

I already checked the static void __exit esp_exit(void) function and made sure that the hardware is freed correctly (the SPI bus and both GPIOs).

After observing the data-ready and handshake pins with an oscilloscope and adding several debug prints to the Linux host code, I found the section where the problem occurs:

Code: esp_spi.c:221:process_rx_buf() line 230:

    /* Validate received SKB. Check len and offset fields */
    if (offset != sizeof(struct esp_payload_header)) {
        return -EINVAL;
    }

This check fails during the initial communication between the Linux host and the esp32c3. Both boot up correctly. The only kernel message printed after the insert call is:

esp_interface_ng:spi_dev_init: ESP32 peripheral is registered to SPI bus [0],chip select [0], SPI Clock [10]

No error follows.

Is there a way to fix it? I would like to avoid restarting the host.

Philipansari commented 7 months ago

Update: it seems I have the same problem with the first generation (FG) as well. The first generation at least reports an error. See the following dmesg output:

[  341.618736] esp_reset, ESP32: Resetpin of Host is 111
[  341.619001] esp_reset, ESP32: Triggering ESP reset.
[  341.619855] ESP: SPI host config: GPIOs: Handshake[88] DataReady[81]
[  341.620254] ESP host driver claiming SPI bus [0],chip select [0] with init SPI Clock [10]
[  341.637026] esp spi thread created
[  342.325789] offset_rcv[6] != exp[12], drop

mantriyogesh commented 7 months ago

Hello @Philipansari ,

So as I understood, second run (say by manually resetting esp32-C3) is not working but first one is fine.

Are you using long jumper cables to connect? Drive strength of SPI GPIO pins can be increased,

https://github.com/espressif/esp-hosted/blob/master/esp_hosted_fg/esp/esp_driver/network_adapter/main/spi_slave_api.c#L643-L646

What is the raw throughput for SPI for the first run? I think the first run also needs to be cross-checked to see whether SPI is working correctly (are some transactions working and some not?)

mantriyogesh commented 7 months ago

Using latest master for FG case?

Philipansari commented 7 months ago

I found the problem. For my hardware setup I need to use SPI mode 1 instead of the default SPI mode 2. During the first startup my MOSI and SPI-CLK lines are low, and after I removed the esp-hosted driver it was held high (because of the SPI mode 2 usage). This caused a one-clock offset in the first data transmission after reinstalling the driver. I traced the SPI signals together and saw that the transmitted data was different due to the clock shift:

First transmission sequence after a clean host start:

[Image: spi_full_first_start_wide_2]

After removing the kernel module and reinserting it:

[Image: spi_full_second_start_wide_2]

The clock shift caused unexpected data on the Linux side (for both FG and NG).
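For anyone comparing the two modes mentioned above: the SPI mode number encodes the clock polarity (CPOL) and clock phase (CPHA). The sketch below decodes a mode number, assuming the same bit values as SPI_CPHA (0x01) and SPI_CPOL (0x02) in <linux/spi/spi.h>. All demo_ names are hypothetical and not part of esp-hosted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror SPI_CPHA / SPI_CPOL from <linux/spi/spi.h>. */
#define DEMO_SPI_CPHA 0x01u
#define DEMO_SPI_CPOL 0x02u

/* Hypothetical helper type, for illustration only. */
struct demo_spi_mode {
    bool cpol; /* true: clock idles high */
    bool cpha; /* true: data is sampled on the trailing clock edge */
};

/* Split an SPI mode number (0..3) into its CPOL and CPHA bits. */
struct demo_spi_mode demo_decode_spi_mode(uint8_t mode)
{
    struct demo_spi_mode m;

    assert(mode <= 3);
    m.cpol = (mode & DEMO_SPI_CPOL) != 0;
    m.cpha = (mode & DEMO_SPI_CPHA) != 0;
    return m;
}

/* The idle level of the clock line is given by CPOL alone. */
bool demo_clock_idles_high(uint8_t mode)
{
    return demo_decode_spi_mode(mode).cpol;
}

/* Data is sampled on the falling edge when CPOL and CPHA differ
 * (modes 1 and 2) and on the rising edge when they match (modes 0 and 3). */
bool demo_samples_on_falling_edge(uint8_t mode)
{
    struct demo_spi_mode m = demo_decode_spi_mode(mode);

    return m.cpol != m.cpha;
}
```

Under these definitions, mode 2 idles the clock high and mode 1 idles it low, while both sample on the falling edge. That is at least consistent with the behaviour described above, where only the idle level of the clock line differed between the two setups.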

It would be helpful to have an error print that detects when the header byte (the first byte in the transmission) is shifted one bit left or right, as a hint that the SPI mode has to be changed. The first generation at least had one error print reporting that the buffer has an unexpected offset. In the next generation it was removed.
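A minimal sketch of what such a check could look like, written as standalone C rather than as a patch to the driver. It rests on one observation from the FG log: the received offset was 6 where 12 was expected, and 6 equals 12 >> 1. The demo_ names are hypothetical, and the two-byte little-endian layout used in the usage note is an assumption for illustration, not the actual esp_payload_header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for a received header offset field. */
enum demo_offset_verdict {
    DEMO_OFFSET_OK,
    DEMO_OFFSET_SHIFTED_RIGHT, /* received == expected >> 1 */
    DEMO_OFFSET_SHIFTED_LEFT,  /* received == expected << 1 */
    DEMO_OFFSET_INVALID
};

/* Classify a received offset against the expected header size.
 * Only the field value is compared; bits leaking in from neighbouring
 * fields after a real shift are not modelled here. */
enum demo_offset_verdict demo_classify_offset(uint16_t received,
                                              uint16_t expected)
{
    if (received == expected)
        return DEMO_OFFSET_OK;
    if (received == (uint16_t)(expected >> 1))
        return DEMO_OFFSET_SHIFTED_RIGHT;
    if (received == (uint16_t)(expected << 1))
        return DEMO_OFFSET_SHIFTED_LEFT;
    return DEMO_OFFSET_INVALID;
}

/* Simulate an MSB-first byte stream that arrives one clock late:
 * every bit moves one position towards the end of the stream and
 * the very first bit is filled with 0. */
void demo_shift_stream_right(const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t carry = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[i] = (uint8_t)((in[i] >> 1) | (carry << 7));
        carry = in[i] & 1u;
    }
}
```

With these helpers, demo_classify_offset(6, 12) yields DEMO_OFFSET_SHIFTED_RIGHT, and shifting the bytes 0x0C 0x00 (the value 12 stored as an assumed little-endian 16-bit field) with demo_shift_stream_right() gives 0x06 0x00. A driver could print a hint about the SPI mode in the shifted cases, but where exactly such a print would belong in the host code is left open here.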

Here are the answers to your questions (in case anyone else runs into this problem and wants to check whether it matches their setup):

Using latest master for FG case?

Yes, I am using the latest master (Linux host code and ESP firmware).

So as I understood, second run (say by manually resetting esp32-C3) is not working but first one is fine.

I just checked whether simply resetting the esp32 causes the same issue: yes, it does. Here is the dmesg log (after 30 s I reset the esp32 manually):

[   37.849577] esp_reset, ESP32: Resetpin of Host is 111
[   37.849826] esp_reset, ESP32: Triggering ESP reset.
[   37.850177] ESP: SPI host config: GPIOs: Handshake[88] DataReady[81]
[   37.850575] ESP host driver claiming SPI bus [0],chip select [0] with init SPI Clock [10]
[   37.866963] esp spi thread created
[   38.556193] INIT event rcvd from ESP
[   38.559855] EVENT: 2
[   38.562048] EVENT: 1
[   38.564251] ESP Reconfigure SPI CLK to 30 MHz
[   38.568613] EVENT: 0
[   38.570836] EVENT: 3
[   38.575284] ESP peripheral capabilities: 0xe8
[   68.616036] offset_rcv[6] != exp[12], drop

Are you using long jumper cables to connect?

No, in fact I am using a prototype PCB where all connections are equally long and shorter than ~3 cm.

Drive strength of SPI GPIO pins can be increased,

I just checked, but it does not seem to fix the issue.

What is the raw throughput for SPI for the first run?

I haven't run a raw throughput test so far.

mantriyogesh commented 7 months ago

Interesting issue. So SPI mode 1 worked across ESP resets?

Philipansari commented 7 months ago

yes, it does :+1: