espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)
Other
675 stars 158 forks source link

esp_hosted_ng: Unable to parse bootup event message from ESP32 #340

Closed sergey-suloev closed 7 months ago

sergey-suloev commented 7 months ago

Hello, I have a problem while host parsing bootup event from ESP32 after applying the change https://github.com/espressif/esp-hosted/issues/333#issuecomment-1950268954

Firmware: ESP32, SPI mode 2, controller 1

Here is the host log:

[ 13.358189] esp32_spi:esp_init: enter [ 13.362119] esp32_spi spi0.0: Probing ESP32 SPI-driver... [ 13.367615] esp32_spi spi0.0: Issue hardware reset... [ 13.596841] esp32_spi spi0.0: Done. [ 13.600723] esp32_spi spi0.0: ESP32 peripheral is registered to SPI bus [0], chip select [0], SPI Clock [1], SPI mode [0x02] [ 13.831711] esp32_spi spi0.0: Probe success. [ 13.836291] esp32_spi:esp_init: ret=0 [ 15.601523] esp32_spi spi0.0: data ready interrupt: irq=47 [ 15.607133] esp32_spi spi0.0: handshake interrupt: irq=46 [ 15.612656] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=1 [ 15.632327] esp32_spi spi0.0: SPI transfer done. [ 15.637003] esp32_spi spi0.0: handshake interrupt: irq=46 [ 15.642507] esp32_spi spi0.0: process_rx_buf [ 15.646794] esp32_spi spi0.0: process_rx_buf: header->if_type(6) >= ESP_MAX_IF(5) [ 15.659575] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=0

header->if_type == 6??

sergey-suloev commented 7 months ago

Here is the bootup event data parsed by logic analyzer

0x03·0x00·0x00·0x00·0x1A·0x00·0x0C·0x00·0x66·0x01·0x00·0x00·0x01·0x00·0x13·0x00·0x12·0x00·0x00·0x00·0x03·0x01·0x00·0x02·0x01·0x0A·0x00·0x03·0xF0·0x02·0x0E·0x01·0x00·0x03·0x01·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00·0x00

The first byte is 3, meaning if_type == ESP_INTERNAL_IF.

It seems like the host receives data shifted by 1 bit, i.e. 3 << 1 = 6

sergey-suloev commented 7 months ago

Screenshot from 2024-02-19 20-42-03

sergey-suloev commented 7 months ago

Screenshot from 2024-02-19 21-18-46

sergey-suloev commented 7 months ago

I tried to switch SPI mode to 1. Here is a new host log.

[ 14.125993] esp32_spi:esp_init: enter [ 14.129860] esp32_spi spi0.0: Probing ESP32 SPI-driver... [ 14.135399] esp32_spi spi0.0: Issue hardware reset... [ 14.371414] esp32_spi spi0.0: Done. [ 14.375282] esp32_spi spi0.0: ESP32 peripheral is registered to SPI bus [0], chip select [0], SPI Clock [1], SPI mode [0x01] [ 14.611412] esp32_spi spi0.0: Probe success. [ 14.615994] esp32_spi:esp_init: ret=0 [ 16.383237] esp32_spi spi0.0: data ready interrupt: irq=47 [ 16.388797] esp32_spi spi0.0: handshake interrupt: irq=46 [ 16.394341] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=1 [ 16.414271] esp32_spi spi0.0: SPI transfer done. [ 16.414339] esp32_spi spi0.0: handshake interrupt: irq=46 [ 16.424481] esp32_spi spi0.0: process_rx_buf [ 16.428774] esp32_spi spi0.0: process_rx_buf: new packet received [ 16.434997] esp32_spi spi0.0: read packet: len=38 [ 16.439751] esp32_spi:process_esp_bootup_event: Received ESP bootup event [ 16.446642] esp32_spi:process_event_esp_bootup: Bootup Event tag: 3 [ 16.452951] esp32_spi:esp_validate_chipset: Chipset=ESP32 ID=00 detected over SPI [ 16.460446] esp32_spi:process_event_esp_bootup: Bootup Event tag: 2 [ 16.466746] esp32_spi:process_event_esp_bootup: Bootup Event tag: 0 [ 16.473045] esp32_spi:process_event_esp_bootup: Bootup Event tag: 1 [ 16.479319] esp32_spi:process_fw_data: ESP chipset's last reset cause: [ 16.485867] esp32_spi:print_reset_reason: POWERON_RESET [ 16.491100] esp32_spi:check_esp_version: ESP Firmware version: 1.0.3 [ 16.497816] esp32_spi:esp_reg_notifier: Driver init is ongoing [ 16.504098] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=0 [ 16.551439] esp32_spi spi0.0: write packet: len=20 [ 16.556430] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=0 [ 16.576359] esp32_spi spi0.0: SPI transfer done. [ 16.581037] esp32_spi spi0.0: handshake interrupt: irq=46 [ 16.586517] esp32_spi spi0.0: data ready interrupt: irq=47 [ 16.592195] esp32_spi spi0.0: process_rx_buf [ 16.596521] esp32_spi spi0.0: process_rx_buf: header->if_type(15) >= ESP_MAX_IF(5) [ 16.596570] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=1 [ 16.624132] esp32_spi spi0.0: handshake interrupt: irq=46 [ 16.632373] esp32_spi spi0.0: SPI transfer done. [ 16.637037] esp32_spi spi0.0: handshake interrupt: irq=46 [ 16.642564] esp32_spi spi0.0: process_rx_buf [ 16.646850] esp32_spi spi0.0: process_rx_buf: header->if_type(15) >= ESP_MAX_IF(5) [ 16.654646] esp32_spi spi0.0: work handler: trans_ready=1, rx_pending=0 [ 21.521364] esp32_spi:wait_and_decode_cmd_resp: Command[1] timed out [ 21.527772] esp32_spi:cmd_init_interface: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22 [ 21.536977] esp32_spi:esp_add_card: esp_add_network_ifaces(adapter) failure, ret: -1 [ 21.544768] esp32_spi:process_event_esp_bootup: network iterface init failed

mantriyogesh commented 7 months ago

Los at both sides please?

mantriyogesh commented 7 months ago

If not mistaken, you are using ESP32.

Have you enabled the code to enter here:

https://github.com/sergey-suloev/esp-hosted/blob/e51d5a0e41878c26a51c01b91dc73734d952c1b6/esp_hosted_ng/host/spi/esp_spi.c#L364

sergey-suloev commented 7 months ago

@mantriyogesh thanks for reply. it is enabled currently. I have "adjust spi clock" disabled but this one is enabled.

sergey-suloev commented 7 months ago

The "CS change" code does not affect the behavior too much. It seems to be a little more stable with "CS change" disabled

mantriyogesh commented 7 months ago

The "CS change" code does not affect the behavior too much. It seems to be a little more stable with "CS change" disabled

with SPI mode 2, cs_change enabled with ESP32? ESP32 has some different SPI timing expectations than all other ESP chipsets.

mantriyogesh commented 7 months ago

also have you tested this wil latest master of NG? There are some fixes gone in recently.

sergey-suloev commented 7 months ago

also have you tested this wil latest master of NG? There are some fixes gone in recently.

I am now testing spi mode 1. Do you mean I should switch to mode 2 in order to use cs change ?

sergey-suloev commented 7 months ago

also have you tested this wil latest master of NG? There are some fixes gone in recently.

Yes. I see some changes . I need to merge some of them manually . Let know if it helps

sergey-suloev commented 7 months ago

ok, I experimentally found that it works more or less stable at 1MHz , spi mode 1, "CS change" disabled. If freq is increased to >= 2MHz then timeouts happen again

mantriyogesh commented 7 months ago

https://github.com/espressif/esp-hosted/issues/340#issuecomment-1953872665

Is with latest master?

sergey-suloev commented 7 months ago

#340 (comment)

Is with latest master?

yes, after merge. I can only see more logs now, nothing has changed in behavior

sergey-suloev commented 7 months ago

[ 14.996719] esp32_spi: esp_init: esp_init: enter [ 15.001544] esp32_spi spi0.0: Probing ESP32 SPI-driver... [ 15.007121] esp32_spi spi0.0: Hardware reset [ 15.233638] esp32_spi spi0.0: ESP32 peripheral is registered to SPI bus [0], chip select [0], SPI Clock [1], SPI mode [0x01] [ 15.463412] esp32_spi spi0.0: Probe success. [ 15.467978] esp32_spi: esp_init: esp_init: ret=0 [ 16.803977] dwmac-sun8i 1c30000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx [ 17.261633] esp32_spi: process_esp_bootup_event: Received ESP bootup event [ 17.268764] esp32_spi: process_event_esp_bootup: Bootup Event tag: 3 [ 17.275189] esp32_spi: esp_validate_chipset: Chipset=ESP32 ID=00 detected over SPI [ 17.282791] esp32_spi: process_event_esp_bootup: Bootup Event tag: 2 [ 17.289192] esp32_spi: process_event_esp_bootup: Bootup Event tag: 0 [ 17.295588] esp32_spi: process_event_esp_bootup: Bootup Event tag: 1 [ 17.301956] esp32_spi: process_fw_data: ESP chipset's last reset cause: [ 17.308599] esp32_spi: print_reset_reason: POWERON_RESET [ 17.313963] esp32_spi: check_esp_version: ESP Firmware version: 1.0.3 [ 17.320907] esp32_spi: esp_reg_notifier: Driver init is ongoing [ 17.327204] esp32_spi: esp_cfg80211_add_iface: Updated priv[0] to c25dd580 [ 17.413414] esp32_spi: esp_cmd_work: Processing Command [0x1] [ 17.419280] tx: 330be8a7: 00 00 01 00 08 00 0c 00 16 00 00 00 01 00 00 00 ................ [ 17.427708] tx: 6e19f4df: 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ................ [ 17.482935] esp32_spi: wait_and_decode_cmd_resp: Resp for command [0x1] [ 17.513381] esp32_spi: esp_cmd_work: Processing Command [0x3] [ 17.519204] tx: 59981a6b: 00 00 01 00 08 00 0c 00 18 00 00 00 03 00 00 00 ................ [ 17.527607] tx: a01cd7be: 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ................ [ 17.573387] esp32_spi: esp_cmd_work: Busy in another cmd execution [ 17.579688] esp32_spi: wait_and_decode_cmd_resp: Resp for command [0x3] [ 17.587012] esp32_spi: esp_cfg80211_get_tx_power: [ 17.643479] esp32_spi: esp_cmd_work: No more command in queue. [ 17.659894] esp32_spi: esp_cfg80211_get_tx_power: [ 17.803446] esp32_spi: init_bt: ESP Bluetooth init [ 17.813883] esp32_spi: print_capabilities: Capabilities: 0xf8. Features supported are: [ 17.816181] bt_tx: af474656: 03 0c 00 ... [ 17.821853] esp32_spi: print_capabilities: WLAN on SPI [ 17.821862] esp32_spi: print_capabilities: BT/BLE [ 17.821867] esp32_spi: print_capabilities: - HCI over SPI [ 17.821872] esp32_spi: print_capabilities: - BT/BLE dual mode [ 17.852024] tx: 8950ab82: 02 00 00 00 03 00 0d 00 22 00 00 00 01 03 0c 00 ........"....... [ 17.860480] tx: 654afe3f: 64 65 73 6b 74 6f 70 2e 44 42 75 73 2e 50 72 6f desktop.DBus.Pro [ 17.911740] bt_tx: ab57e24f: 03 10 00 ... [ 17.919383] tx: 6cefbea7: 02 00 00 00 03 00 0d 00 26 00 00 00 01 03 10 00 ........&....... [ 17.927810] tx: 077cade9: 44 42 75 73 2e 50 72 6f 70 65 72 74 69 65 73 00 DBus.Properties. [ 17.977473] bt_tx: af474656: 01 10 00 ... [ 17.985096] tx: 8950ab82: 02 00 00 00 03 00 0d 00 24 00 00 00 01 01 10 00 ........$....... [ 17.993537] tx: 654afe3f: 64 65 73 6b 74 6f 70 2e 44 42 75 73 2e 50 72 6f desktop.DBus.Pro [ 18.044045] bt_tx: ab57e24f: 09 10 00 ... [ 18.051668] tx: 6cefbea7: 02 00 00 00 03 00 0d 00 2c 00 00 00 01 09 10 00 ........,....... [ 18.060139] tx: 077cade9: 44 42 75 73 2e 50 72 6f 70 65 72 74 69 65 73 00 DBus.Properties. [ 18.109470] bt_tx: af474656: 02 10 00 ... [ 18.117091] tx: 8950ab82: 02 00 00 00 03 00 0d 00 25 00 00 00 01 02 10 00 ........%....... [ 18.125503] tx: 654afe3f: 64 65 73 6b 74 6f 70 2e 44 42 75 73 2e 50 72 6f desktop.DBus.Pro [ 18.176009] bt_tx: ab57e24f: 52 0c f1 00 00 00 00 00 00 00 00 00 00 00 00 00 R............... [ 18.184737] bt_tx: 28795da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.193409] bt_tx: 666e3c9f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.202054] bt_tx: ae0b1aa8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210771] bt_tx: 505f01b3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210798] bt_tx: 97c0bacf: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210806] bt_tx: 17e7f15f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210814] bt_tx: 51888947: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210821] bt_tx: b798ca02: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210829] bt_tx: d0b4edaa: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210836] bt_tx: c8367305: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210843] bt_tx: b45a454f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210850] bt_tx: 40385533: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210858] bt_tx: da405748: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210864] bt_tx: ac3ba4e9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 18.210871] bt_tx: c348ff56: 00 00 00 00 .... [ 18.210949] tx: 2029befa: 02 00 00 00 f4 00 10 00 56 02 00 00 00 00 00 01 ........V....... [ 18.210963] tx: 58aca42f: 52 0c f1 00 00 00 00 00 00 00 00 00 00 00 00 00 R............... [ 18.254960] bt_tx: af474656: 45 0c 01 02 E... [ 18.255017] tx: 8950ab82: 02 00 00 00 04 00 10 00 6b 00 00 00 00 00 00 01 ........k....... [ 18.255027] tx: 654afe3f: 45 0c 01 02 00 00 00 00 00 00 00 00 00 00 00 00 E............... [ 18.289814] bt_tx: ab57e24f: 58 0c 00 X.. [ 18.289860] tx: 6cefbea7: 02 00 00 00 03 00 0d 00 77 00 00 00 01 58 0c 00 ........w....X.. [ 18.289869] tx: 077cade9: 44 42 75 73 2e 50 72 6f 70 65 72 74 69 65 73 00 DBus.Properties. [ 20.346533] Bluetooth: hci0: command 0x0c58 tx timeout [ 20.351745] Bluetooth: hci0: Opcode 0x0c58 failed: -110 [ 48.447712] esp32_spi: esp_inetaddr_event: ------- IP event -------

sergey-suloev commented 7 months ago

Actually, something changed.. I can't even make it work stable at 1MHz anymore. So things got even worse .. And now it doesn't make any difference if I enable "CS change" or disable it.. the same behavior

mantriyogesh commented 7 months ago

Can you please test the base SPI with ESP-IDF example first. https://github.com/espressif/esp-idf/tree/master/examples/peripherals/spi_slave/receiver

If the SPI with ESP-IDF works, then only i think we can test with ESP-Hosted.

IDF example will test SPI pins and Handshake. If you can send logic dump, we can check.

Also, first check the Wi-Fi, there are some changes done for latest Linux kernel in Bluetooth.

Also, it is possible to test ESP-Hosted-FG first (both sides)? Anyway the transport is similar but the issues are easier to localize in ESP-Hosted-FG.

sergey-suloev commented 7 months ago

Can you please test the base SPI with ESP-IDF example first. https://github.com/espressif/esp-idf/tree/master/examples/peripherals/spi_slave/receiver

This sample requires 2 ESP devices connect each other, how this can help in my case ?

mantriyogesh commented 7 months ago

You can run user space spi application from Linux as spi master?

sergey-suloev commented 7 months ago

You can run user space spi application from Linux as spi master?

What application you are talking about ?

sergey-suloev commented 7 months ago

@mantriyogesh I tested with Rpi3 A+ and it works well up to 8MHz with "cs change" enabled. So it seems to be very much critical about SPI protocol differences between master/slave

mantriyogesh commented 7 months ago

Great that you verified the solution with Raspberry Pi.

  1. Have you tested your branch or the master solution?

  2. Also, ESP32 can work till 10MHz actual clock (note that actual clock some times is smaller than requested clock) for SPI slave.

  3. So at least I can say that the connections are fine? What is the raw throughout you get on transport (rx and tx)? https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/docs/Raw_TP_Testing.md

sergey-suloev commented 7 months ago

@mantriyogesh I think I found the problem and I'll try to explain. It seems like the person who wrote the NG host driver didn't fully understand the meaning of spi_transfer->cs_change functionality in linux kernel.

If you look at the kernel code you will see how spi_transfer->cs_change is handled: https://elixir.bootlin.com/linux/v6.8-rc5/source/drivers/spi/spi.c#L1652 When there is only one spi_transfer in spi_message the cs_change flag is not taken into account.

The function spi_sync_transfer , which is used in NG host driver, always creates spi_message with only one spi_transfer in the list, and therefore CS line is never de-asserted in this scenario. I also checked this with logic analyzer.

The problem with sunxi is that the spi-sun6i driver sets the controller in manual mode, i.e. the software must control CS line. https://elixir.bootlin.com/linux/v6.8-rc5/source/drivers/spi/spi-sun6i.c#L367 That's why (I think) using cs_change didn't help me.

finally, I disabled "manual" mode in spi-sun6i driver and now it is 100% stable.

mantriyogesh commented 7 months ago

Oh Okay. I tried to understand this a bit further wrt sun6i, As per A31 User Manual v1.3 May 10.pdf Not sure if this is latest / correct.

If I am not wrong, are you referring to the '8.4.4.3. SPI TRANSFER CONTROL REGISTER' bit(6) Screenshot 2024-02-26 at 11 08 44 AM

I suppose, the intention was to hit (even in case of Raspberry Pi) https://elixir.bootlin.com/linux/v6.0/source/drivers/spi/spi.c#L1506 This CS de-assert->wait->Assert was needed for ESP32 to be done additionally in driver (may be software (Handshake interrupt handling) was late for next spi transfer?) Nevertheless, it is worth to try without 'cs_change' as it was introduced as just workaround. but the software has changed since then a lot.

So the fix at your side is not to set bit(6), As Allwinner might already be doing that in hardware. Just confirming the understanding.

sergey-suloev commented 7 months ago

Oh Okay. I tried to understand this a bit further wrt sun6i, As per A31 User Manual v1.3 May 10.pdf Not sure if this is latest / correct.

If I am not wrong, are you referring to the '8.4.4.3. SPI TRANSFER CONTROL REGISTER' bit(6) Screenshot 2024-02-26 at 11 08 44 AM

I suppose, the intention was to hit (even in case of Raspberry Pi) https://elixir.bootlin.com/linux/v6.0/source/drivers/spi/spi.c#L1506 This CS de-assert->wait->Assert was needed for ESP32 to be done additionally in driver (may be software (Handshake interrupt handling) was late for next spi transfer?) Nevertheless, it is worth to try without 'cs_change' as it was introduced as just workaround. but the software has changed since then a lot.

My point is that the code in NG host driver which sets cs_change to 1 doesn't make any difference if compiled against mainline kernel. Because our spi_transfer object is always "last" in the spi_message list ( we only have one transfer ) , the code will hit the line 1655 and , as a result, CS line will be kept. https://elixir.bootlin.com/linux/v6.8-rc5/source/drivers/spi/spi.c#L1655

So the fix at your side is not to set bit(6), As Allwinner might already be doing that in hardware. Just confirming the understanding.

Yes, when I don't set this bit 6 the connection becomes very stable at every freq

mantriyogesh commented 7 months ago

Oh okay.. thanks for the explanation.. Ideally we can make the code 'cs_change' free.