espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

Error while loading esp-hosted-ng #228

Open sergey-suloev opened 1 year ago

sergey-suloev commented 1 year ago

Hello, I get the following error log while trying to get the esp32-spi driver to work:

Received ESP bootup event
[ 14.985448] EVENT: 3
[ 14.987680] EVENT: 2
[ 14.989873] EVENT: 0
[ 14.992135] EVENT: 4
[ 14.994334] EVENT: 1
[ 14.996575] esp32: process_fw_data ESP chipset's last reset cause:
[ 14.996584] POWERON_RESET
[ 15.005595] esp32: ESP Firmware version: 1.0.2
[ 15.010102] ESP chipset detected [esp32]
[ 15.102378] ESP peripheral capabilities: 0xf8
[ 15.135844] dwmac-sun8i 1c30000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 15.144481] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 15.211651] dwmac-sun8i 1c30000.ethernet eth0: Too many address, switching to promiscuous
[ 15.311069] ESP Bluetooth init
[ 15.321969] Capabilities: 0xf8. Features supported are:
[ 15.327285] WLAN on SPI
[ 15.330147] BT/BLE
[ 15.332603] - HCI over SPI
[ 15.335763] - BT/BLE dual mode
[ 16.829694] esp_cfg80211_scan
[ 18.879041] Bluetooth: hci0: command 0x1001 tx timeout
[ 18.879040] Bluetooth: hci0: Opcode 0x1001 failed: -110
[ 21.983063] esp32: Command[4] timed out
[ 21.986980] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
[ 22.006071] esp_cfg80211_scan
[ 27.103076] esp32: Command[4] timed out
[ 27.108800] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
[ 27.129254] esp_cfg80211_scan
[ 27.286672] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 27.309867] Bridge firewalling registered
[ 31.711121] dc1sw: disabling
[ 31.714236] cpvdd: disabling
[ 31.717313] vcc-1v2-hsic: disabling
[ 32.223050] esp32: Command[4] timed out
[ 32.226950] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
[ 33.043185] esp_cfg80211_scan
[ 38.111931] esp32: Command[4] timed out
[ 38.117139] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
[ 43.049586] esp_cfg80211_scan
[ 48.095297] esp32: Command[4] timed out
[ 48.100716] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
[ 57.045779] esp_cfg80211_scan
[ 62.175302] esp32: Command[4] timed out
[ 62.179178] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
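For readers decoding the log above: the negative numbers are Linux errno values (-110 is ETIMEDOUT, -22 is EINVAL), and a 16-bit HCI opcode packs a 6-bit OGF and a 10-bit OCF, so 0x1001 is OGF 4, OCF 1. The following is a minimal Python sketch for pulling the failing lines out of a dmesg capture. The helper names `decode_failures` and `split_hci_opcode` are hypothetical, and the regular expression only assumes the line shapes seen in this log.

```python
import re

# Linux errno values that appear in the log (only the two needed here).
LINUX_ERRNO = {22: "EINVAL", 110: "ETIMEDOUT"}

# Matches kernel lines that end in a negative return code,
# e.g. "... failed: -110" or "... failure, ret: -22".
CODE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*?)(?:failed|ret): -(\d+)")


def decode_failures(log_text):
    """Return (timestamp, message, errno name) for each failing line."""
    results = []
    for line in log_text.splitlines():
        match = CODE_RE.search(line)
        if match:
            ts, msg, code = match.groups()
            name = LINUX_ERRNO.get(int(code), "errno %s" % code)
            results.append((float(ts), msg.strip(), name))
    return results


def split_hci_opcode(opcode):
    """Split a 16-bit HCI opcode into (OGF, OCF): top 6 bits and low 10 bits."""
    return opcode >> 10, opcode & 0x03FF


sample = """\
[ 18.879040] Bluetooth: hci0: Opcode 0x1001 failed: -110
[ 21.983063] esp32: Command[4] timed out
[ 21.986980] esp32: wait_and_decode_cmd_resp(priv, cmd_node) failure, ret: -22
"""

for ts, msg, name in decode_failures(sample):
    print("%.6f  %s  %s" % (ts, msg, name))
print("OGF/OCF of 0x1001:", split_hci_opcode(0x1001))
```

Read this way, the log shows the host timing out while waiting for replies, followed by the driver reporting the missing response as an invalid result.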

mantriyogesh commented 1 year ago

Are external pull-ups connected?

kapilkedawat commented 1 year ago

Please see this for external pull-ups: https://docs.espressif.com/projects/esp-idf/zh_CN/latest/esp32/api-reference/peripherals/sd_pullup_requirements.html#sd-pull-up-requirements

sergey-suloev commented 1 year ago

> Are external pull-ups connected?

I am using the SPI protocol. Do I still need pull-ups?

mantriyogesh commented 1 year ago

No, pull-ups are not required for SPI. For stable CS operation you can add one on CS, but it is not mandatory. Can you first test raw throughput? That will confirm whether the transport is working.

mantriyogesh commented 1 year ago

https://github.com/espressif/esp-hosted/issues/224#issuecomment-1521461478

mantriyogesh commented 1 year ago

https://github.com/espressif/esp-hosted/issues/223

sergey-suloev commented 1 year ago

> No, pull-ups are not required for SPI. For stable CS operation you can add one on CS, but it is not mandatory. Can you first test raw throughput? That will confirm whether the transport is working.

Is there a manual on how to build the ESP firmware for "SPI-only" mode?
I tried to build it "as is" and re-flash, but my ESP stopped sending any events after this update. The default build doesn't seem to produce the correct firmware.

Mr-Bossman commented 1 year ago

@sergey-suloev the default SPI speed is 25 MHz if I recall correctly, and 50 MHz for SDIO. At those speeds it is next to impossible to debug, as there are many transmission errors. I ended up setting the speed to 1 MHz in the device-tree overlay for SPI, which fixed my issues. For SDIO I had to make a PCB, as the transmission line effects were too great.

If you see "Capabilities: 0xf8. Features supported are:" printed, it means that no software or hardware errors occurred during startup. Any errors after this are almost certainly caused by signal integrity.
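For reference, the 0xf8 value is a bitmask. The sketch below decodes it in Python. The bit-to-name table is an assumption written from memory of the ESP-Hosted capability flags, not copied from the sources, so check it against the headers in your tree before relying on it. `decode_capabilities` is a hypothetical helper.

```python
# ASSUMPTION: bit assignments recalled from the ESP-Hosted capability flags;
# verify against the project's headers before relying on them.
CAPABILITY_BITS = [
    (1 << 0, "WLAN on SDIO"),
    (1 << 1, "BT over UART"),
    (1 << 2, "BT over SDIO"),
    (1 << 3, "BLE supported"),
    (1 << 4, "BR/EDR supported"),
    (1 << 5, "WLAN on SPI"),
    (1 << 6, "BT over SPI"),
    (1 << 7, "checksum enabled"),
]


def decode_capabilities(value):
    """Return the names of all capability bits set in value."""
    return [name for bit, name in CAPABILITY_BITS if value & bit]


for name in decode_capabilities(0xF8):
    print(name)
```

Under that assumed table, 0xf8 sets bits 3 to 7, which lines up with the "WLAN on SPI", "HCI over SPI" and "BT/BLE dual mode" lines the driver prints.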

sergey-suloev commented 1 year ago

> @sergey-suloev the default SPI speed is 25 MHz if I recall correctly, and 50 MHz for SDIO. At those speeds it is next to impossible to debug, as there are many transmission errors. I ended up setting the speed to 1 MHz in the device-tree overlay for SPI, which fixed my issues. For SDIO I had to make a PCB, as the transmission line effects were too great.

I tried to set 2 MHz, but the firmware seems to change it back to 10 MHz.

Mr-Bossman commented 1 year ago

> > @sergey-suloev the default SPI speed is 25 MHz if I recall correctly, and 50 MHz for SDIO. At those speeds it is next to impossible to debug, as there are many transmission errors. I ended up setting the speed to 1 MHz in the device-tree overlay for SPI, which fixed my issues. For SDIO I had to make a PCB, as the transmission line effects were too great.

> I tried to set 2 MHz, but the firmware seems to change it back to 10 MHz.

Ah yes, the ESP driver sets the frequency to 25 MHz; you need to change it there too.

https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/host/spi/esp_spi.c#L698

It also gets set in other places in this file.

sergey-suloev commented 1 year ago

I was only able to process bootup events with SPI mode 0. I have no idea why the driver sets it to 2 - there is no communication at all in mode 2.

Mr-Bossman commented 1 year ago

> I was only able to process bootup events with SPI mode 0. I have no idea why the driver sets it to 2 - there is no communication at all in mode 2.

I got it to work in SPI mode 2, so not sure.

Also, I would replace these lines so that the speed is always set to 1 MHz or lower, because they overwrite the device tree setting. https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/host/spi/esp_spi.c#L683 https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/host/spi/esp_spi.c#L513

sergey-suloev commented 1 year ago

> > > @sergey-suloev the default SPI speed is 25 MHz if I recall correctly, and 50 MHz for SDIO. At those speeds it is next to impossible to debug, as there are many transmission errors. I ended up setting the speed to 1 MHz in the device-tree overlay for SPI, which fixed my issues. For SDIO I had to make a PCB, as the transmission line effects were too great.

> > I tried to set 2 MHz, but the firmware seems to change it back to 10 MHz.

> Ah yes, the ESP driver sets the frequency to 25 MHz; you need to change it there too.
>
> https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/host/spi/esp_spi.c#L698
>
> It also gets set in other places in this file.

Yes, this is the initial speed. I also tried setting it to a lower value in the device tree, but it is reset to 10 MHz while the boot-up events are processed.

sergey-suloev commented 1 year ago

> > I was only able to process bootup events with SPI mode 0. I have no idea why the driver sets it to 2 - there is no communication at all in mode 2.

> I got it to work in SPI mode 2, so not sure.
>
> Also, I would replace these lines so that the speed is always set to 1 MHz or lower, because they overwrite the device tree setting. https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/host/spi/esp_spi.c#L683 https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/host/spi/esp_spi.c#L513

Reducing the speed to 1 MHz didn't change anything for me:

Received ESP bootup event
[ 14.697283] EVENT: 3
[ 14.699492] EVENT: 2
[ 14.701693] EVENT: 0
[ 14.703893] EVENT: 4
[ 14.706136] EVENT: 1
[ 14.708342] esp32: process_fw_data ESP chipset's last reset cause:
[ 14.708353] POWERON_RESET
[ 14.717711] esp32: ESP Firmware version: 1.0.2
[ 14.722164] ESP chipset detected [esp32]
[ 14.992296] ESP peripheral capabilities: 0xf8
[ 15.215547] ESP Bluetooth init
[ 15.219124] Capabilities: 0xf8. Features supported are:
[ 15.224383] WLAN on SPI
[ 15.227280] BT/BLE
[ 15.229671] - HCI over SPI
[ 15.232740] - BT/BLE dual mode
[ 17.445562] Bluetooth: hci0: command 0x1002 tx timeout
[ 17.450804] Bluetooth: hci0: Opcode 0x1002 failed: -110

sergey-suloev commented 1 year ago

I would really appreciate it if you could explain how to correctly build the "SPI-only" firmware.

Mr-Bossman commented 1 year ago

@sergey-suloev creating docs and fixing major bugs is @mantriyogesh's decision. I have tried adding device tree support as well as other fixes, but the review and merge time is abysmal. Also, the project has numerous fatal flaws and bad code, and the firmware isn't open source, unfortunately.

sergey-suloev commented 1 year ago

@mantriyogesh could you please explain how I can rebuild the "SPI-only" firmware to allow the raw throughput test?

mantriyogesh commented 1 year ago

@Mr-Bossman Thanks for the inputs. I hope your issues have been resolved. Regarding the PR, I see there is no open PR for device tree. The earlier PR contained a huge code change and still had open review comments: https://github.com/espressif/esp-hosted/pull/183.

Merging a PR takes time, as we need to review it, test it ourselves and then merge. Plus, we are short on resources right now.

mantriyogesh commented 1 year ago

https://github.com/espressif/esp-hosted/issues/228#issuecomment-1542757318

@sergey-suloev actually the FG code and the NG code have exactly the same transport. Can you please test the raw throughput on FG, as I mentioned in the issue comments tagged above?

However, before doing the raw throughput test, can you please add logs at:

  1. https://github.com/espressif/esp-hosted/blob/cf871bbaef0906b34812a1e920c745fe101d855e/esp_hosted_ng/host/spi/esp_spi.c#L67
  2. https://github.com/espressif/esp-hosted/blob/cf871bbaef0906b34812a1e920c745fe101d855e/esp_hosted_ng/host/spi/esp_spi.c#L76

to check whether these pins are correctly configured.

With the initial event you have verified the MISO, CLK and CS pins. MOSI, DataReady and Handshake still need to be verified. Also, please comment out https://github.com/espressif/esp-hosted/blob/cf871bbaef0906b34812a1e920c745fe101d855e/esp_hosted_ng/host/spi/esp_spi.c#L214-L217

and set the initial frequency to 1 MHz at

https://github.com/espressif/esp-hosted/blob/cf871bbaef0906b34812a1e920c745fe101d855e/esp_hosted_ng/host/spi/esp_spi.c#L29

sergey-suloev commented 1 year ago

@mantriyogesh I verified both IRQ handlers by adding simple logs, and they both appear to work correctly. Changing the SPI clock has no effect on my system; the error persists.

mantriyogesh commented 1 year ago

Did you try commenting out adjust_spi_clock(), as mentioned above?

Also, did you check the porting_guide? Try removing cs_change.

Do you have a logic analyzer with you? There are many cases where the host doesn't correctly release CLK or CS, but this can only be verified from the captured waveforms.

By the way, SPI mode 0 is not supported with ESP32 for ESP-Hosted as it doesn't use DMA. You can try another mode on both sides: SPI mode 1/3.
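For readers comparing the mode numbers in this thread: an SPI mode encodes two bits, clock polarity (CPOL) and clock phase (CPHA), with mode = (CPOL << 1) | CPHA. A small Python sketch of that mapping follows. The constant values match the SPI_CPHA and SPI_CPOL bits in Linux, while `describe_spi_mode` is a hypothetical helper written for illustration.

```python
SPI_CPHA = 0x01  # clock phase bit
SPI_CPOL = 0x02  # clock polarity bit


def describe_spi_mode(mode):
    """Describe an SPI mode (0..3) in terms of CPOL, CPHA and sampling edge."""
    cpol = 1 if mode & SPI_CPOL else 0
    cpha = 1 if mode & SPI_CPHA else 0
    # With CPOL=0 the clock idles low, so the first edge is rising;
    # with CPOL=1 it idles high, so the first edge is falling.
    first_edge_rising = cpol == 0
    # CPHA=0 samples on the first edge, CPHA=1 on the second (opposite) edge.
    sample_rising = first_edge_rising if cpha == 0 else not first_edge_rising
    return {
        "mode": mode,
        "cpol": cpol,
        "cpha": cpha,
        "clock_idle": "high" if cpol else "low",
        "sample_edge": "rising" if sample_rising else "falling",
    }


for m in range(4):
    print(describe_spi_mode(m))
```

Modes 0 and 3 sample on the rising edge and modes 1 and 2 on the falling edge, so a mode mismatch between host and ESP shifts where in the bit period each side samples.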

sergey-suloev commented 1 year ago

> Did you try commenting out adjust_spi_clock(), as mentioned above?
>
> Also, did you check the porting_guide? Try removing cs_change.
>
> Do you have a logic analyzer with you? There are many cases where the host doesn't correctly release CLK or CS, but this can only be verified from the captured waveforms.
>
> By the way, SPI mode 0 is not supported with ESP32 for ESP-Hosted as it doesn't use DMA. You can try another mode on both sides: SPI mode 1/3.

Yes, I tried 1 MHz, 2 MHz and 4 MHz, with no effect. I disabled the SPI clock adjustment coming from the ESP side.

I also tried removing cs_change, and it became even worse: many more timeouts appeared.

Yes, I have a simple copy of a Saleae analyzer; I will see if I can capture the signals.

As for the modes, I am using an ESP32, but I was only able to get any SPI communication in mode 0. No other combination worked at all: it got stuck from the very beginning, because the data buffers received from the ESP were all zeros and the code could not recognize the ESP data header in them. It is probably because the ESP firmware is using mode 0, and I can't modify the firmware; I am using the released "SPI-only" firmware.

mantriyogesh commented 1 year ago

All ESP firmware in the release binaries uses mode 2.

Anyway, the NG ESP code is also getting pushed in a day or two, so you can manually build and flash the ESP32 with the SPI mode you need (or any further code changes). Can you please wait till tomorrow or Monday?

sergey-suloev commented 1 year ago

> All ESP firmware in the release binaries uses mode 2.
>
> Anyway, the NG ESP code is also getting pushed in a day or two, so you can manually build and flash the ESP32 with the SPI mode you need (or any further code changes). Can you please wait till tomorrow or Monday?

Okay, thanks a lot. By the way, can you share this code earlier, or is it important to put it in the repo first?

mantriyogesh commented 1 year ago

NG ESP code is shared in https://github.com/espressif/esp-hosted/tree/master/esp_hosted_ng/esp/esp_driver/network_adapter

sergey-suloev commented 1 year ago

I have captured all lines with the Logic v1 app, and it all looks OK to me. SPI was in mode 2, clock 1 MHz. After the exchange, the SPI buffer on the host side contains only zeros and therefore can't be parsed. I have no idea what that means. It may be a kernel code issue.

12 MHz, 1 B Samples [4].zip

I will try to build a new firmware with SPI mode 0 at ESP side and test again.
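A receive buffer that is entirely zeros usually suggests the MISO line was read low for the whole transfer rather than a parsing problem, although that is an interpretation and not something confirmed in this thread. A minimal Python sketch for flagging such buffers in a debug dump follows. `classify_rx_buffer` is a hypothetical helper, not part of the driver.

```python
def classify_rx_buffer(buf):
    """Classify a received SPI buffer (bytes-like) for quick triage."""
    if len(buf) == 0:
        return "empty"
    if all(b == 0x00 for b in buf):
        return "all zeros"   # MISO appears stuck low or never driven
    if all(b == 0xFF for b in buf):
        return "all ones"    # MISO appears stuck high or pulled up
    return "has data"


print(classify_rx_buffer(bytes(16)))                 # all zeros
print(classify_rx_buffer(b"\xff" * 16))              # all ones
print(classify_rx_buffer(bytes([0x00, 0x12, 0x34]))) # has data
```

Logging this classification per transfer makes it easy to see whether zeroed buffers occur on every exchange or only after the boot-up events.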

mantriyogesh commented 1 year ago

https://github.com/espressif/esp-hosted/issues/228#issuecomment-1543801272 Please refrain from using spi mode 0. You can use other modes.

mantriyogesh commented 1 year ago

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_slave.html#speed-and-timing-considerations

sergey-suloev commented 1 year ago

@mantriyogesh I can't build firmware because of a lot of compilation errors

idf_py_stdout_output_227924.zip

kapilkedawat commented 1 year ago

Hi @sergey-suloev, are you using the steps mentioned in https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/esp/esp_driver/README.md? From the logs, it looks like your IDF is not on commit 91dc99a.

Please note that cmake . checks out the IDF at a particular commit which is compatible with the application.

sergey-suloev commented 1 year ago

> Hi @sergey-suloev, are you using the steps mentioned in https://github.com/espressif/esp-hosted/blob/master/esp_hosted_ng/esp/esp_driver/README.md? From the logs, it looks like your IDF is not on commit 91dc99a.
>
> Please note that cmake . checks out the IDF at a particular commit which is compatible with the application.

I resolved my issues. Actually, there were two issues: 1) I already had esp-idf v5 installed in my ~/.espressif folder. 2) esp_hosted_ng/esp/esp_driver/CMakeLists.txt has a bug at line 30: the path to install.sh is broken.

sergey-suloev commented 1 year ago

I can't explain it, but I got it working exactly once (!!!) with SPI mode 1 on the ESP side and SPI mode 0 on the host side.

Received ESP bootup event
[ 14.881275] EVENT: 3
[ 14.881297] EVENT: 2
[ 14.881302] EVENT: 0
[ 14.881305] EVENT: 4
[ 14.890259] ESP peripheral test capabilities: 0x0
[ 14.890273] esp32: stop raw throuput test if running
[ 14.890278] EVENT: 1
[ 14.890282] esp32: process_fw_data ESP chipset's last reset cause:
[ 14.890289] POWERON_RESET
[ 14.890292] esp32: ESP Firmware version: 1.0.2
[ 14.890297] ESP chipset detected [esp32]
[ 15.137647] ESP peripheral capabilities: 0xf8
[ 15.355920] ESP Bluetooth init
[ 15.359530] Capabilities: 0xf8. Features supported are:
[ 15.364795] WLAN on SPI
[ 15.367671] BT/BLE
[ 15.370057] - HCI over SPI
[ 15.373159] - BT/BLE dual mode
[ 17.268776] Bluetooth: MGMT ver 1.22


root@orangepi-one-bullseye:~# hciconfig
hci0:   Type: Primary  Bus: SPI
        BD Address: 3C:E9:0E:86:A9:62  ACL MTU: 1021:9  SCO MTU: 255:4
        UP RUNNING
        RX bytes:733 acl:0 sco:0 events:102 errors:0
        TX bytes:2182 acl:0 sco:0 commands:0 errors:0


[bluetooth]# show 3C:E9:0E:86:A9:62
Controller 3C:E9:0E:86:A9:62 (public)
        Name: orangepi-one-bullseye
        Alias: orangepi-one-bullseye
        Class: 0x00000000
        Powered: yes
        Discoverable: no
        DiscoverableTimeout: 0x000000b4
        Pairable: yes
        UUID: Generic Attribute Profile (00001801-0000-1000-8000-00805f9b34fb)
        UUID: Generic Access Profile (00001800-0000-1000-8000-00805f9b34fb)
        UUID: PnP Information (00001200-0000-1000-8000-00805f9b34fb)
        UUID: A/V Remote Control Target (0000110c-0000-1000-8000-00805f9b34fb)
        UUID: A/V Remote Control (0000110e-0000-1000-8000-00805f9b34fb)
        UUID: Device Information (0000180a-0000-1000-8000-00805f9b34fb)
        Modalias: usb:v1D6Bp0246d0537
        Discovering: no
        Roles: central
        Roles: peripheral
Advertising Features:
        ActiveInstances: 0x00 (0)
        SupportedInstances: 0x05 (5)
        SupportedIncludes: tx-power
        SupportedIncludes: appearance
        SupportedIncludes: local-name

I can't reproduce this. Still timeouts. I am 99% sure this is a problem in the host's SPI driver.

mantriyogesh commented 1 year ago

Hello @sergey-suloev

It is possible for a mode mismatch to work sometimes, because of how the SPI timings of the two devices interact. However, with SPI issues like this it is better to assess the host's SPI clock timings and CS transitions.

An SPI mode mismatch is the last thing we suggest, once all other possible logical reasons are ruled out.

https://github.com/espressif/esp-hosted/issues/200#issuecomment-1430692728

It is still important to check whether the timing is consistent enough.

mantriyogesh commented 1 year ago

@sergey-suloev any further updates?

sergey-suloev commented 1 year ago

@sergey-suloev there is nothing I can do, because the driver needs an update for the chip. I took everything I found in FG and applied it to NG, but it still doesn't work. Let's wait until you finish your work. Want to close it?

mantriyogesh commented 1 year ago

Can you please explain what is working and what is not? Also, can you send logs?

sergey-suloev commented 1 year ago

> Can you please explain what is working and what is not? Also, can you send logs?

It looks like there are hardware incompatibilities between the ESP and Allwinner, probably clock issues. I talked to a kernel SPI driver maintainer, and they don't think it is a driver issue. I use 10 cm Dupont cables for the SPI connections, so I don't think it is a connection problem either. I am going to look further with the logic analyzer. The timeouts start after the initial exchange, which itself is fine. They appear when the host and the ESP start exchanging HCI commands.

mantriyogesh commented 1 year ago

Yes. Anyway, we should check that the SPI timings match by trying different SPI modes, say 1/2/3. If using the same mode on both sides does not work, try a mode mismatch chosen according to the timing (it is possible that transitions are late). If possible, test with the logic analyzer.

Once you have a working config on both sides, you can run the raw throughput test multiple times, for both Rx and Tx.

Such host/slave compatibility issues have been observed on some Linux platforms. A logic analyzer will give a clear picture.
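The order of attempts suggested above (matched modes first, then mismatches) can be written down as a small test plan. This Python sketch only enumerates the combinations and does not talk to any hardware. `mode_test_plan` is a hypothetical helper, and allowing all four modes on the host side is an assumption of the sketch.

```python
from itertools import product

ESP_MODES = (1, 2, 3)      # mode 0 is avoided on the ESP side, per the thread
HOST_MODES = (0, 1, 2, 3)  # ASSUMPTION: every host mode is a candidate


def mode_test_plan():
    """Return (host_mode, esp_mode) pairs: matched modes first, then mismatches."""
    matched = [(m, m) for m in ESP_MODES]
    mismatched = [
        (host, esp)
        for esp, host in product(ESP_MODES, HOST_MODES)
        if host != esp
    ]
    return matched + mismatched


for host, esp in mode_test_plan():
    print("host mode %d, ESP mode %d" % (host, esp))
```

Recording the raw throughput result for each pair over several runs makes it clearer whether a combination works consistently or only by chance.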

sergey-suloev commented 1 year ago

> Yes. Anyway, we should check that the SPI timings match by trying different SPI modes, say 1/2/3. If using the same mode on both sides does not work, try a mode mismatch chosen according to the timing (it is possible that transitions are late). If possible, test with the logic analyzer.
>
> Once you have a working config on both sides, you can run the raw throughput test multiple times, for both Rx and Tx.
>
> Such host/slave compatibility issues have been observed on some Linux platforms. A logic analyzer will give a clear picture.

Issues happen in all modes: 1, 2 and 3. Mode 0 is not in use anymore, as you recommended. A mode mismatch gives even worse results. I think this is all about the ESP chip quality. Many other devices work well with the same host.

mantriyogesh commented 1 year ago

Mode 0 is not to be used on the ESP (https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_slave.html#restrictions-and-known-issues).

If you try a mode mismatch, any SPI mode can be used on the host. The important thing is to find which combination works with correct timing.

A logic analyzer will help you.

I cannot stop you from thinking about chip quality, as you have freedom of speech, but I think any argument should be backed by proof. I hope you can find the logic analyzer and let yourself get help.

sergey-suloev commented 1 year ago

> Mode 0 is not to be used on the ESP (https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_slave.html#restrictions-and-known-issues).
>
> If you try a mode mismatch, any SPI mode can be used on the host. The important thing is to find which combination works with correct timing.
>
> A logic analyzer will help you.
>
> I cannot stop you from thinking about chip quality, as you have freedom of speech, but I think any argument should be backed by proof. I hope you can find the logic analyzer and let yourself get help.

When I mentioned chip quality, I meant the possibility that I got a fake one. There are a lot of fake chips on the market.