espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

esp-hosted-fg on ESP32-C2 with SPI (Linux host as SPI master): wget download speed becomes very low after a while #410

Open WindWard-atm opened 3 weeks ago

WindWard-atm commented 3 weeks ago

The wget download speed suddenly becomes very low and never recovers.

Version: based on commit d29c683468634c4bb87d8bf2df9023fd96e21630; Linux kernel version: 5.10.186

WindWard-atm commented 3 weeks ago

[Attached screenshots: 001, 002, 003, 004]

mantriyogesh commented 3 weeks ago

Please enable the SPI checksum on the slave and first test raw transport throughput in both the RX and TX directions. This will tell us whether there are any issues at the lower (transport) layer.
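For readers unfamiliar with the transport checksum: the sketch below shows how a per-frame checksum lets the receiver reject frames corrupted on the bus. It assumes a simple 16-bit additive byte sum computed with the checksum field zeroed; `CHECKSUM_OFFSET` and the frame layout are hypothetical and not taken from the esp-hosted headers.

```python
CHECKSUM_OFFSET = 10  # hypothetical: byte offset of a 2-byte little-endian checksum field


def additive_checksum(buf: bytes) -> int:
    """Sum every byte and keep the low 16 bits, like a wrapping uint16_t accumulator."""
    return sum(buf) & 0xFFFF


def seal(frame: bytearray) -> bytearray:
    """Sender side: zero the checksum field, compute the sum, then store it in the field."""
    frame[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = b"\x00\x00"
    csum = additive_checksum(frame)
    frame[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = csum.to_bytes(2, "little")
    return frame


def is_intact(frame: bytes) -> bool:
    """Receiver side: recompute over a copy with the field zeroed and compare."""
    received = int.from_bytes(frame[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2], "little")
    zeroed = bytearray(frame)
    zeroed[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = b"\x00\x00"
    return additive_checksum(zeroed) == received


frame = seal(bytearray(range(64)))
print(is_intact(frame))  # True
frame[20] ^= 0x01        # flip one bit, as a noisy bus might
print(is_intact(frame))  # False
```

An additive sum catches any single-bit error but cannot detect, for example, two bytes swapping places, so a clean checksum result rules out many bus faults, not all of them.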

mantriyogesh commented 3 weeks ago

https://github.com/espressif/esp-hosted/blob/master/esp_hosted_fg/docs/Linux_based_host/Raw_TP_Testing.md

WindWard-atm commented 3 weeks ago

Raw transport test results: [Attached screenshots: 0005, 0006]

WindWard-atm commented 3 weeks ago

Both directions, ESP to host and host to ESP, are normal.

mantriyogesh commented 3 weeks ago

Yes, this looks all right. Thanks for the quick test.

I see no issues in raw throughput. By the way, what SPI frequency did you use?

Interference can generally cause this kind of behaviour. Could you please try the same test in an isolated environment, or with a different AP that is close enough (but not too close)?

The 2.4 GHz band is shared by Thread, Wi-Fi, Bluetooth, and microwave ovens. If your station or AP is too close, or the band is very crowded with such traffic, packets may start dropping over the air between the slave's Wi-Fi and the AP, in either direction.

There is one change I have yet to push, and so far I have only made it for SDIO. A packet could be unexpectedly dropped when the network driver (netdev) queue was paused and resumed.
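As a rough illustration of this class of bug (a toy model, not the contents of 452.zip), the sketch below contrasts a TX path that frees a packet offered while its ring is full with one that hands the packet back to be retried. In the Linux kernel, the second behaviour corresponds to an `ndo_start_xmit` handler returning `NETDEV_TX_BUSY` so the stack requeues the skb. `ToyTxPath` and `run` are hypothetical names.

```python
from collections import deque


class ToyTxPath:
    """Toy driver TX ring with fixed capacity (hypothetical, not esp-hosted code)."""

    def __init__(self, capacity: int, requeue_on_full: bool):
        self.capacity = capacity
        self.requeue_on_full = requeue_on_full
        self.ring = deque()
        self.dropped = 0

    def xmit(self, pkt) -> bool:
        """Return True if the packet was consumed (queued or dropped), False to ask for a retry."""
        if len(self.ring) >= self.capacity:
            # Ring full: this is where a real driver would pause the netdev queue.
            if self.requeue_on_full:
                return False  # hand the packet back; the caller retries it later
            self.dropped += 1  # packet freed and silently lost
            return True
        self.ring.append(pkt)
        return True


def run(requeue_on_full: bool, n_packets: int = 20, capacity: int = 4):
    """Offer packets faster (2 per tick) than the bus drains them (1 per tick)."""
    path = ToyTxPath(capacity, requeue_on_full)
    pending = deque(range(n_packets))
    delivered = []
    while pending or path.ring:
        for _ in range(2):
            if pending and path.xmit(pending[0]):
                pending.popleft()
        if path.ring:
            delivered.append(path.ring.popleft())  # bus frees one slot, queue may resume
    return delivered, path.dropped


print(run(requeue_on_full=True))   # all 20 packets, in order, 0 dropped
print(run(requeue_on_full=False))  # some packets missing, dropped > 0
```

Real drivers usually avoid the full-ring case by stopping the netdev queue before the ring fills and waking it once space frees up; the toy only shows why a packet dropped at that boundary never reaches the other side, which TCP then has to recover through retransmission.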

While I am not sure whether you can port it to SPI, I have attached the change anyway. You can either try it or wait until I port it to the SPI transport.

Please note that the patch also records the base commit it applies on top of.

452.zip

WindWard-atm commented 3 weeks ago

Thanks for your reply. The SPI frequency we use is 30 MHz, with Wi-Fi only (no BT). We have tested with different APs and at different distances, and the result is the same. We will try 452.zip as soon as possible.

WindWard-atm commented 3 weeks ago

I cannot port this patch: 452.zip is for SDIO. I tried to port it, but for some of the changes I cannot find the corresponding location in the SPI code.

Do you have another way to troubleshoot this problem?

WindWard-atm commented 3 weeks ago

We found another symptom, which also occurs when downloading large amounts of data: pinging the IP address still works, but no other communication is possible anymore. It can only be recovered by reconnecting to the network.

Attachments: 20240622164151, tcpdump.txt

mantriyogesh commented 3 weeks ago


I would need to port it to SPI, which might take some time. I will do it and let you know.

Can you please confirm whether you are using jumper cables at 30 MHz? Could you also send a close-up picture of your setup?

WindWard-atm commented 3 weeks ago

We have confirmed many times that the SPI speed is 30 MHz.

What about the issue mentioned above, where pinging the IP address still works after a large download but no other communication is possible until we reconnect to the network?

mantriyogesh commented 3 weeks ago


The question was whether you use jumper cables. Running SPI at a high frequency over jumper wires may cause failures.


Regarding the case where ping works but other traffic stops: in the tcpdump we did not find any issue, apart from suspecting that some data is incorrect; the rest of the capture looks fine. To understand why the data might be incorrect, we need to understand the hardware connections you made.

mantriyogesh commented 3 weeks ago

Can you please send a close-up picture of your setup?

Also, if you comment out the line at https://github.com/espressif/esp-hosted/blob/b1422afc4fe0a2ea1ffd6b5459edddc4d968ee71/esp_hosted_fg/host/linux/host_driver/esp32/spi/esp_spi.c#L206, does it help?

WindWard-atm commented 3 weeks ago

We do not use any jumper cables; the ESP32-C2 is used in our products for mass production.

[Attachment: 20240624191005]

Two new questions:
  1. On the ESP32-C2 with BT turned off, what is the maximum value we can set for CONFIG_ESP_SPI_RX_Q_SIZE/CONFIG_ESP_SPI_TX_Q_SIZE?
  2. Why does the network speed get slower and slower when wget downloads a file in an endless loop? Memory fragmentation? An overheating chip?

mantriyogesh commented 3 weeks ago
  1. With BT disabled, you can remove the lines at https://github.com/espressif/esp-hosted/blob/b1422afc4fe0a2ea1ffd6b5459edddc4d968ee71/esp_hosted_fg/esp/esp_driver/network_adapter/sdkconfig.defaults.esp32c2#L30-L31 so that both use the default value of 10. Optionally, you can also disable Bluetooth in this file by setting the line at https://github.com/espressif/esp-hosted/blob/b1422afc4fe0a2ea1ffd6b5459edddc4d968ee71/esp_hosted_fg/esp/esp_driver/network_adapter/sdkconfig.defaults.esp32c2#L7 to CONFIG_BT_ENABLED=n.

  2. We have not observed this in our testing, but since you are blocked on it, it is worth checking whether packets are being dropped somewhere, which could cause this kind of behaviour.
     A few days ago I added a packet number to the header, to detect whether any packet is dropped in either direction: DBG__pkt_num_over_b1422afc4fe0a2ea1ffd6b5459edddc4d968ee71.patch.gz
     Warning: the header format is changed in this patch. This is only a debug patch and is not suitable for production.
     You can ignore the OTA-related changes in this patch. Please apply the patch on both sides, then build and flash both sides.
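Once the per-packet numbers from a debug patch like this are logged on both sides, a small offline check can flag drops. This sketch assumes the numbers have already been extracted into a list in arrival order, that the counter is 16 bits wide and wraps to 0, and that packets are never reordered or duplicated. The actual field width in the patch is not stated in this thread, so `width_bits` is an assumption, and `find_gaps`/`count_missing` are hypothetical helpers.

```python
def find_gaps(pkt_nums, width_bits=16):
    """Return (expected, got) wherever the sequence skips ahead, allowing for counter wrap."""
    modulus = 1 << width_bits
    gaps = []
    for prev, cur in zip(pkt_nums, pkt_nums[1:]):
        expected = (prev + 1) % modulus
        if cur != expected:
            gaps.append((expected, cur))
    return gaps


def count_missing(pkt_nums, width_bits=16):
    """Total packets skipped between consecutive entries (assumes no reordering or duplicates)."""
    modulus = 1 << width_bits
    return sum((cur - prev - 1) % modulus for prev, cur in zip(pkt_nums, pkt_nums[1:]))


seen = [65533, 65534, 65535, 0, 1, 4, 5]
print(find_gaps(seen))      # [(2, 4)]
print(count_missing(seen))  # 2
```

Running the check separately on the host-TX/ESP-RX and ESP-TX/host-RX logs shows which direction loses packets, if either does.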

zhaojinyun commented 3 weeks ago

Hi mantriyogesh

We tested with iperf. Right after connecting to the AP, everything was normal, but after a while, intervals with a speed of 0 started appearing at random. If we continue testing, after some time the speed stays at 0 and does not recover. After reconnecting to the AP, it returns to normal, but after testing for a while, the same behaviour repeats.

[ 5] 76.01-77.00 sec 1.62 MBytes 13.7 Mbits/sec
[ 5] 77.00-78.00 sec 1.50 MBytes 12.6 Mbits/sec
[ 5] 78.00-79.01 sec 1.62 MBytes 13.6 Mbits/sec
[ 5] 79.01-80.00 sec 1.62 MBytes 13.7 Mbits/sec
[ 5] 80.00-81.01 sec 1.62 MBytes 13.5 Mbits/sec
[ 5] 81.01-82.00 sec 1.62 MBytes 13.8 Mbits/sec
[ 5] 82.00-83.01 sec 1.62 MBytes 13.5 Mbits/sec
[ 5] 83.01-84.01 sec 1.75 MBytes 14.7 Mbits/sec
[ 5] 84.01-85.01 sec 1.62 MBytes 13.7 Mbits/sec
[ 5] 85.01-86.00 sec 1.12 MBytes 9.48 Mbits/sec
[ 5] 86.00-87.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 87.01-88.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 88.01-89.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 89.01-90.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 90.01-91.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 91.01-92.01 sec 0.00 Bytes 0.00 bits/sec
[ 5] 92.01-93.02 sec 384 KBytes 3.13 Mbits/sec
[ 5] 93.02-94.00 sec 768 KBytes 6.38 Mbits/sec
[ 5] 94.00-95.01 sec 1.75 MBytes 14.5 Mbits/sec
[ 5] 95.01-96.01 sec 1.62 MBytes 13.7 Mbits/sec
[ 5] 96.01-97.01 sec 1.50 MBytes 12.6 Mbits/se
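To locate stalls like this one in long iperf3 logs, a small parser can pull out each interval's rate and merge consecutive zero-rate intervals. This is a hypothetical helper, not part of iperf3 or esp-hosted, and it assumes iperf3's default human-readable interval format as it appears in this log.

```python
import re

# One iperf3 interval, e.g. "[ 5] 86.00-87.01 sec 0.00 Bytes 0.00 bits/sec".
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-([\d.]+)\s+sec\s+[\d.]+\s+[KMG]?Bytes\s+([\d.]+)\s+([KMG]?)bits/sec"
)
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_intervals(text):
    """Yield (start_s, end_s, bits_per_second) for every interval found in text."""
    for m in INTERVAL_RE.finditer(text):
        start, end, rate, prefix = m.groups()
        yield float(start), float(end), float(rate) * SCALE[prefix]


def stall_periods(text, threshold_bps=1.0):
    """Merge back-to-back intervals below threshold_bps into (start_s, end_s) stalls."""
    periods = []
    for start, end, bps in parse_intervals(text):
        if bps >= threshold_bps:
            continue
        if periods and periods[-1][1] == start:
            periods[-1] = (periods[-1][0], end)  # extend the current stall
        else:
            periods.append((start, end))
    return periods
```

On the log above, `stall_periods` reports a single stall from 86.00 s to 92.01 s; the truncated final line is simply not matched. Raising `threshold_bps` (for example to 1e6) would also flag the slow-recovery intervals that follow a stall.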

zhaojinyun commented 3 weeks ago

[Image: circuit diagram] This is our circuit diagram. Can you provide a C2 SPI firmware that works correctly in your testing, so that we can verify our setup with it?

mantriyogesh commented 2 weeks ago

Can you please confirm the pin connections for Handshake and Data Ready on the C2? In general, we follow https://github.com/espressif/esp-hosted/blob/master/esp_hosted_fg/docs/Linux_based_host/SPI_setup.md#222-source-compilation

The pin connections in the firmware are as described at https://github.com/espressif/esp-hosted/blob/master/esp_hosted_fg/docs/Linux_based_host/SPI_setup.md#111-pin-connections, unless you need any changes.

mantriyogesh commented 2 weeks ago

Either way, it should not make much difference. Did you try the debug patch I sent, on both sides?

mantriyogesh commented 2 weeks ago

Any updates?

zhaojinyun commented 2 weeks ago

@mantriyogesh We are merging the patch for testing. The main issue now is that after running iperf3 for some time, for example 30 minutes, the network can no longer communicate, even though the STA status is still connected. After manually reconnecting to the AP, the network returns to normal.

Do you have any suggestions for this?

mantriyogesh commented 2 weeks ago

Do the ESP logs show any disconnection?

ESP-Hosted-FG (Linux) supports a 'station disconnected' event, which can notify the host so that it can reconnect. API: set_event_callback() with CTRL_EVENT_STATION_DISCONNECT_FROM_AP and a non-null event_cb function.

zhaojinyun commented 2 weeks ago
  1. There is no log about an AP disconnection in the ESP logs.
  2. Linux also did not receive any STA disconnection events.
  3. When Linux actively sends a command to read the STA status (get_sta_config), it also reports a connected state.
  4. On the AP side, the STA is also shown as connected.
  5. When the STA is forcibly disconnected from the AP side, both Linux and the ESP receive the disconnection events.

zhaojinyun commented 2 weeks ago

Communication with small amounts of data is normal; only large data transfers trigger this problem, for example iperf stress testing or downloading a large file (10 MB) with wget in a loop.
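Because ping keeps working while bulk traffic stops, a host-side watchdog on the interface byte counters can timestamp exactly when the stall begins, which helps line up ESP and Linux logs. This sketch reads the standard Linux sysfs counters under `/sys/class/net/<iface>/statistics/`; the interface name `ethsta0` is an assumption about the esp-hosted-fg station interface and should be adjusted, and the thresholds are illustrative. `read_counter`, `is_stalled`, and `watch` are hypothetical helpers.

```python
import time
from pathlib import Path


def read_counter(iface, name="rx_bytes"):
    """Read one Linux network interface statistics counter from sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/{name}").read_text())


def is_stalled(samples, min_delta_bytes):
    """True if every consecutive pair of counter samples grew by less than min_delta_bytes."""
    if len(samples) < 2:
        return False
    return all(b - a < min_delta_bytes for a, b in zip(samples, samples[1:]))


def watch(iface="ethsta0", period_s=1.0, window=5, min_delta_bytes=1024):
    """Poll rx_bytes forever and print a timestamp whenever traffic stalls for `window` periods."""
    samples = []
    while True:
        samples.append(read_counter(iface))
        samples = samples[-(window + 1):]  # keep just enough samples for one window
        if len(samples) == window + 1 and is_stalled(samples, min_delta_bytes):
            print(f"{time.strftime('%H:%M:%S')} {iface}: rx stalled for {window * period_s:.0f}s")
        time.sleep(period_s)
```

Run `watch()` only while a bulk transfer is active; with the link idle, every window would look stalled. Setting `min_delta_bytes` above the rate of a 1 Hz ping keeps background pings from masking a stall.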

zhaojinyun commented 2 weeks ago

Any update?

mantriyogesh commented 2 weeks ago

Logs from both sides would give us some clues. Basically there are two issues; I will try to reproduce both by tomorrow and share our findings.

mantriyogesh commented 2 weeks ago

https://github.com/espressif/esp-hosted/issues/410#issuecomment-2199222284

This does seem extremely strange, because you are still able to run serial commands, so there is no issue at the bus level.

You do not receive any disconnection, and the AP shows the ESP as connected and vice versa, so the picture is a little unclear.

Is there a specific way to reproduce this easily? Do you see any checksum errors in the logs on either side?

zhaojinyun commented 2 weeks ago

No checksum errors appear in the logs. Moreover, reconnecting to the AP restores normal operation, which indicates that there is no problem with communication between Linux and the ESP.

Run this command on the Linux side:

iperf3 -s -i 1

Run this command on the PC:

iperf3 -c 192.168.137.82 -b 100m -t 14400
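For a 4-hour run it can be easier to have iperf3 emit JSON (`-J`) and count zero-rate intervals programmatically. The sketch assumes the `iperf3` binary is on the PATH of the PC and reads the `intervals[].sum` fields of iperf3's JSON report; `run_client` and `zero_intervals` are hypothetical helper names.

```python
import json
import subprocess


def zero_intervals(report, threshold_bps=1.0):
    """Return (start, end) of intervals in an iperf3 JSON report whose summed rate is below threshold_bps."""
    return [
        (iv["sum"]["start"], iv["sum"]["end"])
        for iv in report.get("intervals", [])
        if iv["sum"]["bits_per_second"] < threshold_bps
    ]


def run_client(server, bandwidth="100m", seconds=14400):
    """Run the same client test as above, but with JSON output (-J), and return the parsed report."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-b", bandwidth, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


# Usage on the PC (not run here): print(zero_intervals(run_client("192.168.137.82")))
```

With `-J`, iperf3 prints the whole report only when the test ends, so the interval timeline becomes available after the run finishes; a shorter `seconds` value gives faster feedback while reproducing.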

mantriyogesh commented 2 weeks ago
  1. I think it is worth testing raw throughput first, for a longer duration.
     Also, if you have a logic analyser, capturing the bus when this issue happens would make it much easier to debug.

  2. Did you get a chance to check https://github.com/espressif/esp-hosted/issues/410#issuecomment-2186402841? It contains a debug patch that may help determine whether any packet was missed, especially when you ran into the iperf issue.