espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

ESP-Hosted-FG: wifi ap packet loss issue #289

Open qjy0129 opened 9 months ago

qjy0129 commented 9 months ago

The ESP32 receives network packets from the main controller over SDIO and transmits them with esp_wifi_internal_tx(ESP_IF_WIFI_AP, payload, payload_len). The payload contents are printed in the logs. However, when a wireless device captures packets with Wireshark, it receives fewer packets than the ESP32 Wi-Fi sent. Why does esp_wifi_internal_tx sometimes fail to actually send a packet?

mantriyogesh commented 9 months ago

Hello @qjy0129 ,

Over-the-air captures and sniffers can themselves drop packets, since a sniffer is expected to cover the whole channel. You can lower the data rate and capture in a less noisy or private environment.

Also, please bear in mind that the wireless medium depends heavily on signal strength, contention, noise level, saturation, and so on. This is true of any wireless communication. The packet rate is limited by the transport capacity (and hence the SPI/SDIO clock) and by the Wi-Fi transmit and receive capacity and sensitivity.
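To make the transport limit concrete, here is a rough back-of-the-envelope sketch. It assumes one bit per data line per clock cycle and ignores all protocol overhead, so it gives only an upper bound. The function names, the 20 MHz clock, the 4-bit bus width, and the 1500-byte packet size are illustrative assumptions, not values from this setup.

```python
def raw_bus_throughput_mbps(clock_hz: int, data_lines: int) -> float:
    """Upper bound in Mbit/s: one bit per data line per clock, no overhead."""
    return clock_hz * data_lines / 1_000_000


def max_packets_per_second(throughput_mbps: float, packet_bytes: int) -> float:
    """Upper bound on packet rate for a fixed packet size."""
    return throughput_mbps * 1_000_000 / (packet_bytes * 8)


# Hypothetical example: 4-bit SDIO clocked at 20 MHz, 1500-byte packets.
sdio_mbps = raw_bus_throughput_mbps(20_000_000, 4)
sdio_pps = max_packets_per_second(sdio_mbps, 1500)
```

Real throughput will be lower than this bound once command overhead, interrupts, and Wi-Fi airtime are taken into account.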

Let us know if you see any drops in ESP-Hosted itself (host to slave or the other way round), which is basically the wired transport. If you can share specific test reports and numbers, we can investigate further.

qjy0129 commented 9 months ago

@mantriyogesh

connect

The physical connection is shown in the figure. The laptop connects wirelessly to the ESP32 and obtains its IP address dynamically. Capturing the DHCP messages on the laptop with Wireshark shows a problem in the protocol exchange.

The DHCP ACK message from the STM32 is sent to the ESP32 over SDIO, and the ESP32 receives it successfully. However, after the ESP32 sends it through esp_wifi_internal_tx, the packet capture tool on the laptop does not see it.

Code where the ESP32 calls esp_wifi_internal_tx:

host_fg

esp log: esp32.log

The sdio_receive hex dump printed in esp32.log, analyzed in Wireshark: esp32

DHCP related messages captured on the laptop: pc

Comparing the two captures shows that the laptop did not receive the DHCP ACK message that the ESP32 sent with esp_wifi_internal_tx.

mantriyogesh commented 9 months ago

Sorry, I could not check this today. I will test it and get back to you in a couple of days. In the meantime, can you also enable or add prints on the transport on both sides, and at line 223: https://github.com/espressif/esp-hosted/blob/ce3c50a33fa4bc562a1b6cbcee292c1ae0b0a404/esp_hosted_fg/esp/esp_driver/network_adapter/main/app_main.c#L223

This is to check whether you get the response but it is dropped in the memcmp below.

qjy0129 commented 9 months ago

I added the print. The response is received and is not dropped in the memcmp below.

There is no problem with the SDIO transfer between the ESP32 and the STM32, and packets sent by the laptop are also received by the ESP32. However, when the ESP32 sends packets to the laptop through esp_wifi_internal_tx, the laptop does not capture the corresponding packets. I also added a print of the return value of esp_wifi_internal_tx, and the result has always been ESP_OK.

mantriyogesh commented 9 months ago

Instead of capturing packets on the laptop's own interface, can you connect some other station and use the laptop to take a full sniffer log (on the same channel you configured for the softAP)?

mantriyogesh commented 9 months ago

Also, do you see the response? Is it the DHCP ACK in https://github.com/espressif/esp-hosted/issues/289#issuecomment-1829868874

qjy0129 commented 9 months ago

Instead of capturing packets on the laptop's own interface, can you connect some other station and use the laptop to take a full sniffer log (on the same channel you configured for the softAP)?

I also see this issue when testing with a mobile phone instead of a laptop.

Also, do you see the response? Is it the DHCP ACK in #289 (comment)

Yes, I see the DHCP ACK response.

mantriyogesh commented 9 months ago

Oh okay.

Can you please send the sdkconfig you used, and the output of the command esptool.py flash_id?

For example,

$ esptool.py flash_id
esptool.py v4.7.dev3
Found 3 serial ports
Serial port /dev/cu.usbserial-130
Connecting......
Detecting chip type... Unsupported detection protocol, switching and trying again...
Connecting....
Detecting chip type... ESP32
Chip is ESP32-D0WD-V3 (revision v3.0)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: e8:31:cd:c4:87:f8
Uploading stub...
Running stub...
Stub running...
Manufacturer: 20
Device: 4016
Detected flash size: 4MB
Hard resetting via RTS pin...

I want to check what crystal frequency you use.

mantriyogesh commented 9 months ago

If the crystal is fine, we would still need ESP verbose logs with WPA logs enabled, along with host logs and a sniffer capture that is started before the connection.

I will add details on how the ESP logs can be collected.

mantriyogesh commented 9 months ago

While testing, please enable these logs on the ESP:

  1. Export the IDF environment so that idf.py is visible (for example, . export.sh)
  2. $ cd esp_hosted_fg/esp/esp_driver/network_adapter/
  3. Enable the ESP-side logging: $ idf.py menuconfig (settings shown in the two attached screenshots)

Then take a Wi-Fi sniffer capture on that channel. Make sure that the sniffer is started before connecting to the AP.

qjy0129 commented 8 months ago

esptool.py v4.7.dev2
Found 2 serial ports
Serial port /dev/ttyUSB_esp32
Connecting...................
Detecting chip type... Unsupported detection protocol, switching and trying again...
Connecting...
Detecting chip type... ESP32
Chip is ESP32-D0WDR2-V3-V3 (revision v3.0)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: 54:43:b2:6e:cd:30
Uploading stub...
Running stub...
Stub running...
Manufacturer: c8
Device: 4016
Detected flash size: 4MB
Hard resetting via RTS pin...

sdkconfig.txt

Logging can only be set to debug level. If verbose logging is enabled, the ESP32 system restarts constantly. With the extra prints added, the problem has become more severe.

Laptop capture: laptop.txt (rename laptop.txt to laptop.pcap to open it in Wireshark). Image: laptop

ESP32 log: esp32.log. The hex dumps of the sent frames in the ESP32 log were converted into network packets with a tool: esp32.txt (rename esp32.txt to esp32.pcap to open it in Wireshark). Image: esp32
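For readers who want to do a similar conversion, here is a minimal sketch of turning hex-dumped frames into a pcap file that Wireshark can open. It is not the tool used above. It assumes each frame has already been extracted from the log as a string of hex bytes that starts at the Ethernet header, and it writes synthetic timestamps (one second apart) because the log timestamps are not parsed. The function names are hypothetical.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4       # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1         # assumes dumps start at the Ethernet header


def hex_to_frame(hex_text: str) -> bytes:
    """Convert 'ff ff 08 00 ...' (any whitespace) into raw bytes."""
    return bytes.fromhex("".join(hex_text.split()))


def build_pcap(frames, linktype: int = LINKTYPE_ETHERNET) -> bytes:
    """Return the bytes of a little-endian pcap file containing the frames."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype.
    out = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, linktype)
    for index, frame in enumerate(frames):
        # Record header: ts_sec (synthetic), ts_usec, captured len, original len.
        out += struct.pack("<IIII", index, 0, len(frame), len(frame))
        out += frame
    return out


# Hypothetical 14-byte frame: broadcast destination, source MAC, EtherType IPv4.
example_frame = hex_to_frame("ff ff ff ff ff ff 54 43 b2 6e cd 30 08 00")
example_pcap = build_pcap([example_frame])
```

Writing `example_pcap` to a file opened in binary mode (for example `open("esp32.pcap", "wb")`) gives a file that Wireshark should accept.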

The memory usage of the compiled binary may be helpful for analyzing the problem:

Total sizes:
Used static DRAM:   84160 bytes (  40420 remain, 67.6% used)
      .data size:   21184 bytes
      .bss  size:   62976 bytes
Used static IRAM:  118778 bytes (  12294 remain, 90.6% used)
      .text size:  117751 bytes
   .vectors size:    1027 bytes
Used Flash size : 1273567 bytes
      .text     :  966095 bytes
      .rodata   :  307216 bytes
Total image size: 1413529 bytes (.bin may be padded larger)
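As a quick sanity check of the size report, the percentages and totals can be recomputed from the raw byte counts. The sketch below uses only the numbers quoted above. The helper name is hypothetical. The total-image relation is simply what these particular numbers add up to, not a statement about how the tool computes it.

```python
def percent_used(used_bytes: int, remaining_bytes: int) -> float:
    """Share of a memory region in use, rounded to one decimal place."""
    total = used_bytes + remaining_bytes
    return round(100.0 * used_bytes / total, 1)


dram_percent = percent_used(84160, 40420)      # static DRAM
iram_percent = percent_used(118778, 12294)     # static IRAM

# Section sums taken from the same report.
dram_sections = 21184 + 62976                  # .data + .bss
iram_sections = 117751 + 1027                  # .text + .vectors
image_sum = 1273567 + 21184 + 118778           # flash + .data + IRAM
```

With only 12294 bytes of static IRAM remaining, there is little headroom for code that has to be placed in IRAM.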

mantriyogesh commented 8 months ago

Logging can only be set to debug level. If verbose logging is enabled, the ESP32 system restarts constantly.

Thanks for adding and removing logs at the expected places, and for the debug logs. You can also add a log on the Rx side, similar to the Tx one, in wlan_ap_rx_callback().

Instead of taking captures separately at the ESP, the laptop, and so on, can you take a single sniffer capture from the start (with EAPOL M1->M4 captured)? That should actually be simpler than running tcpdump in multiple places.

mantriyogesh commented 8 months ago

ESP-Hosted should simply pass DHCP messages transparently to the network interface or the Wi-Fi driver. The DHCP layer itself is out of scope for ESP-Hosted, so if there is an issue in your DHCP server, we cannot verify it.

Also, can you please assign static addresses manually and check whether ping works from one station to another station connected to the same AP? In other words, do two stations with static IPs communicate with each other? (On mobile phones, go into the settings and assign the static IP manually.)

Please note that for station-to-station communication you need to comment out the memcmp lines from the earlier discussion.
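To illustrate why such a check would block station-to-station traffic, here is a small model of a destination-address filter. It is one plausible reading of the memcmp discussed above, not the actual code at app_main.c line 223. It assumes the check forwards a received frame to the host only when the destination MAC is the softAP's own address or a group (broadcast or multicast) address. The function names and the MAC addresses in the example are illustrative.

```python
def is_group_address(mac: bytes) -> bool:
    """Broadcast and multicast MACs have the lowest bit of the first octet set."""
    return bool(mac[0] & 0x01)


def forward_to_host(frame: bytes, softap_mac: bytes) -> bool:
    """Assumed filter: keep frames addressed to the softAP or to a group."""
    destination = frame[:6]
    return destination == softap_mac or is_group_address(destination)


softap = bytes.fromhex("5443b26ecd30")
other_station = bytes.fromhex("e831cdc487f8")
payload = bytes(8)  # placeholder for the rest of the frame

to_softap = forward_to_host(softap + payload, softap)
to_broadcast = forward_to_host(b"\xff" * 6 + payload, softap)
to_other_station = forward_to_host(other_station + payload, softap)
```

Under this assumption, a unicast frame from one station to another is never handed to the host, which would explain why the check has to be disabled for that test.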

qjy0129 commented 8 months ago

We have already captured the DHCP OFFER and ACK messages on the ESP32, which indicates that there is no problem with the DHCP server. In addition, our device also has a wired Ethernet interface, and address assignment works without problems there. I am only using the DHCP exchange to illustrate that the ESP32 Wi-Fi interface esp_wifi_internal_tx loses packets.

mantriyogesh commented 8 months ago

We will verify the DHCP server with the softAP using a Linux host. The issue you face is on the ESP32 side anyway, so a Linux test with a DHCP server should be fine.

We will capture logs (including a sniffer capture) on our side and test your scenario. We will also add logs on the ESP. If you need a specific log format, send us the patch. Otherwise, we will add logs that print a hexdump until ping with the softAP succeeds.

qjy0129 commented 8 months ago

The official example esp-idf/examples/wifi/wps_softap_registrar also shows this issue.

Repeatedly connecting a laptop to the ESP32 hotspot reproduces the problem.

mantriyogesh commented 8 months ago

If you can reproduce this on ESP-IDF as well, you can raise the issue directly in the ESP-IDF issue tracker. We rely on ESP-IDF Wi-Fi as our base.

We will also check internally, but creating an issue at ESP-IDF would get a faster response.

SohKamYung-Espressif commented 8 months ago

@qjy0129 We have tested the DHCP server with the ESP32 as Hosted softAP using Linux host and did not encounter issues.

Our hardware setup:

To capture packets as seen by the ESP32 softAP, debug code was added to slave_control.c. See the diff file diff.txt

SoftAP setup on Raspberry Pi with ESP32:

  1. SoftAP: build and run the Linux driver as per the Hosted instructions.

  2. Start the ESP32 as softAP. For testing purposes, the SSID is set to "testsky" with a password of "testskypassword", using channel 1 and a bandwidth of HT20. On Linux, this was set by updating the softAP mode settings in ctrl_config.h. See the diff file ctrl_config.txt

  3. Use ifconfig to verify that ethap0 is present. Set its IP address with sudo ifconfig ethap0 192.168.4.1.

  4. Start a DHCP server: sudo dnsmasq --no-daemon --no-resolv --no-poll --dhcp-script=/system/bin/dhcp_announce --dhcp-range=192.168.4.2,192.168.4.20,1h as documented in https://github.com/espressif/esp-hosted/blob/master/esp_hosted_fg/docs/Linux_based_host/Getting_started.md#212-wi-fi-softap-mode-operations

Running the test:

  1. Connect an ESP32 running as a station to the softAP. The MAC address of this station was set to aa:bb:cc:dd:ee to distinguish between the two. The station got an IP address of 192.168.4.16 from the DHCP server.
  2. From the Raspberry Pi, ping the station.
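As a small cross-check of the setup above, the address pool implied by the dnsmasq --dhcp-range option can be enumerated and compared with the address the station received. This sketch only does arithmetic on the addresses quoted in the steps. The helper name is hypothetical, and it does not model dnsmasq's actual allocation policy.

```python
import ipaddress


def dhcp_pool(first: str, last: str):
    """List every IPv4 address from first to last, inclusive."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return [ipaddress.IPv4Address(value) for value in range(start, end + 1)]


pool = dhcp_pool("192.168.4.2", "192.168.4.20")
station_ip = ipaddress.IPv4Address("192.168.4.16")
softap_ip = ipaddress.IPv4Address("192.168.4.1")

station_in_pool = station_ip in pool   # address handed to the station
softap_in_pool = softap_ip in pool     # ethap0 address must stay outside
```

The pool holds 19 addresses, the station's 192.168.4.16 falls inside it, and the ethap0 address 192.168.4.1 stays outside, which is consistent with the lease reported in step 1.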

Attachments:

Log file from the ESP32 running as softAP, from the moment the station connects to it, followed by the various pings, to cross-reference against the pcap capture (see below) log.network_adapter.20231205141237.txt

A pcap file showing the captured traffic between the softAP and the station. You can open the pcap file in Wireshark and set it up to decrypt the data by following the steps here: https://wiki.wireshark.org/HowToDecrypt802.11

You can use the display filter to see the dhcp and icmp (ping) traffic. I have attached screen shots from the pcap:

DHCP: image

ICMP (Ping): image

qjy0129 commented 8 months ago

1701763965196

The four consecutive DHCP requests indicate that an ACK packet was dropped in between. This is the problem I raised.

For comparison, you can open a hotspot on your phone and let your laptop obtain an address from it. In that capture there will be no duplicate request messages.
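This kind of pattern can also be counted automatically. The sketch below assumes the DHCP (BOOTP) UDP payloads have already been extracted from a capture in order. It reads the DHCP message type from option 53 and counts every REQUEST that is followed by another REQUEST before any ACK. The function names are hypothetical, and a repeated REQUEST only suggests a missing ACK. It does not show where the ACK was lost.

```python
DHCP_MAGIC_COOKIE = bytes.fromhex("63825363")
DHCP_REQUEST = 3
DHCP_ACK = 5


def parse_dhcp_type(payload: bytes):
    """Return the DHCP message type (option 53) or None if not found."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        return None
    pos = 240
    while pos < len(payload):
        code = payload[pos]
        if code == 255:          # end option
            break
        if code == 0:            # pad option, single byte
            pos += 1
            continue
        if pos + 1 >= len(payload):
            break
        length = payload[pos + 1]
        if code == 53 and length == 1 and pos + 2 < len(payload):
            return payload[pos + 2]
        pos += 2 + length
    return None


def count_unanswered_requests(payloads) -> int:
    """Count REQUESTs that were repeated before any ACK was seen."""
    unanswered = 0
    pending = False
    for payload in payloads:
        msg_type = parse_dhcp_type(payload)
        if msg_type == DHCP_REQUEST:
            if pending:
                unanswered += 1
            pending = True
        elif msg_type == DHCP_ACK:
            pending = False
    return unanswered


def make_dhcp_payload(msg_type: int) -> bytes:
    """Build a minimal synthetic payload for testing (zeroed BOOTP header)."""
    return bytes(236) + DHCP_MAGIC_COOKIE + bytes([53, 1, msg_type, 255])


request = make_dhcp_payload(DHCP_REQUEST)
ack = make_dhcp_payload(DHCP_ACK)
four_requests = count_unanswered_requests([request] * 4 + [ack])
```

A REQUEST that is still pending at the end of the capture is not counted, so the result is a lower bound.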

mantriyogesh commented 8 months ago

Let us know once you have raised the issue at ESP-IDF.

qjy0129 commented 8 months ago

I have raised the issue at ESP-IDF: https://github.com/espressif/esp-idf/issues/12725

qjy0129 commented 8 months ago

Is there still someone in your company investigating this issue? I haven't heard any news for a long time.

mantriyogesh commented 8 months ago

@qjy0129 Sorry, let me check this with someone in the Wi-Fi team for you.

mantriyogesh commented 8 months ago

@kapilkedawat