espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Double interface can fail DNS resolution: dns_clear_servers called twice (IDFGH-4440) #6270

Open arnoutdekimo opened 3 years ago

arnoutdekimo commented 3 years ago

Hi,

In v4.2, and also in 4.3-dev, the https_request_example lets you enable both Ethernet and Wi-Fi in the sdkconfig (CONFIG_EXAMPLE_CONNECT_ETHERNET and CONFIG_EXAMPLE_CONNECT_WIFI). Both then run a DHCP client, and the example waits until both interfaces are up and have an IP.

However, in the case that one of the two interfaces does not get a DNS server through its DHCP lease, it is possible for no DNS servers to be configured at all, causing all subsequent name resolutions to fail.

What seems to happen is that esp_netif_dhcpc_start_api is called for every interface. In this function all dns servers are cleared. If the slower interface does not restore another dns server, the system will be without any dns server, which is clearly undesired.
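The sequence described above can be modelled on a host machine. The following is a minimal sketch under stated assumptions, not the esp-idf or lwIP source: every `model_*` name is hypothetical, and the only behaviour taken from the report is that starting a DHCP client clears one DNS table shared by all interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one global DNS table shared by all interfaces.
 * This is NOT the esp-idf or lwIP implementation. */
#define MODEL_DNS_MAX_SERVERS 3

static uint32_t model_dns_table[MODEL_DNS_MAX_SERVERS]; /* 0 means "unset" */

/* Wipe every entry in the shared table. */
static void model_dns_clear_servers(void)
{
    for (int i = 0; i < MODEL_DNS_MAX_SERVERS; i++) {
        model_dns_table[i] = 0;
    }
}

static void model_dns_set_server(int idx, uint32_t addr)
{
    assert(idx >= 0 && idx < MODEL_DNS_MAX_SERVERS);
    model_dns_table[idx] = addr;
}

/* Starting a DHCP client clears the shared table, whichever interface asks. */
static void model_dhcpc_start(void)
{
    model_dns_clear_servers();
}

/* A lease only writes a DNS server if the DHCP server offered one. */
static void model_dhcp_ack(bool lease_has_dns, uint32_t dns_addr)
{
    if (lease_has_dns) {
        model_dns_set_server(0, dns_addr);
    }
}

/* True if at least one slot in the shared table holds a server. */
static bool model_dns_any_configured(void)
{
    for (int i = 0; i < MODEL_DNS_MAX_SERVERS; i++) {
        if (model_dns_table[i] != 0) {
            return true;
        }
    }
    return false;
}
```

In this model, calling `model_dhcpc_start()` and `model_dhcp_ack(true, addr)` for the first interface, then `model_dhcpc_start()` and `model_dhcp_ack(false, 0)` for the second, leaves `model_dns_any_configured()` returning false: the second start erased the first interface's server and nothing replaced it.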

(Tested on ESP32-WROOM-32)

Kind regards, Arnout

AxelLin commented 3 years ago

This is a common issue for dual-interface use cases. For example, in the STA + PPP case, when PPP goes down the DNS server settings are cleared, so STA then fails to do DNS resolution.

The problem is lwIP does not support DNS per netif. There is a patch available to support dns servers per netif, see https://savannah.nongnu.org/bugs/?func=detailitem&item_id=58571#options

Maybe esp-lwip can test and pick up this patch.

@david-cermak

david-cermak commented 3 years ago

@arnoutdekimo @AxelLin Thanks for the report and the pointers. Will check the issue and see if we can cherry-pick the patch.

sguitton commented 3 years ago

Hi, I am working with release v4.3 and I still have this issue (lwIP version is 2.1.2). Is there a patch that works on v4.3? Is a release planned that solves this issue?

Thank you,

xueyunfei998 commented 2 years ago

Hi @sguitton,

I have provided a patch for 4.3 that distinguishes the DNS servers of each netif. If you have any more questions, please feel free to ask.

As for the official lwIP patch, it changes too much for IDF, and it will take some time to adapt it.

xueyunfei998 commented 2 years ago

dns_lwip.txt 4.3_dns.txt

dannybackx commented 7 months ago

Thanks @AxelLin for pointing me to this; it looks like what I need. What's the timeline for integrating this into an esp-idf 5.[23] version?

Please note that with hardware like LilyGo T-SIM7080-S3 (which I have) and the recent Walter board, I expect this to become more important. The latter is on the Espressif newsroom (https://www.espressif.com/en/news/Walter?position=6&list=BgZNktBI5vtdxtpLHeKrS-lI_6r-b3RQ72Tf1IPLNiU)

Also some of the complexity in the esp-protocols example multiple_netifs is due to this.

dannybackx commented 7 months ago

I think I may have ported the stuff from 4.3 to v5.2.1. There is at least one issue in the source (see comments) with including stuff from esp-netif into lwip. Also, this won't work with the default setting of CONFIG_LWIP_DNS_MAX_SERVERS. I changed the Kconfig, but there should really be a safeguard against this somewhere too. Chasing this took me a while.

The current output of my app looks like it should:

I (16:20:20.711) main_task: Started on CPU0
I (16:20:20.721) main_task: Calling app_main()
I (16:20:20.740) App: ESP-Alarm (c) 2017-2024 by Danny Backx
I (16:20:20.740) App: Build 2024-04-21 18:19:20
I (16:20:20.741) App: Using ArduinoJson v7.0.4
I (16:20:20.746) App: Using acmeclient v0.1.0
I (16:20:20.751) App: Mounting littlefs at /fs
I (16:20:20.758) Network: RegisterModule(App)
I (18:20:20.762) Network: Start connect (2 options)
I (18:20:20.767) pp: pp rom version: e7ae62f
I (18:20:20.772) net80211: net80211 rom version: e7ae62f
E (18:20:20.840) Network: wifi_start_internal: no preset IP
I (18:20:20.841) Network: Connecting to RoeselUBackx
I (18:20:23.767) esp_netif_lwip: esp_netif_dhcpc_start_api dns_setserver(0, 0)
I (18:20:23.768) esp_netif_lwip: esp_netif_dhcpc_start_api dns_setserver(1, 0)
I (18:20:23.993) dhcp: dhcp_handle_ack dns_setserver(0, 195.130.130.1)
I (18:20:23.994) dhcp: dhcp_handle_ack dns_setserver(1, 195.130.131.1)
I (18:20:24.766) esp_netif_handlers: wifi ip: 192.168.0.203, mask: 255.255.255.0, gw: 192.168.0.1
I (18:20:24.768) Network: onStaGotIP: call module App
I (18:20:24.771) App: Network connected, ip 192.168.0.203
I (18:20:24.778) App: Starting local ftp server
I (18:20:24.783) App: List DNS servers
I (18:20:24.787) App: DNS 0 : 195.130.130.1
I (18:20:24.792) App: DNS 1 : 195.130.131.1
I (18:20:24.797) App: DNS 2 : 0.0.0.0
I (18:20:24.801) App: DNS 3 : 0.0.0.0
I (18:20:24.806) App: DNS 4 : 0.0.0.0
I (18:20:24.810) App: DNS 5 : 0.0.0.0
I (18:20:24.814) App: DNS 6 : 0.0.0.0
I (18:20:24.819) App: DNS 7 : 0.0.0.0
I (18:20:24.823) gpio: GPIO[15]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 0| Pulldown: 0| Intr:0 
I (18:20:24.833) gpio: GPIO[7]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 0| Pulldown: 0| Intr:0 
I (18:20:24.844) AXP2101: Implemented using built-in read and write methods (Use higher version >= 5.0 API)
I (18:20:24.854) AXP2101: Init PMU SUCCESS!
I (18:20:24.863) AXP2101: DC1  : +   Voltage:3300 mV
I (18:20:24.865) AXP2101: DC3  : +   Voltage:3000 mV (modem)
I (18:20:24.872) AXP2101: BLDO1: +   Voltage:3300 mV (level converter)
I (18:20:24.885) pppos_example: PPP auth set
I (18:20:24.886) pppos_example: Initializing esp_modem for the SIM7070 module...
I (18:20:24.891) uart: queue free spaces: 30
I (18:20:24.897) gpio: GPIO[41]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (18:20:24.906) pppos_example: Wait for 2s ..
I (18:20:26.906) pppos_example: Trigger modem ..
I (18:20:28.106) pppos_example: Wait for 2s ..
I (18:20:30.106) pppos_example: Trigger modem ..
I (18:20:31.306) pppos_example: Wait for 2s ..
I (18:20:33.306) pppos_example: Check for modem ..
I (18:20:33.910) pppos_example: modem_init: success after iteration 1
I (18:20:33.911) pppos_example: modem_init ok
I (18:20:38.960) pppos_example: Signal quality: rssi=27, ber=99, ntries 9
I (18:20:38.972) pppos_example: esp_modem_set_mode(ESP_MODEM_MODE_DATA) ok
I (18:20:38.973) pppos_example: Waiting for IP address
I (18:20:39.026) ppp: sdns dns_setserver(4, 80.201.237.238)
I (18:20:39.027) ppp: sdns dns_setserver(5, 80.201.237.239)
I (18:20:39.028) pppos_example: List DNS servers
I (18:20:39.031) esp-netif_lwip-ppp: Connected
I (18:20:39.039) pppos_example: DNS 0 : 195.130.130.1
I (18:20:39.044) pppos_example: DNS 1 : 195.130.131.1
I (18:20:39.050) pppos_example: DNS 2 : 0.0.0.0
I (18:20:39.055) pppos_example: DNS 3 : 0.0.0.0
I (18:20:39.060) pppos_example: DNS 4 : 80.201.237.238
I (18:20:39.066) pppos_example: DNS 5 : 80.201.237.239
I (18:20:39.072) pppos_example: DNS 6 : 0.0.0.0
I (18:20:39.077) pppos_example: DNS 7 : 0.0.0.0
I (18:20:39.082) pppos_example: Modem Connect to PPP Server
I (18:20:39.089) pppos_example: IP          : 100.64.169.120
I (18:20:39.095) pppos_example: Netmask     : 255.255.255.255
I (18:20:39.101) pppos_example: Gateway     : 10.64.64.64
I (18:20:39.107) pppos_example: Name Server1: 195.130.130.1
I (18:20:39.114) pppos_example: Name Server2: 195.130.131.1
I (18:20:39.120) pppos_example: List DNS servers
I (18:20:39.125) pppos_example: DNS 0 : 195.130.130.1
I (18:20:39.131) pppos_example: DNS 1 : 195.130.131.1
I (18:20:39.137) pppos_example: DNS 2 : 0.0.0.0
I (18:20:39.142) pppos_example: DNS 3 : 0.0.0.0
I (18:20:39.147) pppos_example: DNS 4 : 80.201.237.238
I (18:20:39.153) pppos_example: DNS 5 : 80.201.237.239
I (18:20:39.159) pppos_example: DNS 6 : 0.0.0.0
I (18:20:39.164) pppos_example: DNS 7 : 0.0.0.0
I (18:20:39.169) pppos_example: PPP state changed event 0
I (18:20:39.169) Network: Connected to ppp - IPv4 address: 100.64.169.120, prio 20
I (18:20:39.184) Network: Connected to wifi - IPv4 address: 192.168.0.203, prio 128
I (18:20:39.192) Network: fixup_dns
I (18:20:39.196) Network: fixup_dns: default netif wifi
E (18:20:39.202) Network: fixup_dns : not fixing up (hardcoded)
I (18:20:39.208) Network: List DNS servers again
I (18:20:39.214) Network: DNS 0 : 195.130.130.1
I (18:20:39.219) Network: DNS 1 : 195.130.131.1
I (18:20:39.224) Network: DNS 2 : 0.0.0.0
I (18:20:39.229) Network: DNS 3 : 0.0.0.0
I (18:20:39.234) Network: DNS 4 : 80.201.237.238
I (18:20:39.239) Network: DNS 5 : 80.201.237.239
I (18:20:39.244) Network: DNS 6 : 0.0.0.0
I (18:20:39.249) Network: DNS 7 : 0.0.0.0
I (18:20:40.169) App: DoDynDNS(alarm.dannybackx.dns-cloud.net) succeeded
I (18:20:40.170) App: Starting local web server and OTA ..
I (18:20:40.172) Network: RegisterModule(WebServer)
I (18:20:40.178) WebServer: setAcceptedCertificates(*.dannybackx.dns-cloud.net)
I (18:20:40.188) WebServer: start: registered /status GET handler for HTTP
I (18:20:40.203) App: sntp_sync_notify
I (18:20:40.210) WebServer: start: registered /alarm GET handler for HTTP
I (18:20:40.217) WebServer: start: registered /favicon.ico GET handler for HTTP
I (18:20:40.225) App: WebServerStarted
I (18:20:40.230) Ota: setup: registered /update POST handler for HTTP
I (18:20:40.237) Ota: setup: registered / GET handler for HTTP
I (18:20:40.243) Ota: setup: registered /serverIndex GET handler for HTTP

diffs-multiple-netif-5.2.1.txt
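The log above lists eight DNS slots, with the Wi-Fi servers at indices 0 and 1 and the PPP servers at 4 and 5. One way to get the safeguard mentioned in this comment is a compile-time check that the table is large enough for a fixed slot range per interface. The sketch below is only an illustration: the `SKETCH_*` names, the stride of 4 and the interface numbering are all assumptions chosen to match the indices in the log, not values taken from the attached diff.

```c
#include <assert.h>

/* Hypothetical layout, NOT taken from the attached diff. */
#define SKETCH_DNS_MAX_SERVERS 8 /* stands in for CONFIG_LWIP_DNS_MAX_SERVERS */
#define SKETCH_NETIF_COUNT     2 /* assumed: Wi-Fi is 0, PPP is 1 */
#define SKETCH_SLOTS_PER_NETIF 4 /* assumed stride between interfaces */

/* Fail the build, rather than misbehave at run time, if the table is too small. */
_Static_assert(SKETCH_DNS_MAX_SERVERS >= SKETCH_NETIF_COUNT * SKETCH_SLOTS_PER_NETIF,
               "DNS table too small for per-netif slot ranges");

/* Map (interface, DNS type) to an index in the shared table. */
static int sketch_dns_slot(int netif_index, int dns_type)
{
    assert(netif_index >= 0 && netif_index < SKETCH_NETIF_COUNT);
    assert(dns_type >= 0 && dns_type < SKETCH_SLOTS_PER_NETIF);
    return netif_index * SKETCH_SLOTS_PER_NETIF + dns_type;
}
```

With these assumed values, `sketch_dns_slot(1, 0)` is 4 and `sketch_dns_slot(1, 1)` is 5. Lowering `SKETCH_DNS_MAX_SERVERS` below 8 stops the build instead of letting an out-of-range index through.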

david-cermak commented 6 months ago

The limitation is documented in https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/lwip.html#limitations

Here's a status update on this issue:

dannybackx commented 6 months ago

FYI the stuff I ported to 5.2.1 still appears to work with 5.3-beta1.

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x15 (USB_UART_CHIP_RESET),boot:0x28 (SPI_FAST_FLASH_BOOT)
Saved PC:0x40056f64
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce2810,len:0x178c
load:0x403c8700,len:0x4
load:0x403c8704,len:0xcb8
load:0x403cb700,len:0x2d9c
entry 0x403c8914
I (26) boot: ESP-IDF v5.3-beta1-dirty 2nd stage bootloader
I (27) boot: compile time May 22 2024 19:11:58
I (27) boot: Multicore bootloader
I (31) boot: chip revision: v0.2
I (34) boot.esp32s3: Boot SPI Speed : 80MHz
I (39) boot.esp32s3: SPI Mode       : DIO
I (44) boot.esp32s3: SPI Flash Size : 4MB
I (49) boot: Enabling RNG early entropy source...
I (54) boot: Partition Table:
I (58) boot: ## Label            Usage          Type ST Offset   Length
I (65) boot:  0 nvs              WiFi data        01 02 00009000 00005000
I (72) boot:  1 otadata          OTA data         01 00 0000e000 00002000
I (80) boot:  2 ota_0            OTA app          00 10 00010000 001e0000
I (87) boot:  3 ota_1            OTA app          00 11 001f0000 001e0000
I (95) boot:  4 spiffs           Unknown data     01 82 003d0000 00030000
I (102) boot: End of partition table
I (107) boot: No factory image, trying OTA 0
I (112) esp_image: segment 0: paddr=00010020 vaddr=3c0e0020 size=3f41ch (259100) map
I (167) esp_image: segment 1: paddr=0004f444 vaddr=3fc9a200 size=00bd4h (  3028) load
I (168) esp_image: segment 2: paddr=00050020 vaddr=42000020 size=d5ec4h (876228) map
I (329) esp_image: segment 3: paddr=00125eec vaddr=3fc9add4 size=04158h ( 16728) load
I (333) esp_image: segment 4: paddr=0012a04c vaddr=40374000 size=16150h ( 90448) load
I (364) boot: Loaded app from partition at offset 0x10000
I (393) boot: Set actual ota_seq=1 in otadata[0]
I (393) boot: Disabling RNG early entropy source...
I (404) cpu_start: Multicore app
I (414) cpu_start: Pro cpu start user code
I (414) cpu_start: cpu freq: 160000000 Hz
I (414) app_init: Application information:
I (417) app_init: Project name:     webserver
I (422) app_init: App version:      1
I (426) app_init: Compile time:     May 22 2024 19:11:57
I (432) app_init: ELF file SHA256:  23e964052...
I (438) app_init: ESP-IDF:          v5.3-beta1-dirty
I (443) efuse_init: Min chip rev:     v0.0
I (448) efuse_init: Max chip rev:     v0.99 
I (453) efuse_init: Chip rev:         v0.2
I (458) heap_init: Initializing. RAM available for dynamic allocation:
I (465) heap_init: At 3FCA3588 len 00046188 (280 KiB): RAM
I (471) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (477) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (483) heap_init: At 600FE100 len 00001EE8 (7 KiB): RTCRAM
I (491) spi_flash: detected chip: gd
I (494) spi_flash: flash io: dio
W (498) spi_flash: Detected size(16384k) larger than the size in the binary image header(4096k). Using the size in the binary image header.
I (511) sleep: Configure to isolate all GPIO pins in sleep state
I (518) sleep: Enable automatic switching of GPIO sleep configuration
I (17:15:03.425) main_task: Started on CPU0
I (17:15:03.435) main_task: Calling app_main()
I (17:15:03.448) App: ESP-Alarm (c) 2017-2024 by Danny Backx
I (17:15:03.448) App: Build 2024-05-22 19:14:43
I (17:15:03.449) App: Using ArduinoJson v7.0.4
I (17:15:03.454) App: Using acmeclient v0.1.0
I (17:15:03.459) App: Mounting littlefs at /fs
W (17:15:03.468) i2c.master: Please check pull-up resistances whether be connected properly. Otherwise unexpected behavior would happen. For more detailed information, please read docs
I (17:15:03.481) gpio: GPIO[15]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 0| Pulldown: 0| Intr:0 
I (17:15:03.491) gpio: GPIO[7]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 0| Pulldown: 0| Intr:0 
I (17:15:03.503) power: AXP2101 : id 4a
I (17:15:03.512) gpio: GPIO[6]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:2 
I (17:15:03.516) power: enableBattVoltageMeasure -> true
I (17:15:03.522) power: Battery voltage 0, percentage -1
I (17:15:03.528) power: Power: isCharging: NO
I (17:15:03.533) power: Power: isDischarge: NO
I (17:15:03.538) power: Power: isVbusIn: YES
I (17:15:03.543) power: Power: getBattVoltage: 0 mV
I (17:15:03.548) power: Power: getVbusVoltage: 5109 mV
I (17:15:03.554) power: Power: getSystemVoltage: 5119 mV
I (17:15:03.560) power: Power: chargerStatus 5 charging stopped
I (17:15:03.572) logger: Logger: log file has 34 lines, read starting at 0
I (17:15:03.581) Network: RegisterModule(App)
I (19:15:03.583) Network: wifi_task : task started
I (19:15:03.585) pp: pp rom version: e7ae62f
I (19:15:03.590) net80211: net80211 rom version: e7ae62f
E (19:15:03.645) Network: wifi_start_internal: no preset IP
I (19:15:03.647) Network: Connecting to RoeselUBackx
I (19:15:06.560) esp_netif_lwip: esp_netif_dhcpc_start_api dns_setserver(0, 0)
I (19:15:06.561) esp_netif_lwip: esp_netif_dhcpc_start_api dns_setserver(1, 0)
I (19:15:07.562) Network: onGotIP, netif wifi IPv4 192.168.0.205 mask 255.255.255.0 gw 192.168.0.1
I (19:15:07.562) Network: onGotIP: call module App
I (19:15:07.566) App: Network connected, ip 192.168.0.205
I (19:15:07.711) App: DoDynDNS(alarm.dannybackx.dns-cloud.net) succeeded
I (19:15:07.713) App: Starting local ftp server
I (19:15:07.714) Network: RegisterModule(WebServer)
I (19:15:07.719) WebServer: setAcceptedCertificates(*.dannybackx.dns-cloud.net)
I (19:15:07.729) WebServer: start: registered /status GET handler for HTTP
I (19:15:07.735) WebServer: start: registered /alarm GET handler for HTTP
I (19:15:07.742) WebServer: start: registered /favicon.ico GET handler for HTTP
I (19:15:07.751) WebServer: start: registered /dns GET handler for HTTP
I (19:15:07.758) App: WebServerStarted
I (19:15:07.762) Ota: setup: registered /update POST handler for HTTP
I (19:15:07.769) Ota: setup: registered / GET handler for HTTP
I (19:15:07.776) Ota: setup: registered /serverIndex GET handler for HTTP
I (19:15:07.783) Network: onGotIP: call module WebServer
I (19:15:13.590) Network: gsm_task : task started
I (19:15:13.591) gpio: GPIO[41]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (19:15:13.598) uart: queue free spaces: 30
I (19:15:13.602) Network: gsm_task init modem
I (19:15:24.546) App: sntp_sync_notify
I (19:15:24.547) logger: add(Node alarm.dannybackx.dns-cloud.net booted at 2024-05-22 19:15:24)
I (19:15:24.599) logger: add(Connected to wifi - IPv4 address: 192.168.0.205, prio 128)
E (19:15:33.846) Network: Signal quality: rssi=99, ber=99, ntries 11
I (19:15:33.847) Network: gsm_task init modem
I (19:15:49.901) Network: Signal quality: rssi=23, ber=99, ntries 7
I (19:15:49.913) Network: gsm_task: waiting for connection to break
I (19:15:51.365) logger: add(onGotIP, netif ppp IPv4 100.89.8.226 mask 255.255.255.255 gw 10.64.64.64)
I (19:15:51.418) Network: onGotIP, netif ppp IPv4 100.89.8.226 mask 255.255.255.255 gw 10.64.64.64
I (19:15:51.420) Network: on_ppp_changed: PPP state changed event 0
I (19:15:57.916) App: DoDynDNS(alarm.dannybackx.dns-cloud.net) succeeded

Not exactly the same output as before but the repeated DynDNS call (last line) shows it works both before and after enabling a 2nd netif.

david-cermak commented 4 months ago

Implemented in https://github.com/espressif/esp-idf/commit/6acdb384f697e515fd6fdf68489f9213af2ecb66 (controlled by CONFIG_ESP_NETIF_SET_DNS_PER_DEFAULT_NETIF)

dannybackx commented 4 months ago

Thanks. Will this make it into esp-idf v5.3?

dannybackx commented 4 months ago

Ok, it's not in 5.3 but I patched my version. How can I use it? I don't see a documentation update, and I don't understand which callback to use, or how. Could you e.g. share your test program?

david-cermak commented 4 months ago

> Thanks. Will this make it into esp-idf v5.3?

I'll probably backport it to both v5.2 and v5.3 (we don't usually backport features, but this involves lwip, which we'd like to keep ~updated)

> Ok, it's not in 5.3 but I patched my version.

Note that the fix (merge commit 6a75241bf1b3a1b8d31ce9b32ef79939ae5e3763) contains two commits: one with the callback 17a635b23b35bfd0f2676c3ccbe883424d9e1ca2, and the other which uses it 6acdb384f697e515fd6fdf68489f9213af2ecb66 (so you don't have to set/use any lwip callbacks)

> How can I use it? I don't see a documentation update, and I don't understand which callback to use, or how.

You just enable CONFIG_ESP_NETIF_SET_DNS_PER_DEFAULT_NETIF and the DNS servers will be updated automatically to follow the default network interface (e.g. based on route-prio or via the API call esp_netif_set_default_netif()).

It's briefly documented here: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/kconfig.html#config-esp-netif-set-dns-per-default-netif (Kconfig option descriptions are used to generate this)

> Could you e.g. share your test program?

Still the same program: https://github.com/espressif/esp-protocols/tree/master/examples/esp_netif/multiple_netifs (but with all the DNS server handling removed from the app in https://github.com/espressif/esp-protocols/pull/609)
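The behaviour described for CONFIG_ESP_NETIF_SET_DNS_PER_DEFAULT_NETIF (DNS servers following the default network interface) can be pictured with a small host-side model. This is a rough sketch, not the esp-netif implementation: all `toy_*` names are hypothetical, and it assumes that each interface remembers its own servers and that a higher route priority wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DNS_SLOTS 3 /* hypothetical: main, backup, fallback */

/* Hypothetical interface record; each netif remembers its own DNS servers. */
typedef struct {
    const char *name;
    int route_prio;              /* assumed: higher value is preferred */
    uint32_t dns[TOY_DNS_SLOTS]; /* 0 means "unset" */
} toy_netif_t;

/* The table the resolver would actually consult. */
static uint32_t toy_active_dns[TOY_DNS_SLOTS];

/* Switching the default interface copies that interface's servers
 * into the active table, replacing whatever was there before. */
static void toy_set_default_netif(const toy_netif_t *netif)
{
    assert(netif != NULL);
    for (int i = 0; i < TOY_DNS_SLOTS; i++) {
        toy_active_dns[i] = netif->dns[i];
    }
}

/* Pick the interface with the highest route priority. */
static const toy_netif_t *toy_pick_by_prio(const toy_netif_t *netifs, size_t count)
{
    assert(netifs != NULL && count > 0);
    const toy_netif_t *best = &netifs[0];
    for (size_t i = 1; i < count; i++) {
        if (netifs[i].route_prio > best->route_prio) {
            best = &netifs[i];
        }
    }
    return best;
}
```

Because the servers live with each interface in this model, one interface going down or restarting DHCP never erases another interface's servers. The active table is simply rewritten whenever the default interface changes.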

david-cermak commented 4 months ago

Reopening per comments in https://github.com/espressif/esp-idf/issues/14249.

(will close after the fix is accepted by users)

dannybackx commented 4 months ago

I've been able to test, and it works for me. One of the things that confused me before reading the sources again was the mention of callbacks. I thought I'd have to implement these. Turns out I don't have to because you're storing DNS info in the netif structure, as you should ;-).

Thanks !

PS: So basically all I had to do in my networking code was remove ESP_NETIF_WIFI_DNS_MAIN (which was compatible with my hack) and replace it with the normal ESP_NETIF_DNS_MAIN.

AxelLin commented 4 months ago

> > Thanks. Will this make it into esp-idf v5.3?
>
> I'll probably backport it to both v5.2 and v5.3 (we don't usually backport features, but this involves lwip, which we'd like to keep ~updated)

Yes, please backport to stable branches. This is a bug fix; if it is only available in master, existing users still have the issue.