espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

DNS client fails if two active netifs ? (IDFGH-12641) #13636

Closed dannybackx closed 4 months ago

dannybackx commented 5 months ago

IDF version.

5.2.1

Espressif SoC revision.

esp32s3 (lilygo t-sim7080g s3 board)

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

lilygo t-sim7080g s3 board

Power Supply used.

USB

What is the expected behavior?

With two netifs active (WiFi and cellular), the DNS client appears to fail. It works well when the cellular netif is not activated. The log below is from a working run and shows three dns_recv events, as it should.

victus: {412} fgrep -a dns_ typescript.04
D (00:00:26.241) lwip: dns_init: initializing
0x4204a83c: esp_netif_get_dns_info_api at /home/danny/src/github/esp32/esp-idf-v5.2.1/components/esp_netif/lwip/esp_netif_lwip.c:1948
D (01:00:31.882) esp_netif_lwip: esp_netif_get_dns_info: esp_netif=0x3fcb9854 type=0
0x4204a83c: esp_netif_get_dns_info_api at /home/danny/src/github/esp32/esp-idf-v5.2.1/components/esp_netif/lwip/esp_netif_lwip.c:1948
D (01:00:31.909) esp_netif_lwip: esp_netif_get_dns_info: esp_netif=0x3fcb9854 type=1
0x4204a83c: esp_netif_get_dns_info_api at /home/danny/src/github/esp32/esp-idf-v5.2.1/components/esp_netif/lwip/esp_netif_lwip.c:1948
D (01:00:31.936) esp_netif_lwip: esp_netif_get_dns_info: esp_netif=0x3fcb9854 type=2
D (01:00:32.556) lwip: dns_enqueue: "ipv4.cloudns.net": use DNS entry 0
D (01:00:32.585) lwip: dns_enqueue: "ipv4.cloudns.net": use DNS pcb 0
D (01:00:32.592) lwip: dns_send: dns_servers[0] "ipv4.cloudns.net": request
D (01:00:33.049) lwip: dns_recv: "ipv4.cloudns.net": response = 
D (01:00:33.713) lwip: dns_tmr: dns_check_entries
D (01:00:34.760) lwip: dns_tmr: dns_check_entries
D (01:00:35.815) lwip: dns_tmr: dns_check_entries
D (01:00:36.234) lwip: dns_enqueue: "ntp.jimmobile.be": use DNS entry 1
D (01:00:36.257) lwip: dns_enqueue: "ntp.jimmobile.be": use DNS pcb 0
D (01:00:36.263) lwip: dns_send: dns_servers[0] "ntp.jimmobile.be": request
D (01:00:36.632) lwip: dns_recv: "ntp.jimmobile.be": response = 
D (01:00:36.989) lwip: dns_tmr: dns_check_entries
D (01:00:37.993) lwip: dns_tmr: dns_check_entries
[...]
D (01:00:51.263) lwip: dns_tmr: dns_check_entries
D (01:00:51.795) lwip: dns_enqueue: "pool.ntp.org": use DNS entry 2
D (01:00:51.823) lwip: dns_enqueue: "pool.ntp.org": use DNS pcb 0
D (01:00:51.830) lwip: dns_send: dns_servers[0] "pool.ntp.org": request
D (01:00:52.401) lwip: dns_tmr: dns_check_entries
D (01:00:52.406) lwip: dns_send: dns_servers[0] "pool.ntp.org": request
D (01:00:52.758) lwip: dns_recv: "pool.ntp.org": response = 
D (20:29:12.293) lwip: dns_tmr: dns_check_entries
D (20:29:13.288) lwip: dns_tmr: dns_check_entries

What is the actual behavior?

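The good and bad runs show different patterns. In the failed run (typescript.03) only one dns_recv appears although replies from port 53 keep arriving; in the working run (typescript.04) each query gets its dns_recv: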

victus: {417} fgrep -a -e dns_recv -e "lwip: |        53     " typescript.03
D (19:53:51.659) lwip: |        53     |     60982     | (src port, dest port)
D (19:53:51.739) lwip: dns_recv: "ntp.jimmobile.be": response = 
D (19:54:05.534) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:05.887) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:07.169) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:07.405) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:08.119) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:09.362) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:09.586) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:11.498) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:12.802) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:13.744) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:14.994) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:15.218) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:16.118) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:17.352) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:17.576) lwip: |        53     |     47547     | (src port, dest port)
D (19:54:19.816) lwip: |        53     |     47547     | (src port, dest port)
victus: {418} fgrep -a -e dns_recv -e "lwip: |        53     " typescript.04
D (01:00:32.969) lwip: |        53     |     57395     | (src port, dest port)
D (01:00:33.049) lwip: dns_recv: "ipv4.cloudns.net": response = 
D (01:00:36.536) lwip: |        53     |     36505     | (src port, dest port)
D (01:00:36.632) lwip: dns_recv: "ntp.jimmobile.be": response = 
D (01:00:52.646) lwip: |        53     |     19793     | (src port, dest port)
D (01:00:52.758) lwip: dns_recv: "pool.ntp.org": response = 
D (01:00:53.215) lwip: |        53     |     19793     | (src port, dest port)

More detail from the failed run, showing one successful and one failed call:

D (19:53:51.638) lwip: ip4_input: p->len 126 p->tot_len 126
D (19:53:51.644) lwip: udp_input: received datagram of length 106
D (19:53:51.650) lwip: UDP header:
D (19:53:51.654) lwip: +-------------------------------+
D (19:53:51.659) lwip: |        53     |     60982     | (src port, dest port)
D (19:53:51.667) lwip: +-------------------------------+
D (19:53:51.673) lwip: |       106     |     0x9bb4    | (len, chksum)
D (19:53:51.679) lwip: +-------------------------------+
D (19:53:51.685) lwip: udp (
D (19:53:51.688) lwip: 192.168.0.203
D (19:53:51.692) lwip: , 60982) <-- (
D (19:53:51.695) lwip: 195.130.130.1
D (19:53:51.699) lwip: , 53)
D (19:53:51.702) lwip: pcb (
D (19:53:51.705) lwip: 0.0.0.0
D (19:53:51.708) lwip: , 60982) <-- (
D (19:53:51.712) lwip: 0.0.0.0
D (19:53:51.715) lwip: , 0)
D (19:53:51.718) lwip: pcb (
D (19:53:51.721) lwip: 0.0.0.0
D (19:53:51.724) lwip: , 68) <-- (
D (19:53:51.728) lwip: 0.0.0.0
D (19:53:51.731) lwip: , 67)
D (19:53:51.734) lwip: udp_input: calculating checksum
D (19:53:51.739) lwip: dns_recv: "ntp.jimmobile.be": response =
D (19:53:51.745) lwip: 18.239.208.57
D (19:53:51.749) lwip:

[...]
D (19:54:05.513) lwip: ip4_input: p->len 94 p->tot_len 94
D (19:54:05.519) lwip: udp_input: received datagram of length 74
D (19:54:05.525) lwip: UDP header:
D (19:54:05.529) lwip: +-------------------------------+
D (19:54:05.534) lwip: |        53     |     55132     | (src port, dest port)
D (19:54:05.542) lwip: +-------------------------------+
D (19:54:05.547) lwip: |        74     |     0x18d7    | (len, chksum)
D (19:54:05.554) lwip: +-------------------------------+
D (19:54:05.559) lwip: udp (
D (19:54:05.562) lwip: 192.168.0.203
D (19:54:05.566) lwip: , 55132) <-- (
D (19:54:05.570) lwip: 195.130.130.1
D (19:54:05.573) lwip: , 53)
D (19:54:05.577) lwip: pcb (
D (19:54:05.580) lwip: 0.0.0.0
D (19:54:05.583) lwip: , 55132) <-- (
D (19:54:05.586) lwip: 0.0.0.0
D (19:54:05.590) lwip: , 0)
D (19:54:05.593) lwip: pcb (
D (19:54:05.596) lwip: 0.0.0.0
D (19:54:05.599) lwip: , 64662) <-- (
D (19:54:05.602) lwip: 0.0.0.0
D (19:54:05.606) lwip: , 0)
D (19:54:05.609) lwip: pcb (
D (19:54:05.611) lwip: 0.0.0.0
D (19:54:05.615) lwip: , 68) <-- (
D (19:54:05.618) lwip: 0.0.0.0
D (19:54:05.621) lwip: , 67)
D (19:54:05.624) lwip: udp_input: calculating checksum
D (19:54:05.630) lwip: dns_tmr: dns_check_entries
D (19:54:05.635) lwip: dns_send: dns_servers[0] "ipv4.cloudns.net": request
D (19:54:05.642) lwip: sending DNS request ID 4414 for name "ipv4.cloudns.net" to server 0
D (19:54:05.650) lwip: udp_send
D (19:54:05.654) lwip: udp_send: added header in given pbuf 0x3fcc38d8

Steps to reproduce.

Code at https://sourceforge.net/p/lilygo-t-sim- ... webserver/

Debug Logs.

No response

More Information.

Please tell me how to figure out what's wrong.

dannybackx commented 5 months ago

OK, I think I found the reason for this problem, but no solution yet.

The esp-netif layer gives the impression that DNS servers are specified per netif (you call it with a netif as a parameter). When querying the DNS servers after network setup, it's clear that each successful connection overwrites the DNS servers, and every netif then reports the same ones (that part is documented). Example:

I (20:06:27.660) Network: fixup_dns: default netif wifi
I (20:06:27.666) Network: List DNS servers
I (20:06:27.671) Network: IF 0 ppp dns 0 : 80.201.237.238
I (20:06:27.677) Network: IF 0 ppp dns 1 : 80.201.237.239
I (20:06:27.683) Network: IF 0 ppp dns 2 : 0.0.0.0
I (20:06:27.688) Network: IF 1 wifi dns 0 : 80.201.237.238
I (20:06:27.695) Network: IF 1 wifi dns 1 : 80.201.237.239
I (20:06:27.701) Network: IF 1 wifi dns 2 : 0.0.0.0

FYI, my code currently sets up WiFi first, then cell service, so the DNS servers you see are those of the mobile network. That's why my app wouldn't work: over WiFi I'm on another network, so those servers don't respond.

The netif layer calls lwIP without the netif argument (see esp_netif_set_dns_info_api()), so that's where the per-netif information gets lost.
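To make the effect concrete, here is a minimal host-side model of that behaviour. It is a sketch under stated assumptions, not the ESP-IDF or lwIP code: `model_netif_t`, `model_set_dns()`, `model_get_dns()` and the `IP4()` macro are hypothetical names invented for illustration. The only point it shows is that a setter which accepts a netif but writes into one shared table makes the last writer win for every interface.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DNS_SLOTS 3   /* main, backup, fallback */

/* Hypothetical helper: pack four octets into one 32-bit value (illustration only). */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical stand-in for a network interface handle. */
typedef struct {
    const char *name;
} model_netif_t;

/* One table shared by all interfaces: this is the assumption being modelled. */
static uint32_t g_dns_table[MODEL_DNS_SLOTS];

/* The netif argument is accepted but does not select any per-interface storage. */
void model_set_dns(const model_netif_t *netif, int slot, uint32_t addr)
{
    (void)netif;
    assert(slot >= 0 && slot < MODEL_DNS_SLOTS);
    g_dns_table[slot] = addr;
}

/* Reads back from the same shared table, whichever netif is passed in. */
uint32_t model_get_dns(const model_netif_t *netif, int slot)
{
    (void)netif;
    assert(slot >= 0 && slot < MODEL_DNS_SLOTS);
    return g_dns_table[slot];
}
```

In this model, setting slot 0 for a "wifi" handle and then for a "ppp" handle leaves both handles reporting the ppp address, which matches the "List DNS servers" output above where both interfaces show 80.201.237.238.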

Setting DNS servers 0, 1 and 2 explicitly doesn't work reliably either; see the result of each attempt in the comments:

#if 1
  // This works
  uint32_t ip1 = esp_ip4addr_aton("195.130.130.1");     // asse.dnscache01.telenet-ops.be.
  uint32_t ip2 = esp_ip4addr_aton("195.130.131.1");     // asse.dnscache02.telenet-ops.be.
  uint32_t ip3 = esp_ip4addr_aton("80.201.237.238");    // something jimmobile.be
#endif
#if 0
  // This fails
  uint32_t ip1 = esp_ip4addr_aton("195.130.130.1");     // asse.dnscache01.telenet-ops.be.
  uint32_t ip2 = esp_ip4addr_aton("80.201.237.238");    // something jimmobile.be
  uint32_t ip3 = esp_ip4addr_aton("195.130.131.1");     // asse.dnscache02.telenet-ops.be.
#endif
#if 0
  // This fails
  uint32_t ip1 = esp_ip4addr_aton("195.130.130.1");     // asse.dnscache01.telenet-ops.be.
  uint32_t ip2 = esp_ip4addr_aton("80.201.237.238");    // something jimmobile.be
  uint32_t ip3 = esp_ip4addr_aton("216.239.32.10");     // ns.google.com
#endif

It's unclear to me why only one of these appears to work, and how to proceed. Should an application catch esp-netif availability events and set the DNS servers based on priority? If yes, then it would seem that the priorities in the netif layer are not useful/working. Help ;-)
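One way the "set DNS servers based on priority" idea could look is sketched below as plain C that runs on a host. Everything here is an assumption for illustration: `app_iface_t` and `app_pick_dns_source()` are hypothetical names, and the application is assumed to remember each interface's DNS servers itself when that interface comes up. The sketch only covers the selection step; actually re-applying the chosen servers (for example through esp_netif_set_dns_info()) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface record kept by the application. */
typedef struct {
    const char *name;
    int route_prio;     /* higher value wins */
    bool is_up;         /* true while the interface has a connection */
    uint32_t dns[3];    /* servers remembered when the interface came up */
} app_iface_t;

/*
 * Return the interface whose remembered DNS servers should be applied:
 * the one that is up and has the highest route_prio, or NULL if none is up.
 * On a tie, the earlier entry in the array is kept.
 */
const app_iface_t *app_pick_dns_source(const app_iface_t *ifaces, size_t count)
{
    assert(ifaces != NULL || count == 0);

    const app_iface_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (!ifaces[i].is_up) {
            continue;
        }
        if (best == NULL || ifaces[i].route_prio > best->route_prio) {
            best = &ifaces[i];
        }
    }
    return best;
}
```

An application could call this from its connect and disconnect handlers, so that the servers of the preferred interface are restored each time another interface overwrites them.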

AxelLin commented 5 months ago

https://github.com/espressif/esp-idf/issues/6270#issuecomment-745288299

david-cermak commented 4 months ago

This is a known limitation in IDF/lwip (also documented in: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/lwip.html#adapted-apis)

This will be handled in the esp_netif layer in https://github.com/espressif/esp-idf/issues/6270 (closing this one as a duplicate).