esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

WifiClient cannot open connection over mobile broadband router? #8005

Open edwinnap opened 3 years ago

edwinnap commented 3 years ago

Problem Description

Strangest thing. Any connection attempt from WiFiClient.connect() to anywhere out on the public internet over a mobile broadband router is failing 100% of the time.

The router in question is an ORBI LBR20, which can route traffic either over an ethernet WAN connection (if available) or over its LTE radio connection. The ESP8266 can connect fine to any host on the LAN side (10.0.0.x) and anywhere on the public internet if the WAN connection is present. But when the WAN is not connected and traffic is being routed over the mobile broadband uplink, all connection attempts fail. Every other device on the same network (laptops, computers, tablets, other IoT devices, etc.) can connect without issue.

Of course something in the router or on the mobile carrier's network (T-Mobile) must be dropping or rejecting the packets, but there must be something about the ESP8266/lwIP packets that the filtering keys on. We originally thought it might be related to #4593 (https://github.com/esp8266/Arduino/issues/4593) and MSS size, but we have tried every variant of IPv4, IPv6, low memory, and high bandwidth in 2.5.1, 2.5.2, 2.6.3, and 2.7.4. The result is always the same (we've been at this for a few days straight :-).
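For reference on the MSS angle: the MSS a TCP endpoint advertises is normally the link MTU minus the IP and TCP header sizes. The sketch below is plain host-side C++ (not ESP8266 code), and `tcpMssForMtu` is a made-up name for illustration. It assumes option-free 20-byte IPv4 and TCP headers and a 40-byte IPv6 header, and shows why the MSS values of 536 and 1460 offered by the core's lwIP variants correspond to MTUs of 576 and 1500 bytes.

```cpp
#include <cassert>

// Header sizes without any IP or TCP options (an assumption of this sketch).
const int kIpv4HeaderBytes = 20;
const int kIpv6HeaderBytes = 40;
const int kTcpHeaderBytes = 20;

// Hypothetical helper: largest TCP payload per segment for a given MTU.
int tcpMssForMtu(int mtu, bool ipv6) {
  const int headers = (ipv6 ? kIpv6HeaderBytes : kIpv4HeaderBytes) + kTcpHeaderBytes;
  assert(mtu > headers);  // an MTU smaller than the headers makes no sense
  return mtu - headers;
}
```

If a link on the path has a smaller MTU than the sender assumes, full-size segments need fragmentation or get dropped, but a bare SYN is far smaller than any of these values, so MSS alone would not obviously explain a connection that never starts.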

The router is very difficult to get into (we would love to be able to configure the iptables chain to log dropped packets!). We even installed a second router with its own layer of NAT and then tried to connect from that (the ORBI has a LAN port as well; the second router's uplink goes through that). Same issue: any device can route to the outside internet over the mobile link except the ESP8266 (!).

Sample code that illustrates the failure is included below, but of course only those with similar hardware might also see the problem. Just hoping someone might have some thoughts on what to try in LWIP code to tease out what the issue could be. In the meantime, we are going to try a second router with built in VPN to see if we can tunnel through whatever is stopping ESP8266 connections.

MCVE Sketch


/*
  arduino IPv6 example
  released to public domain

  output is like:

  SDK:2.2.1(cfd48f3)/Core:2.4.2-141-g4f97603/lwIP:IPv6+STABLE-2_1_0_RC1/glue:arduino-2.4.2-30-ga53619c/BearSSL:6d1cefc
  dns0=10.43.1.254
  Try me at these addresses:
  (with 'telnet <addr> or 'nc -u <addr> 23')
  IF='st'(0) IPv6=0 local=0 hostname='ipv6test' addr= 10.43.1.244 / mask:255.255.255.0 / gw:10.43.1.254
  IF='st'(0) IPv6=1 local=1 hostname='ipv6test' addr= fe80::1afe:34ff:fed1:cec7
  IF='st'(0) IPV6=1 local=0 hostname='ipv6test' addr= 2xxx:xxxx:xxxx:xxxx:1afe:34ff:fed1:cec7
  resolving www.google.com: 216.58.205.100
  resolving ipv6.google.com: 2a00:1450:4002:808::200e
*/

#include <ESP8266WiFi.h>
#include <WiFiUdp.h>
#include <PolledTimeout.h>
#include <AddrList.h>
#include <lwip/dns.h>

#ifndef STASSID
#define STASSID "ORBI15"
#define STAPSK  "xxxxxxxxxx"
#endif

#define FQDN  F("www.google.com") // with both IPv4 & IPv6 addresses
#define FQDN6 F("ipv6.google.com") // does not resolve in IPv4
#define STATUSDELAY_MS 10000
#define TCP_PORT 23
#define UDP_PORT 23

WiFiServer statusServer(TCP_PORT);
WiFiClient fdr_client;
WiFiClientSecure fdrs_client;
WiFiUDP udp;
esp8266::polledTimeout::periodicMs showStatusOnSerialNow(STATUSDELAY_MS);

void fqdn(Print& out, const String& fqdn) {
  out.print(F("resolving "));
  out.print(fqdn);
  out.print(F(": "));
  IPAddress result;
  if (WiFi.hostByName(fqdn.c_str(), result)) {
    result.printTo(out);
    out.println();
  } else {
    out.println(F("timeout or not found"));
  }
}

void status(Print& out) {
  out.println(F("------------------------------"));
  out.println(ESP.getFullVersion());

  for (int i = 0; i < DNS_MAX_SERVERS; i++) {
    IPAddress dns = WiFi.dnsIP(i);
    if (dns.isSet()) {
      out.printf("dns%d: %s\n", i, dns.toString().c_str());
    }
  }

  out.println(F("Try me at these addresses:"));
  out.println(F("(with 'telnet <addr> or 'nc -u <addr> 23')"));
  for (auto a : addrList) {
    out.printf("IF='%s' IPv6=%d local=%d hostname='%s' addr= %s",
               a.ifname().c_str(),
               a.isV6(),
               a.isLocal(),
               a.ifhostname(),
               a.toString().c_str());

    if (a.isLegacy()) {
      out.printf(" / mask:%s / gw:%s",
                 a.netmask().toString().c_str(),
                 a.gw().toString().c_str());
    }

    out.println();

  }

  // lwIP's dns client will ask for IPv4 first (by default)
  // an example is provided with a fqdn which does not resolve with IPv4
  fqdn(out, FQDN);
  fqdn(out, FQDN6);

  out.println(F("------------------------------"));
}

void setup() {
  Serial.begin(115200);
  Serial.setDebugOutput(true); // call after begin() so the UART is already initialised
  WiFi.hostname("ipv6test");
  Serial.println();
  Serial.println(ESP.getFullVersion());

#if LWIP_IPV6
  Serial.printf("IPV6 is enabled\n");
#else
  Serial.printf("IPV6 is not enabled\n");
#endif

  WiFi.mode(WIFI_STA);
  WiFi.begin(STASSID, STAPSK);

  status(Serial);

#if 0 // 0: legacy connecting loop - 1: wait for IPv6

  // legacy loop (still valid with IPv4 only)

  while (WiFi.status() != WL_CONNECTED) {
    Serial.print('.');
    delay(500);
  }

#else

  // Use this loop instead to wait for an IPv6 routable address

  // addr->isLocal() (meaning "not routable on internet") is true with:
  // - IPV4 DHCP autoconfigured address 169.254.x.x
  //   (false for any other including 192.168./16 and 10./24 since NAT may be in the equation)
  // - IPV6 link-local addresses (fe80::/64)

  for (bool configured = false; !configured;) {
    for (auto addr : addrList)
      if ((configured = !addr.isLocal()
                        // && addr.isV6() // uncomment when IPv6 is mandatory
                        // && addr.ifnumber() == STATION_IF
          )) {
        break;
      }
    Serial.print('.');
    delay(500);
  }

#endif

  Serial.println(F("connected: "));

  statusServer.begin();
  udp.begin(UDP_PORT);

  Serial.print(F("TCP server on port "));
  Serial.print(TCP_PORT);
  Serial.print(F(" - UDP server on port "));
  Serial.println(UDP_PORT);

  showStatusOnSerialNow.reset();
}

unsigned long statusTimeMs = 0;

void loop() {

  if (statusServer.hasClient()) {
    WiFiClient cli = statusServer.available();
    status(cli);
  }

  // if there's data available, read a packet
  int packetSize = udp.parsePacket();
  if (packetSize) {
    Serial.print(F("udp received "));
    Serial.print(packetSize);
    Serial.print(F(" bytes from "));
    udp.remoteIP().printTo(Serial);
    Serial.print(F(" :"));
    Serial.println(udp.remotePort());
    int  c;
    while ((c = udp.read()) >= 0) {
      Serial.write(c);
    }

    // send a reply, to the IP address and port that sent us the packet we received
    udp.beginPacket(udp.remoteIP(), udp.remotePort());
    status(udp);
    udp.endPacket();
  }

  if (showStatusOnSerialNow) {
    status(Serial);
    Serial.println("Attempting outbound connection ...");
    // if (fdr_client.connect(IPAddress(10, 0, 0, 2), 2222)) {
    if (fdr_client.connect("cnn.com", 80)) {
      Serial.println("Yah!");
    } else {
      Serial.print("Failed to connect using gateway of ");
      Serial.println(WiFi.gatewayIP());
    }
    // fdr_client.stop();
    Serial.println("=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-");
  }

}

Debug Messages

09:02:29.173 -> SDK:2.2.2-dev(38a443e)/Core:2.7.3-3-g2843a5ac=20703003/lwIP:IPv6+STABLE-2_1_2_RELEASE/glue:1.2-30-g92add50/BearSSL:5c771be
09:02:29.173 -> dns0: 10.0.0.1
09:02:29.173 -> dns1: fe80::9ec9:ebff:fe0e:8714
09:02:29.173 -> Try me at these addresses:
09:02:29.173 -> (with 'telnet <addr> or 'nc -u <addr> 23')
09:02:29.173 -> IF='st' IPv6=0 local=0 hostname='ipv6test' addr= 10.0.0.2 / mask:255.255.255.0 / gw:10.0.0.1
09:02:29.173 -> IF='st' IPv6=1 local=1 hostname='ipv6test' addr= fe80::42f5:20ff:fe35:6fb
09:02:29.173 -> IF='st' IPv6=1 local=0 hostname='ipv6test' addr= 2607:fb90:ac16:f47e:42f5:20ff:fe35:6fb
09:02:29.206 -> resolving www.google.com: [hostByName] request IP for: www.google.com
09:02:29.272 -> [hostByName] Host: www.google.com IP: 172.217.13.68
09:02:29.272 -> 172.217.13.68
09:02:29.272 -> resolving ipv6.google.com: [hostByName] request IP for: ipv6.google.com
09:02:29.438 -> [hostByName] Host: ipv6.google.com IP: 2607:f8b0:4004:800::200e
09:02:29.438 -> 2607:f8b0:4004:800::200e
09:02:29.438 -> ------------------------------
09:02:29.438 -> Attempting outbound connection ...
09:02:29.438 -> [hostByName] request IP for: cnn.com
09:02:29.471 -> [hostByName] Host: cnn.com IP: 151.101.65.67
09:02:29.471 -> :ref 1
09:02:34.614 -> :ctmo
09:02:34.647 -> :abort
09:02:34.647 -> :ur 1
09:02:34.647 -> :dsrcv 0
09:02:34.647 -> :del
09:02:34.647 -> Failed to connect using gateway of 10.0.0.1
09:02:34.647 -> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
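
The timestamps in the log above bracket the failed attempt: `:ref 1` appears right after DNS resolution, and `:ctmo` (which appears to be the core's connect-timeout debug marker) follows about five seconds later. A small host-side C++ sketch makes the arithmetic explicit. The helpers `toMillis` and `elapsedMs` are hypothetical, and the code assumes both timestamps fall on the same day and use the serial monitor's `HH:MM:SS.mmm` format with exactly three millisecond digits.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: convert "HH:MM:SS.mmm" to milliseconds since midnight.
// Returns -1 if the text does not match the expected format.
long toMillis(const char* stamp) {
  int h = 0, m = 0, s = 0, ms = 0;
  if (std::sscanf(stamp, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) {
    return -1;
  }
  return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

// Elapsed time between two same-day timestamps, in milliseconds.
long elapsedMs(const char* start, const char* end) {
  const long a = toMillis(start);
  const long b = toMillis(end);
  assert(a >= 0 && b >= 0);  // both stamps must parse
  return b - a;
}
```

For the log above, `elapsedMs("09:02:29.471", "09:02:34.614")` gives 5143 ms, which is consistent with a connect attempt that gave up after roughly five seconds without any reply to its SYN.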
d-a-v commented 3 years ago

Did you try with your phone as a 2.4 GHz Wi-Fi AP? Did you try a server other than cnn.com (google.com answers on port 80)?

edwinnap commented 3 years ago

> Did you try with your phone as a 2.4 GHz Wi-Fi AP? Did you try a server other than cnn.com (google.com answers on port 80)?

Yes to both of those. There is no problem routing over phones as hotspots, and we tried many different servers and ports. We can always connect fine when routed over the ethernet/WAN uplink, but not over cellular/LTE. The issue is clearly with the router, but it is somehow able to discriminate on something in the ESP8266/lwIP packets that is different from other packets.

d-a-v commented 3 years ago

Do you have a public server (reachable at a public IP address) with an open port, on which you could run tcpdump/Wireshark to see whether packets arrive?

edwinnap commented 3 years ago

Yes, we do. Will check on that and report back shortly. Thanks, very good suggestion.

edwinnap commented 3 years ago

OK, on a spare EC2 instance we opened a port in the firewall and ran tcpdump on that port. We tested with a computer (telnet to the port number) connected to the router in question with no WAN cable present, so all traffic was routed over mobile broadband. tcpdump works great and shows the initial connection, ACK, etc.

WiFiClient.connect() attempts to the same server on the same port show nothing (tcpdump displays no activity).

With the same configuration all around except the WAN cable plugged in, ESP8266 traffic works fine (and tcpdump shows packets arriving at the server).

For reference, here is the tcpdump output when successful (over WAN):

sudo tcpdump -v port 50003
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:49:53.481628 IP (tos 0x20, ttl 46, id 57415, offset 0, flags [DF], proto TCP (6), length 60)
    c-69-243-85-212.hsd1.dc.comcast.net.47854 > ip-170-13-30-40.ec2.internal.50003: Flags [S], cksum 0xd18c (correct), seq 4215953204, win 64240, options [mss 1460,sackOK,TS val 3135539140 ecr 0,nop,wscale 7], length 0
13:49:53.481684 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    ip-170-13-30-40.ec2.internal.50003 > c-69-243-85-212.hsd1.dc.comcast.net.47854: Flags [S.], cksum 0x6846 (incorrect -> 0x66f8), seq 3890534711, ack 4215953205, win 62643, options [mss 8961,sackOK,TS val 400194172 ecr 3135539140,nop,wscale 7], length 0
13:49:53.496891 IP (tos 0x20, ttl 46, id 57416, offset 0, flags [DF], proto TCP (6), length 52)
    c-69-243-85-212.hsd1.dc.comcast.net.47854 > ip-170-13-30-40.ec2.internal.50003: Flags [.], cksum 0xa5c3 (correct), ack 1, win 502, options [nop,nop,TS val 3135539152 ecr 400194172], length 0
d-a-v commented 3 years ago

In the logs above, the cksum 0x6846 (incorrect -> 0x66f8) from the server is strange.

So packets are indeed not getting out from the LTE router.

You could try a Wireshark capture of a plain TCP connection from a PC and from the ESP and see how they differ. The local receiver would do the capture.
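
On the `cksum ... (incorrect -> ...)` note: tcpdump flags a segment this way when the Internet checksum it recomputes does not match the one in the header. For packets sent by the capturing host itself, this is commonly an artifact of checksum offloading to the network card and not a real error, though nothing in this thread confirms that is what happened here. The following host-side C++ sketch of the RFC 1071 ones'-complement sum shows the computation involved. `internetChecksum` is an illustrative name, and a real TCP checksum would also cover the pseudo-header, which is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// RFC 1071 Internet checksum over a byte buffer, read as big-endian 16-bit
// words. Illustrative only: a real TCP checksum also includes the
// pseudo-header. The 32-bit accumulator is ample for packet-sized buffers.
uint16_t internetChecksum(const uint8_t* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  uint32_t sum = 0;
  std::size_t i = 0;
  for (; i + 1 < len; i += 2) {
    sum += (static_cast<uint32_t>(data[i]) << 8) | data[i + 1];
  }
  if (i < len) {
    sum += static_cast<uint32_t>(data[i]) << 8;  // odd trailing byte, zero-padded
  }
  while (sum >> 16) {
    sum = (sum & 0xFFFF) + (sum >> 16);          // fold carries back in
  }
  return static_cast<uint16_t>(~sum & 0xFFFF);
}
```

With the sample bytes from RFC 1071 (`00 01 f2 03 f4 f5 f6 f7`), the function returns `0x220d`.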

Here are the internal tools that helped debugging networking in this core:

edwinnap commented 3 years ago

Thanks, this all looks like helpful areas to explore. Much appreciated.

We did get part way through getting into promiscuous mode on laptop so we could do the wireshark pc versus ESP comparison. Will keep going down that road. And try the host environment stuff.

edwinnap commented 3 years ago

Small update: We can confirm that if we have a secondary WiFi router with a built-in VPN client (out to a VPN server on the public internet) connected to the ORBI (via ethernet LAN), the ESP is able to open a connection (and send data) without issue. So as long as we have a tunnel through the LTE mobile broadband uplink, all is fine.

Still working on the TCP packet comparisons to try and tease out what it is the ORBI doesn't like about (non-tunneled) ESP packets when preventing them from being routed over the mobile link.
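
For the packet comparison, the TCP options in the SYN are one obvious place to look. The SYN in the tcpdump output above carries `mss`, `sackOK`, `TS`, `nop` and `wscale`; a small embedded stack may well send fewer options, which is an assumption to check against a capture and not something this thread establishes. The sketch below is host-side C++ with a hypothetical `describeTcpOptions` helper that turns raw option bytes copied from a capture into a tcpdump-like summary, so two SYNs can be compared as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: summarise the options area of a TCP header.
// Handles the common kinds (EOL, NOP, MSS, window scale, SACK-permitted,
// timestamps); anything else is reported as "kind-N".
std::string describeTcpOptions(const uint8_t* opt, std::size_t len) {
  assert(opt != nullptr || len == 0);
  std::string out;
  std::size_t i = 0;
  while (i < len) {
    const uint8_t kind = opt[i];
    if (kind == 0) break;                      // end of option list
    if (!out.empty()) out += ",";
    if (kind == 1) {                           // NOP is a single byte
      out += "nop";
      ++i;
      continue;
    }
    if (i + 1 >= len) { out += "truncated"; break; }
    const uint8_t olen = opt[i + 1];           // length includes kind and length bytes
    if (olen < 2 || i + olen > len) { out += "malformed"; break; }
    if (kind == 2 && olen == 4) {
      out += "mss " + std::to_string((opt[i + 2] << 8) | opt[i + 3]);
    } else if (kind == 3 && olen == 3) {
      out += "wscale " + std::to_string(opt[i + 2]);
    } else if (kind == 4 && olen == 2) {
      out += "sackOK";
    } else if (kind == 8 && olen == 10) {
      out += "TS";
    } else {
      out += "kind-" + std::to_string(kind);
    }
    i += olen;
  }
  return out;
}
```

Feeding it the option bytes of a SYN from each device gives two short strings; any difference between them is a candidate for what the router or carrier treats differently.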

devyte commented 3 years ago

@edwinnap any updates on this?

edwinnap commented 3 years ago

Nothing definitive yet. The VPN tunnel has been stable, so we have been a little slow on the packet analysis. We will keep at it though.

TD-er commented 2 years ago

Maybe you can also try to find out what the maximum MTU through this router is? One description of how to do it is in this Citrix article.

edwinnap commented 2 years ago

Ah, that's a good idea, thank you. We did fiddle with MTU sizes in the ESP network code at one point, but it did not seem to have any effect. Should at least be able to tell if we can generate the same issue on something other than the ESP by changing the MTU size ...
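
The usual way to find the path MTU from a PC is to send pings with the don't-fragment flag set and vary the payload size until the largest one that still gets through is found. The path MTU is then that payload plus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header). The host-side C++ sketch below shows only the search logic. The `probe` callback is a stand-in for actually running such a ping (for example `ping -M do -s <size>` on Linux), and `findPathMtu` is an illustrative name, not a tool from this thread.

```cpp
#include <cassert>
#include <functional>

// Binary search for the largest ICMP payload that passes unfragmented.
// probe(payload) must return true when a don't-fragment ping with that
// payload size gets a reply. Returns the path MTU in bytes, or -1 if even
// the smallest payload fails.
int findPathMtu(const std::function<bool(int)>& probe,
                int minPayload = 0, int maxPayload = 1472) {
  assert(minPayload >= 0 && minPayload <= maxPayload);
  if (!probe(minPayload)) return -1;
  int lo = minPayload;  // largest size known to pass
  int hi = maxPayload;  // largest size that might still pass
  while (lo < hi) {
    const int mid = lo + (hi - lo + 1) / 2;  // round up so the loop terminates
    if (probe(mid)) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo + 28;  // add IPv4 (20) and ICMP (8) header bytes
}
```

Running the same search once over the WAN uplink and once over LTE would show whether the cellular path has a noticeably smaller MTU.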

d-a-v commented 2 years ago

ESP's lwIP is configured by default with the IP fragmentation and reassembly options. The "no features" lwIP variant disables these two features. (I'm not saying there is no bug anywhere.)

edwinnap commented 2 years ago



We have not made any further progress yet. We only had one user with this device/issue, so it was just easier to give them a VPN tunnel device and move this down the to-do list :-(

> On Jan 30, 2022, at 6:19 PM, Farzad wrote: Any updates on it? I am having this problem but there does not seem to be any documented workaround.


faradm commented 2 years ago


Never mind, it turned out I had another problem. Whenever I used the mobile network, the server for some reason decided to send me a gzip-compressed response. I solved the problem by explicitly requesting the "identity" content encoding.
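
In HTTP, a client usually asks for an uncompressed body with the `Accept-Encoding: identity` request header (`Content-Encoding` is what the server sends back). Assuming that is what was done here, the request could look like the output of the sketch below. It is plain C++ so it can be checked on a PC; `buildGetRequest` is a hypothetical helper, and on the ESP8266 the resulting string would be written to a connected `WiFiClient` with `print()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a minimal HTTP/1.1 GET request that asks the
// server not to compress the response body.
std::string buildGetRequest(const std::string& host, const std::string& path) {
  assert(!host.empty() && !path.empty() && path[0] == '/');
  std::string req;
  req += "GET " + path + " HTTP/1.1\r\n";
  req += "Host: " + host + "\r\n";
  req += "Accept-Encoding: identity\r\n";  // request an uncompressed body
  req += "Connection: close\r\n";
  req += "\r\n";                           // blank line ends the headers
  return req;
}
```

A server is still free to ignore the header, so checking the `Content-Encoding` of the response before parsing the body remains a sensible precaution.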