Open edwinnap opened 3 years ago
Did you try with your phone as a 2.4 GHz Wi-Fi AP? Did you try with something other than cnn as a server (google.com answers on port 80)?
Yes to both of those. No problem routing over phones as hotspots, and we tried many different servers and ports. We can always connect fine when routed over the ethernet/WAN uplink, but not over cellular/LTE. The issue is clearly with the router, but it is somehow able to discriminate based on something in ESP8266/LWIP packets that is different from other packets.
Do you have a public (= reachable with a public IP address) server with some opened port on which you could run tcpdump/wireshark and see if packets arrive ?
Yes, we do. Will check on that and report back shortly. Thanks, very good suggestion.
OK, on a spare EC2 instance we opened a port in the firewall and ran tcpdump on that port. Tested with a computer (telnet to the port number). The computer was connected to the router in question with no WAN cable present, so all traffic was routed over mobile broadband. Tcpdump works great: it shows the initial connection, ACK, etc.
WiFiClient.connect() attempts to the same server at the same port show nothing (tcpdump displays no activity).
With the same config all around except with the WAN cable plugged in, ESP8266 traffic works fine (and tcpdump shows packets arriving at the server).
For reference, here is the tcpdump output when successful (over WAN):
sudo tcpdump -v port 50003
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:49:53.481628 IP (tos 0x20, ttl 46, id 57415, offset 0, flags [DF], proto TCP (6), length 60)
c-69-243-85-212.hsd1.dc.comcast.net.47854 > ip-170-13-30-40.ec2.internal.50003: Flags [S], cksum 0xd18c (correct), seq 4215953204, win 64240, options [mss 1460,sackOK,TS val 3135539140 ecr 0,nop,wscale 7], length 0
13:49:53.481684 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
ip-170-13-30-40.ec2.internal.50003 > c-69-243-85-212.hsd1.dc.comcast.net.47854: Flags [S.], cksum 0x6846 (incorrect -> 0x66f8), seq 3890534711, ack 4215953205, win 62643, options [mss 8961,sackOK,TS val 400194172 ecr 3135539140,nop,wscale 7], length 0
13:49:53.496891 IP (tos 0x20, ttl 46, id 57416, offset 0, flags [DF], proto TCP (6), length 52)
c-69-243-85-212.hsd1.dc.comcast.net.47854 > ip-170-13-30-40.ec2.internal.50003: Flags [.], cksum 0xa5c3 (correct), ack 1, win 502, options [nop,nop,TS val 3135539152 ecr 400194172], length 0
In the logs above, the cksum 0x6846 (incorrect -> 0x66f8) from the server is strange.
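One possible explanation for the "incorrect" checksum, offered as an assumption about the EC2 instance rather than something confirmed in this thread: tcpdump was running on the server itself, and when TCP checksum offload is enabled the network interface fills in the checksum after the point where the capture happens, so outgoing packets commonly show up this way. For reference, here is a minimal sketch of the one's-complement Internet checksum (RFC 1071) that TCP uses. The function name is hypothetical, and a real TCP checksum also covers a pseudo-header that is not shown here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: one's-complement Internet checksum (RFC 1071)
// over a byte buffer, treating the data as big-endian 16-bit words.
// The 32-bit accumulator is large enough for any single IP packet.
uint16_t internetChecksum(const uint8_t* data, size_t len) {
    uint32_t sum = 0;

    // Add up all complete 16-bit words.
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (static_cast<uint32_t>(data[i]) << 8) | data[i + 1];
    }

    // An odd trailing byte is padded with a zero low byte.
    if (len % 2 != 0) {
        sum += static_cast<uint32_t>(data[len - 1]) << 8;
    }

    // Fold the carries back into the low 16 bits.
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }

    return static_cast<uint16_t>(~sum);
}
```

If offload is the cause, the packet on the wire carries the corrected value and the flag in the capture is harmless.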
So packets are indeed not getting out from the LTE router.
You could try a wireshark capture of a dumb TCP connection from a PC and from the ESP and see how they differentiate. The local receiver would do the capture.
Here are the internal tools that helped debugging networking in this core:
The Netdump internal facility: data will be dumped on the serial port, but it can also be forwarded to a PC so that Wireshark can analyze it. https://github.com/esp8266/Arduino/blob/f4178e58dcfe32ec1f4b7d9cfb31e3ad5559327a/libraries/Netdump/examples/Netdump/Netdump.ino#L127
The host environment, which executes an Arduino sketch on your computer (Linux/macOS/WSL2). For example:
$ cd tests/host
$ make ../../libraries/ESP8266WiFi/examples/WiFiClient/WiFiClient
$ ./bin32/WiFiClient/WiFiClient
$ ./bin/WiFiClient/WiFiClient
Thanks, this all looks like helpful areas to explore. Much appreciated.
We did get part way through getting into promiscuous mode on laptop so we could do the wireshark pc versus ESP comparison. Will keep going down that road. And try the host environment stuff.
Small update: We can confirm that if we have a secondary WiFi router with a built-in VPN client (out to a VPN server on the public internet) connected to the ORBI (via ethernet LAN), the ESP is able to open a connection (and send data) without issue. So as long as we have a tunnel through the LTE mobile broadband uplink, all is fine.
Still working on the TCP packet comparisons to try and tease out what it is the ORBI doesn't like about (non-tunneled) ESP packets when preventing them from being routed over the mobile link.
@edwinnap any updates on this?
Nothing definitive yet. The VPN tunnel has been stable, so we have been a little slow on the packet analysis. We will keep at it though.
Maybe you can also try to find out what the max MTU is via this router? One description of how to do it is this Citrix article
Ah, that's a good idea, thank you. We did fiddle with MTU sizes in the ESP network code at one point, but it did not seem to have any effect. Should at least be able to tell if we can generate the same issue on something other than the ESP by changing the MTU size ...
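For anyone following along with the MTU test, the arithmetic can be sketched as below. This assumes plain IPv4 and TCP headers without options (20 bytes each) and an 8-byte ICMP echo header; the function names are made up for illustration. The payload size is what you would pass to a don't-fragment ping (for example `ping -M do -s <payload> <host>` on Linux), and the MSS is what a TCP SYN would be expected to advertise for that MTU, like the `mss 1460` in the capture above.

```cpp
// Header sizes assumed here: IPv4 and TCP without options.
constexpr int kIpv4HeaderBytes = 20;
constexpr int kIcmpHeaderBytes = 8;
constexpr int kTcpHeaderBytes = 20;

// Largest ICMP echo payload that fits in one unfragmented packet
// for a given path MTU (hypothetical helper name).
constexpr int icmpPayloadForMtu(int mtu) {
    return mtu - kIpv4HeaderBytes - kIcmpHeaderBytes;
}

// TCP maximum segment size that corresponds to a given MTU
// (hypothetical helper name).
constexpr int tcpMssForMtu(int mtu) {
    return mtu - kIpv4HeaderBytes - kTcpHeaderBytes;
}
```

With a standard 1500-byte ethernet MTU this gives a 1472-byte ping payload and an MSS of 1460. If the largest ping that passes over the LTE link is smaller than that, the difference is the overhead the mobile path adds.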
ESP's lwIP is configured by default with the IP fragmentation and reassembly options. The "no features" lwIP variant disables these two features. (I'm not saying there is no bug anywhere.)
We have not made any further progress yet. We only had one user with this device/issue, so it was just easier to give them a VPN tunnel device and move this down the to-do list :-(
On Jan 30, 2022, at 6:19 PM, Farzad wrote: Any updates on it? I am having this problem but there does not seem to be any documented workaround.
Never mind, it turned out I had another problem. Whenever I used mobile network, the server for some reason would decide to send me a compressed gzipped version. I solved the problem by explicitly setting the content-encoding to "identity"
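A note for anyone landing here with the same symptom: the request header that conventionally tells a server not to compress the response is `Accept-Encoding`, so the sketch below assumes that is the header that was meant. It is a minimal, hypothetical helper (the function name and the `Connection: close` line are illustrative choices, not taken from the original sketch) that builds a plain HTTP/1.1 GET request with `Accept-Encoding: identity`.

```cpp
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 GET request that asks
// the server to send the response body uncompressed.
std::string buildIdentityGetRequest(const std::string& host,
                                    const std::string& path) {
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    // "identity" means no content coding, i.e. no gzip/deflate.
    req += "Accept-Encoding: identity\r\n";
    req += "Connection: close\r\n";
    // An empty line terminates the header section.
    req += "\r\n";
    return req;
}
```

On the ESP8266, the resulting string could be sent over an already connected WiFiClient with something like `client.print(request.c_str())`.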
Basic Infos
Platform
Settings in IDE
Problem Description
Strangest thing. Any connection attempt from WiFiClient.connect() to anywhere out on the public internet over a mobile broadband router is failing 100% of the time.
The router in question is an ORBI LBR20 which can route traffic either out over an ethernet WAN connection (if available) or over its LTE radio connection. The ESP8266 can connect fine to any host on the LAN side (10.0.0.x) and anywhere out on the public internet if the WAN connection is present. But when the WAN is not connected and traffic is being routed over the mobile broadband uplink, all connection attempts fail. Every other device on same network (laptops, computers, tablets, other IoT devices, etc., etc) can connect without issue.
Of course something in the router or on the mobile carrier's network (T-Mobile) must be dropping/rejecting the packets, but there must be something about the ESP8266/LWIP packets that the filtering is keying on (?). We thought originally it might be related to 4593 (https://github.com/esp8266/Arduino/issues/4593) and MSS size, but we have tried every variant of ip4, ip6, low memory, and high bandwidth in 2.5.1, 2.5.2, 2.6.3, and 2.7.4. The result is always the same (we've been at this for a few days straight :-)
The router is very difficult to get into (would love to be able to configure the iptables chain to log dropped packets!). We even installed a second router with its own layer of NAT and then tried to connect from that (the ORBI has a LAN port as well, and the second router's uplink is through that). Same issue: any device can route to the outside internet over the mobile link except the ESP8266 (!).
Sample code that illustrates the failure is included below, but of course only those with similar hardware might also see the problem. Just hoping someone might have some thoughts on what to try in LWIP code to tease out what the issue could be. In the meantime, we are going to try a second router with built in VPN to see if we can tunnel through whatever is stopping ESP8266 connections.
MCVE Sketch
Debug Messages