esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

Esp8266 IP Address not reachable after a while #2330

Closed thehellmaker closed 5 years ago

thehellmaker commented 8 years ago

Hi all, the ESP intermittently becomes unavailable after some time, and the client reports: Connection to http://192.168.1.4:80 refused org.apache.http.conn.HttpHostConnectException: Connection to http://192.168.1.4:80 refused

The same happens from a web browser, and then it randomly starts working again. The logs of the ESP device itself show no crash.

Am I missing something in the server setup that would keep it alive all the time?

I thought this was a webserver problem, but it seems to be an ESP issue. Here is the related issue: https://github.com/me-no-dev/ESPAsyncWebServer/issues/54

Cheers, Akash A

thehellmaker commented 8 years ago

A similar issue, https://github.com/esp8266/Arduino/issues/1137, was reported in December 2015.

thehellmaker commented 8 years ago

A similar issue is also reported at http://internetofhomethings.com/homethings/?p=426

thehellmaker commented 8 years ago

As mentioned in https://github.com/me-no-dev/ESPAsyncWebServer/issues/54, I have already tried the approach from http://www.esp8266.com/viewtopic.php?p=12809 and it's still not working.

I will now analyse the traffic with Wireshark myself.

thehellmaker commented 8 years ago

Wireshark is receiving an ARP broadcast from the module every second because of the fix.

Here is the packet content.

100 7.995514 Espressi_1a:66:47 Broadcast ARP 42 Gratuitous ARP for 192.168.1.6 (Request)
Frame 100: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: Espressi_1a:66:47 (5c:cf:7f:1a:66:47), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
        Address: Broadcast (ff:ff:ff:ff:ff:ff)
        .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
    Source: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
        Address: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: ARP (0x0806)
Address Resolution Protocol (request/gratuitous ARP)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    [Is gratuitous: True]
    Sender MAC address: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
    Sender IP address: 192.168.1.6
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 192.168.1.6
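For readers who want to check such a frame without Wireshark, here is a minimal Python sketch (not from the thread) that decodes a 42-byte Ethernet + ARP frame and applies the usual "gratuitous" test, namely that sender IP and target IP are the same. The bytes in `FRAME_100` are an assumption: they are rebuilt by hand from the decoded fields above, since the raw bytes were not posted. The helper names `parse_arp` and `is_gratuitous` are made up for illustration.

```python
import socket
import struct

# Frame 100 from the capture, rebuilt by hand from the decoded fields
# (assumption: the thread shows the decode, not the raw bytes).
FRAME_100 = bytes.fromhex(
    "ffffffffffff"   # Ethernet destination: broadcast
    "5ccf7f1a6647"   # Ethernet source: the ESP
    "0806"           # EtherType: ARP
    "0001" "0800"    # hardware type Ethernet, protocol type IPv4
    "06" "04"        # hardware size 6, protocol size 4
    "0001"           # opcode: request
    "5ccf7f1a6647"   # sender MAC
    "c0a80106"       # sender IP 192.168.1.6
    "000000000000"   # target MAC (unset)
    "c0a80106"       # target IP 192.168.1.6
)


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp(frame: bytes) -> dict:
    """Decode the Ethernet header and the IPv4-over-Ethernet ARP payload."""
    if len(frame) < 42:
        raise ValueError("too short for Ethernet + ARP")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    _htype, _ptype, _hlen, _plen, opcode, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    return {
        "eth_dst": format_mac(dst),
        "eth_src": format_mac(src),
        "opcode": opcode,  # 1 = request, 2 = reply
        "sender_mac": format_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": format_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


def is_gratuitous(arp: dict) -> bool:
    """A gratuitous ARP announces the sender's own address."""
    return arp["sender_ip"] == arp["target_ip"]
```

Calling `is_gratuitous(parse_arp(FRAME_100))` on the reconstructed frame should agree with Wireshark's `[Is gratuitous: True]` annotation.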

thehellmaker commented 8 years ago

For a device whose ARP requests the ESP does answer, here is the sequence:

  1. Request 1732 208.713855 IntelCor_c5:37:30 Espressi_1a:66:47 ARP 42 Who has 192.168.1.6? Tell 192.168.1.7
  2. Response 1733 208.734013 Espressi_1a:66:47 IntelCor_c5:37:30 ARP 42 192.168.1.6 is at 5c:cf:7f:1a:66:47
  3. Request Body
Frame 1732: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
    Interface id: 0 (\Device\NPF_{641ED2C7-4125-43D0-BEF1-205ACE40B627})
    Encapsulation type: Ethernet (1)
    Arrival Time: Jul 26, 2016 21:05:15.359512000 India Standard Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1469547315.359512000 seconds
    [Time delta from previous captured frame: 0.374458000 seconds]
    [Time delta from previous displayed frame: 0.374458000 seconds]
    [Time since reference or first frame: 208.713855000 seconds]
    Frame Number: 1732
    Frame Length: 42 bytes (336 bits)
    Capture Length: 42 bytes (336 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:arp]
    [Coloring Rule Name: ARP]
    [Coloring Rule String: arp]
Ethernet II, Src: IntelCor_c5:37:30 (18:5e:0f:c5:37:30), Dst: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
    Destination: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
        Address: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
        Address: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: ARP (0x0806)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
    Sender IP address: 192.168.1.7
    Target MAC address: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
    Target IP address: 192.168.1.6
  4. Response Body
Frame 1733: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
    Interface id: 0 (\Device\NPF_{641ED2C7-4125-43D0-BEF1-205ACE40B627})
    Encapsulation type: Ethernet (1)
    Arrival Time: Jul 26, 2016 21:05:15.379670000 India Standard Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1469547315.379670000 seconds
    [Time delta from previous captured frame: 0.020158000 seconds]
    [Time delta from previous displayed frame: 0.020158000 seconds]
    [Time since reference or first frame: 208.734013000 seconds]
    Frame Number: 1733
    Frame Length: 42 bytes (336 bits)
    Capture Length: 42 bytes (336 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:arp]
    [Coloring Rule Name: ARP]
    [Coloring Rule String: arp]
Ethernet II, Src: Espressi_1a:66:47 (5c:cf:7f:1a:66:47), Dst: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
    Destination: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
        Address: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
        Address: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: ARP (0x0806)
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: Espressi_1a:66:47 (5c:cf:7f:1a:66:47)
    Sender IP address: 192.168.1.6
    Target MAC address: IntelCor_c5:37:30 (18:5e:0f:c5:37:30)
    Target IP address: 192.168.1.7

thehellmaker commented 8 years ago

Found a very interesting thing. I am using Windows 7 to debug this issue, and here are the findings:

  1. The ESP responds to ARP queries whose destination is the ESP MAC address:
     1918 151.364565 IntelCor_c5:37:30 Espressi_1a:66:47 ARP 42 Who has 192.168.1.6? Tell 192.168.1.7
     1919 151.371335 Espressi_1a:66:47 IntelCor_c5:37:30 ARP 42 192.168.1.6 is at 5c:cf:7f:1a:66:47
  2. The ESP does not respond to broadcast ARP pings sent with nmap:
     3459 254.010073 IntelCor_c5:37:30 Broadcast ARP 42 Who has 192.168.1.6? Tell 192.168.1.7

I will look into the ARP query responder code in the codebase.
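Spotting unanswered requests by eye gets tedious in a long capture. The following is a rough Python sketch (not from the thread) that scans Wireshark summary lines in the format quoted in the findings and lists every "Who has" request that saw no matching "is at" reply within a time window. The function name `unanswered_requests` and the one-second default window are assumptions chosen for illustration.

```python
import re

# Summary-line layout assumed: No. Time Source Destination Protocol Length Info
REQUEST = re.compile(
    r"^(\d+)\s+([\d.]+)\s+(\S+)\s+(\S+)\s+ARP\s+\d+\s+Who has (\S+)\? Tell (\S+)$"
)
REPLY = re.compile(
    r"^(\d+)\s+([\d.]+)\s+(\S+)\s+(\S+)\s+ARP\s+\d+\s+(\S+) is at (\S+)$"
)


def unanswered_requests(lines, timeout=1.0):
    """Return (frame number, destination, target IP) for requests with no reply.

    A request counts as answered if an "is at" line for the same IP appears
    no more than `timeout` seconds after it.
    """
    requests, replies = [], []
    for line in lines:
        line = line.strip()
        m = REQUEST.match(line)
        if m:
            requests.append((int(m[1]), float(m[2]), m[4], m[5]))
            continue
        m = REPLY.match(line)
        if m:
            replies.append((float(m[2]), m[5]))

    missing = []
    for number, sent_at, dst, target_ip in requests:
        answered = any(
            ip == target_ip and 0 <= at - sent_at <= timeout
            for at, ip in replies
        )
        if not answered:
            missing.append((number, dst, target_ip))
    return missing
```

Fed the three summary lines quoted in the findings, this should report only frame 3459, the broadcast request. Gratuitous ARP lines match neither pattern and are ignored.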

thehellmaker commented 8 years ago

It looks like ARP requests are handled entirely by the lwIP project, which this project depends on. @me-no-dev, it looks like you imported the project as a dependency 4 months back. I did a diff against the latest lwIP version (1.4.1), and it seems some broadcast functionality was added that is not in the imported version. Did you import the latest version?

me-no-dev commented 8 years ago

lwip comes from espressif and not me :) I just tweaked some stuff here and there (not broadcast but multicast). Latest lwip is wip :)

thehellmaker commented 8 years ago

Upgraded to the open source lwIP (1.4.1) from the 1.3.2 port, as suggested by @igrr. The module is still responding to ARP requests. Waiting to see if it stops.

igrr commented 8 years ago

Can you make a diff between 1.3.2 and 1.4.1 in the part which deals with ARP? Maybe we can backport the fix instead of updating all of lwip for now.

thehellmaker commented 8 years ago

It stopped responding to ARP requests on 1.4.1 as well. The gratuitous ARP that is being sent is not being handled by Android devices. Deep diving into the code base to debug further.

51056 1693.352539 Espressi_88:7f:7e Broadcast ARP 42 Gratuitous ARP for 192.168.1.12 (Request)
Frame 51056: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: Espressi_88:7f:7e (5c:cf:7f:88:7f:7e), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)

thehellmaker commented 8 years ago

However, after a restart the module takes a new IP address and starts responding to ARP requests again.

thehellmaker commented 8 years ago

Now I am seeing that the IP address is also in use by another device, which is to be expected since the ESP didn't respond to ARP requests. But the ESP has been sending gratuitous ARP, and here is the Wireshark capture:

5495591 37123.051429 00:e1:40:46:09:6c Broadcast ARP 42 Gratuitous ARP for 192.168.1.5 (Request) (duplicate use of 192.168.1.5 detected!)

thehellmaker commented 8 years ago

This is not an issue with ARP as most people have pointed out. This has something to do with the wireless connectivity stability.

I see debug logs right after the module stops responding to ARP saying:

wifi evt: 7
add 1
aid 1
station: 40:88:05:b1:29:eb join, AID = 1
wifi evt: 5
wifi evt: 7
bcn_timout,ap_probe_send_start

This seems to be the root cause. I have attached the full log here. https://drive.google.com/open?id=0B8DXcb9GfNuARFZGdy1USGNPbFk

thehellmaker commented 8 years ago

Attaching the enums that the event numbers point to:
https://github.com/esp8266/Arduino/blob/db5e20f23770e1be307348633dc497f689493996/tools/sdk/include/user_interface.h#L368
https://github.com/esp8266/Arduino/blob/de166c9dd73bd1da0baa35b2a62695035196018a/libraries/ESP8266WiFi/src/ESP8266WiFiType.h#L51

Both map to the same enum values.
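To make the `wifi evt: N` lines readable, here is a small Python sketch (not from the thread) that tallies them by name. The number-to-name table is an assumption: it reflects the event enum as I recall it from the SDK header of that period, so please verify it against the two linked files before relying on it. If the table is right, events 5, 6 and 7 are softAP-side events (a station joining or leaving the module's own access point, and a probe request being received), which would fit the `station: ... join` and `leave` lines in the logs.

```python
import re
from collections import Counter

# Assumed mapping; check against the linked user_interface.h / ESP8266WiFiType.h.
WIFI_EVENTS = {
    0: "STAMODE_CONNECTED",
    1: "STAMODE_DISCONNECTED",
    2: "STAMODE_AUTHMODE_CHANGE",
    3: "STAMODE_GOT_IP",
    4: "STAMODE_DHCP_TIMEOUT",
    5: "SOFTAPMODE_STACONNECTED",
    6: "SOFTAPMODE_STADISCONNECTED",
    7: "SOFTAPMODE_PROBEREQRECVED",
}

EVENT_LINE = re.compile(r"wifi evt: (\d+)")


def summarize_events(log_text: str) -> Counter:
    """Count wifi events by name, tolerating lines where outputs run together."""
    counts = Counter()
    for match in EVENT_LINE.finditer(log_text):
        code = int(match.group(1))
        counts[WIFI_EVENTS.get(code, f"UNKNOWN_{code}")] += 1
    return counts
```

Because it searches anywhere in the text rather than line by line, it also picks up events that the serial output glued onto another message, such as `nHere for arpwifi evt: 7`.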

mtnbrit commented 8 years ago

What make, model and firmware is your AP? Have you tried a different brand or model of Wi-Fi AP? They are not all equal, by far.

ClaudioHutte commented 8 years ago

So you think the Wi-Fi connectivity instability leads to the inability to respond to ARP broadcast requests? A reception problem, since transmission seems to be OK, correct?

thehellmaker commented 8 years ago

@ClaudioHutte You are partially right. Here are my observations:

  1. The receiver seems to be the part mainly affected: after this point, gratuitous ARP from the other module is no longer received, although gratuitous ARP is still being sent to the other modules.
  2. However, the log below suggests that the module tries to rejoin and gets a wifi evt: 5 (connected), after which it receives gratuitous ARP from the other modules for just a few seconds and then disconnects with
err already associed!
station: 98:0c:a5:b8:de:91 leave, AID = 1

Log

add 1
aid 1
station: 98:0c:a5:b8:de:91 join, AID = 1
wifi evt: 5
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
Got ARP Input 
nHere for arpwifi evt: 7
Got ARP Input 
nHere for arpGot ARP Input 
nHere for arpwifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
Got ARP Input 
nHere for arpGot ARP Input 
nHere for arpGot ARP Input 
nwifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
Got ARP Input 
nHere for arpGot ARP Input 
nHere for arpGot ARP Input 
nHere for arpwifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
Got ARP Input 
nHere for arpwifi evt: 7
Got ARP Input 
nHere for arpGot ARP Input 
nHere for arpwifi evt: 7
Got ARP Input 
nHere for arpwifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
err already associed!
station: 98:0c:a5:b8:de:91 leave, AID = 1
rm 1
wifi evt: 6
add 1
aid 1
station: 98:0c:a5:b8:de:91 join, AID = 1
  3. Somewhere between the multiple join and leave attempts you'll also see max connection!
  4. And of course a bunch of
bcn_timout,ap_probe_send_start
bcn_timout,ap_probe_send_start

Just to explain my setup: I have two ESP8266-12F modules (http://www.thaieasyelec.com/products/wireless-modules/wifi-modules/esp8266-12f-wifi-serial-transceiver-module-detail.html).

  1. I set up gratuitous ARP to send ARP broadcast pings into the network every second, as @ClaudioHutte pointed out in the beginning.
  2. When the modules connect for the first time, the module prints the following every second:
Got ARP Input 
nHere for arp
  3. Along with this ARP receive there are other wifi events.
  4. At some point (after close to 48 hours) the logs mentioned in point 2 stop, and a bunch of other events happen before this terminates.

The complete log is at https://drive.google.com/file/d/0B8DXcb9GfNuARFZGdy1USGNPbFk/view

ClaudioHutte commented 8 years ago

I never tested two units as you have done, though I ran into the same trouble with an ESP8266-12 and a TP-Link router located quite far away (two stories below mine). I would like to run some tests the same way you did, but I will be busy with other work for the next two weeks. What happens if the "every second gratuitous ARP send" workaround is stopped/skipped?

thehellmaker commented 8 years ago

Before you mentioned the gratuitous ARP, I hadn't added it to the code base. Even then the module stopped responding, as we discussed here: https://github.com/me-no-dev/ESPAsyncWebServer/issues/54

I haven't collected logs without gratuitous ARP, but I'm sure it's the same issue.

It always takes 36+ hours for the module to eventually stop responding.

alex-yazdan commented 7 years ago

Upgrading to SDK 2.1.0 (#3215) will solve this problem.

pouriap commented 7 years ago

Thanks for the efforts to create the update_sdk_2.1.0 branch.

But I'm still having the ARP issue even when using that branch:

[Image: esp8266_arp]

Can anyone confirm that their ARP issue has been resolved by using that branch?

IvanBayan commented 7 years ago

I tried the new SDK and still have the ARP issue.

vks007 commented 7 years ago

I also have this same issue. I am running a webserver on the ESP, which connects to my router in STA mode. The router assigns a fixed IP to the ESP (192.168.1.54). All works well, but after some time (typically a few hours to a day) the ESP webserver stops responding. I tried pinging the IP address at this point and it's unreachable. To watch the memory footprint I added log calls within the ESP that call a Google Sheets URL and log all relevant info. All of that keeps working fine, and the memory footprint is normal. So while my ESP is able to reach the internet, its IP address is not reachable from within the local network. If I reset my ESP or turn my modem off and on (to assign the IP address again), the issue goes away for a few hours. I have tried the simple webserver from the examples and it behaves the same way, so my program is not what is causing this. I have also tried this on a SONOFF, an Electrodragon ESP relay module, an ESP-01 module and an ESP-12E module; they all behave the same. Can somebody guide me on what I should be looking at here?

mtnbrit commented 7 years ago

Can you ping the ESP from the router itself?

It looks like the arp issue.

Have you eliminated the access-point/router as being the cause by trying a different make/model?

vks007 commented 7 years ago

Hi @mtnbrit, I am in the process of trying out a different router; it will take a few days to verify this. Also, while the above condition did happen sometimes, in my testing since yesterday I am most of the time able to ping the ESP while the webserver does not respond. At other times I am able to get a response from my iPhone browser while the desktop browser throws a connection reset error. I also started recycling the webserver every 10 minutes (I mean recreating the webserver object via the reset method) and I was still able to get into a state where the ESP behaves normally but the webserver stops responding. Is there some way I can debug the state of the webserver object, so that I can log it and figure out what is not responding? Maybe I can tweak the source files to read some members and log them, making private ones public just for this purpose. But I am not sure which members would tell me something about the webserver object.

vks007 commented 7 years ago

An update on my issue: I am not sure how ARP works or what can be done about it, but I changed my router and my ESP has been working perfectly since then. It's been 4 days and it has not been unreachable even once. So there is something about my previous router that makes the ESP unreachable. What a waste of time this has been for me. I have been trying to implement a solution to this for many weeks now. Phew!

mikrodunya commented 7 years ago

Hi all. I have the same issue with the ESP8266 webserver example. I can see the ESP8266's MAC address on the modem's page but I can't see its IP address. I also can't reach the ESP8266's IP from a browser. How can I solve this problem? Thanks.

vks007 commented 7 years ago

Hi @mikrodunya, I would suggest you try another router and see if the issue persists. If you don't have one, borrow one from a friend for a day or two :). It is certainly an ARP issue with the router.

mikrodunya commented 7 years ago

I suspect my router too. I will try it on another router. Thanks.

lexelby commented 7 years ago

I have seen improvement by having my esp devices connect to an access point running on my computer using hostapd.

That said, I still maintain that we can't just take the easy road here and blame the router. Only the ESP8266 devices connected to my router exhibit any ARP issues.

lexelby commented 7 years ago

... And by "still maintain", I mean that I mentioned that in another similar issue here -- I forget which one. :)

devyte commented 7 years ago

All, does PR #3362 help with this?

pouriap commented 7 years ago

I completely fail to understand why this issue is not being addressed. Is it a rare thing? Is my ESP device the problem? Will buying another one solve this? I have been waiting for a fix for almost a YEAR now. Struggling with this completely broken device. It simply does not answer ARP requests. I cannot reach it half of the time. I have to refresh my browser for minutes until it finally responds. You cannot blame the router for this because there are like 10 wifi devices in my house connected to this router all the time and none of them have ever had a similar problem. I've tried many solutions during this one year but none of them work. The ESP just stops responding to ARP and it's completely random. Sometimes right after I turn it on it will be unresponsive. Sometimes it will take a while. 1- How can this be my router's fault when all other devices are working fine? 2- If it is not my router's fault then how is it possible that apparently very few people are having this issue? Because apparently no one even cares/knows that this issue exists.

d-a-v commented 7 years ago

@pouriap PR #3362 is an update to lwip-v2. Could you try it and report here ?

pouriap commented 7 years ago

@d-a-v Thank you for answering. I'm not sure how I should use a pull request. I searched your repositories and found your lwip2v2 branch. Is that what I'm supposed to download? (BTW am I doing this right? Shouldn't there have been a link to it in the pull request? It took me a while to find it). Also, I already have the esp8266 v2.3.0 board installed via the boards manager. Do I have to build/make anything? Or do I just copy your branch to the esp8266 folder and overwrite it?

d-a-v commented 7 years ago

Cloning my lwip2v2 branch should be enough to give it a try.

d-a-v commented 7 years ago

But if you are using v2.3.0 from board manager, you should try current master or v2.4.0rc2 first (link)

pouriap commented 7 years ago

Thanks @d-a-v Just flashed with v2.4.0rc2 with no luck. Became unresponsive after ~1 hour. Going to try with your lwip2v2 branch.

mtnbrit commented 7 years ago

It might be useful to start building a list of access points that have this issue. Would you care to report the make, model and firmware version? Have you tried a different AP?

pouriap commented 7 years ago

@mtnbrit That's exactly what I was thinking about today. I thought maybe someone would be able to see a pattern in the routers and figure out the problem. We should post our WiFi settings as well. Like encryption method, etc. I'm not familiar with github workflow though, so I'm not sure how/where we should create this list. Here is my router model and settings: (I think for people with publicly accessible IPs posting their firmware version could be a security issue)

Huawei D-100 3G/4G Desktop Modem
Firmware version: 01.01.02.082
Radio Channel: Auto
Working Mode: 802.11b/g/n
Bandwidth: 20M/40M
RTS Threshold: 2347
CTS Protection Mode: Auto
Preamble Length: Short Preamble
SSID Broadcast: Enable
Authentication: WPA/WPA2-Personal Mixed Mode
Encryption: TKIP/AES

Though the Huawei D-100 that comes up on the internet is not the one I have. I think it might be a model specific to my ISP? (Irancell)

I have not tried a different AP because it would be a useless thing to do. Even if it worked with another router(which I suspect it would given how rare this issue is), I can't afford to buy a new router just because the ESP doesn't work with my current one. Also I'm pretty sure my ISP wouldn't let me use 3rd party hardware. I guess I'll just have to keep refreshing until it responds, unless the cure is found. 😞

Speaking of which, I think the lwip2v2 isn't fixing it either. I flashed it this afternoon and it has gone into brief periods of unresponsiveness. Its behavior is always inconsistent so I cannot tell for sure until a few days have passed. But my intuition and past experiences tell me that it's going to go fully unresponsive by tomorrow or in a few days.

jogyl commented 7 years ago

@mtnbrit I have the same issue as @pouriap and others (but mine are gone for good once the webserver stops responding). My webserver stops responding after a couple of minutes, but the device is still responsive over serial and keeps sending UDP broadcasts. I have tried 2.4.0-rc2 down to 2.2.0 with the same results. I am running four NodeMCU ESP-12 boards that black out regardless of how they are powered (USB or local PSU).

I am running 2 UniFi access points in mesh with a UniFi router (EdgeRouter PoE v1.9.7+hotfix.4):
model: UAP
version: 3.9.3.7537
radio: 802.11b/g/n
channel: 6 and 11

I will setup a second wifi using some other Netgear router and post back my results.

vks007 commented 7 years ago

@jogyl, I have tried every possible combination for my ESPs, powering them differently etc., but I could always reproduce this issue with my DLink router, while everything works fine with my TP Link router. I have been running on the TP Link router for many months now and have never had any issues. One important test I did was to reach out to the internet from the ESP. My ESP was able to continuously reach the internet while it was not reachable from the local network. This indicates that the issue isn't with the web server but with something between the ESP and the router. I haven't tried tweaking the DLink router settings to see if some setting solves the issue with the ESP; it's too time consuming to try each setting for a couple of hours. Hi @mtnbrit, to answer your question about the list of AP, model and firmware: I have used various kinds of ESP available in the market, loaded with the webserver example, and they all had this issue. I have used ESP 01, ESP 12E, Electrodragon ESP relay and SONOFF. I don't have the firmware versions at hand but I assume they must differ to some extent. And the beauty is that they all work perfectly fine with my TP Link router.

jogyl commented 7 years ago

@vks007, ok. You have problems using some DLink router, @pouriap is on a Huawei, I am using products from Ubiquiti, and there are others in other forums. It seems our devices can still send data (you using TCP to connect to an external service, me using UDP). Our devices stop responding to network requests (regardless of where in the network stack the problem is located).

The solution cannot be that we all get TP Link routers. Can we at least agree that there seems to be some problem and try to work on isolating it?

DLink, Huawei and Ubiquiti are all pretty large companies. Assuming that each of them has implemented its networking in some special way that kills the ESP's communication does not feel right. Is there some way we can do structured testing, using the same versions of the firmware, the same sketch, etc.? @igrr and co, is there any way we can help out and give better feedback?

Maybe start tweeting on #my_esp_too

pouriap commented 7 years ago

@d-a-v I can confirm now that unfortunately the lwip2v2 is not fixing the issue.

[Image: arp]

FYI this has been captured from a third machine, not the one sending the requests. So the router is actually sending the broadcast across the network but the ESP isn't responding to it.

pouriap commented 7 years ago

A question: Do your ESPs respond after a while? Because mine does respond after I keep pinging it(or keep hitting the reload button in the browser). Sometimes it takes 30 seconds, sometimes one minute, sometimes more, but it usually does respond eventually. In the first image in one of my earlier posts you can see the ESP responding eventually in the highlighted row, but it has taken it one minute to do so. Do yours do the same thing?

jp112sdl commented 7 years ago

@pouriap I can confirm the behaviour you described.

My ESPs always respond to a ping when I keep pinging them continuously.

If I stop pinging and wait for a while (I don't know exactly how long, a few hours or so), the ESP is unreachable for the first 10 or 20 pings. Then the ESP answers with a very high response time (> 300ms) for some replies before it gets back to a normal level.

While ping is not available, the webserver running on the ESP is not responding, too.

FYI: I am using an Apple TimeCapsule as AP.

Now I'm pinging my ESPs with nagios every 5 minutes. Seems to keep them alive.
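For anyone without a nagios setup, a rough stand-in for this keep-alive idea could look like the Python sketch below. It is not what was actually run here: the function names, the 300-second default interval (mirroring "every 5 minutes") and the reliance on the system `ping` binary are all assumptions for illustration. The exit code of `ping` is only a coarse signal, and on Windows it can be zero even when the reply was "destination host unreachable".

```python
import platform
import subprocess
import time


def ping_command(host, system=None):
    """Build a single-echo ping command for the current (or given) OS."""
    system = system or platform.system()
    if system == "Windows":
        return ["ping", "-n", "1", host]
    return ["ping", "-c", "1", host]


def ping_once(host, timeout=5.0):
    """Return True if one ping exits successfully within `timeout` seconds."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def keep_alive(hosts, interval=300.0, rounds=None):
    """Ping every host once per interval; rounds=None means run forever."""
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            state = "up" if ping_once(host) else "DOWN"
            print(f"{time.strftime('%H:%M:%S')} {host} {state}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

A call such as `keep_alive(["192.168.1.6"])` would then ping that address every five minutes and print a timestamped status line. Whether this keeps a given module reachable depends on the network, as later replies in the thread show.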

mikrodunya commented 7 years ago

Same problem here.

jogyl commented 7 years ago

@pouriap, my ESPs always stop responding after a little less than 2 minutes and they never wake up. They are still broadcasting their UDP presence without interruption.

@jp112sdl, I tried your ping-thing but it made no difference on my network. After they stop responding I get 15 "...unreachable" and 3 "...timeout" over and over (ping -t in Windows).

@vks007, I have set up another network (a Windows 10 laptop as mobile hotspot) and connected an ESP to that network. It has been responding fine for a couple of hours, so the error is indeed network dependent.

(I created a small logging app that listens to my ESPs' UDP broadcasts, does a web request against them and logs the result, so I can get some statistics.)

I am not good at packet logging, but now that I have the same sketches and firmware running on two different networks with dramatically different results, is there anything I can set up to log and compare? I'd be happy to help out.

pouriap commented 7 years ago

So a very curious thing just happened. I was trying to manually send an ARP request to the ESP, only I couldn't! It does not answer my manual ARP requests at all. And the only reason it works after a restart is that on startup it sends a gratuitous ARP and my computer learns its MAC address. I'm confused, because when I send ARP requests manually, it does not answer no matter how long I keep requesting. But when I ping it, it does answer eventually. Now the interesting part: I have another ESP, and this other ESP does respond to my manual ARP requests! I'm flashing the same code onto both of them: the example web server.

Can you guys send manual ARP requests to your device and see if it responds?

You can do it using the Windows utility arp-ping (Available here):
arp-ping IP-OF-ESP

Or using the Linux utility arping:
arping IP-OF-ESP

Or using the nping utility, which is part of nmap (Available here for Linux and Windows):
nping --arp arp --arp-target-ip IP-OF-ESP IP-OF-ROUTER
nping --arp IP-OF-ESP
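If none of these tools is at hand, the same manual probe can be approximated in a few lines of Python. This is a sketch under assumptions, not one of the utilities named above: `probe` uses a Linux-only `AF_PACKET` raw socket, needs root, and the interface name and source IP you pass in are your own. `arp_request` only builds the 42-byte frame and runs anywhere.

```python
import socket
import struct
import time

ETH_P_ARP = 0x0806
BROADCAST = b"\xff" * 6


def arp_request(src_mac, src_ip, target_ip, dst_mac=BROADCAST):
    """Build an Ethernet + ARP "who has target_ip" request frame."""
    eth = struct.pack("!6s6sH", dst_mac, src_mac, ETH_P_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4,          # Ethernet, IPv4, address sizes
        1,                        # opcode 1 = request
        src_mac, socket.inet_aton(src_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )
    return eth + arp


def probe(iface, src_ip, target_ip, timeout=2.0):
    """Send one broadcast ARP request; return the replying MAC or None.

    Linux only (AF_PACKET) and requires root privileges.
    """
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ARP)) as sock:
        sock.bind((iface, ETH_P_ARP))
        src_mac = sock.getsockname()[4]   # hardware address of iface
        sock.send(arp_request(src_mac, src_ip, target_ip))
        wanted = socket.inet_aton(target_ip)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                frame = sock.recv(65535)
            except socket.timeout:
                return None
            # opcode 2 = reply, sender IP must be the address we asked about
            if (len(frame) >= 42 and frame[20:22] == b"\x00\x02"
                    and frame[28:32] == wanted):
                return ":".join(f"{b:02x}" for b in frame[22:28])
```

Passing the ESP's MAC as `dst_mac` to `arp_request` gives the unicast-addressed variant, which is handy for comparing against the broadcast case discussed earlier in the thread.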

As far as I can remember this other ESP also goes unresponsive after a while, but now I'm not so sure. I'm going to leave it on for a few days to see if it goes unresponsive.