arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0

Device not responding to ARP requests causing network issues #1553

Closed: SupraJames closed this issue 6 years ago

SupraJames commented 6 years ago

This is related to the stability of the T1 devices I have been testing, and may have been clouding the issue.

SYMPTOMS: "Weird" network issues. Unable to contact / ping IP address of Tasmota from PC or Mac, but MQTT and/or syslog connection to server on same network is still working fine. Ping from MQTT server is fine.

FINDINGS: If a machine has not connected to Tasmota in a while, it needs to map its IP address to a MAC address using an ARP request. I have found that after an unspecified amount of time, the Tasmota stops responding to these ARP requests, so any network access from that machine will fail. The MQTT server is in constant contact, so the MAC address is already in its ARP cache and it never needs to look it up.

I have used a packet sniffer to verify that ARP requests never come back when the problem is encountered. I then issue an MQTT command to Restart the Tasmota, and everything is OK again, and I can see the ARP request and reply.
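The check described above (an ARP request that never gets a reply) can be sketched in a few lines of Python. This is only an illustration, not the tool used for the traces: the function name `unanswered_arp_requests`, the tuple layout, and the 1-second timeout are all assumptions, and the events would have to be extracted from a capture by some other means.

```python
def unanswered_arp_requests(events, timeout=1.0):
    """Return ARP requests that saw no matching reply within `timeout` seconds.

    `events` is a hypothetical, time-sorted list of tuples:
        (timestamp, op, sender_ip, target_ip)
    where op is either "request" or "reply".
    """
    unanswered = []
    for i, (t, op, sender, target) in enumerate(events):
        if op != "request":
            continue
        # A matching reply comes from the requested IP, back to the asker.
        answered = any(
            op2 == "reply"
            and sender2 == target
            and target2 == sender
            and t <= t2 <= t + timeout
            for (t2, op2, sender2, target2) in events[i + 1:]
        )
        if not answered:
            unanswered.append((t, sender, target))
    return unanswered
```

In the bad state, every request from the PC to the Tasmota would show up in the returned list; after a restart the list should be empty.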

This issue has been encountered on both my T1s, one with the official binary and one with my own binary compiled against Arduino core v2.4.0.

See traces attached. My PC is .190 and the Tasmota is .152.

[Image: bad capture]

After rebooting Tasmota:

[Image: good capture]

arendst commented 6 years ago

You might want to follow this issue https://github.com/esp8266/Arduino/issues/3095 as it's out of my hands providing a solution.

SupraJames commented 6 years ago

Oh, yuck! What a rabbit hole. OK, fair enough. Guess I'm going to be digging out a spare router tonight and having a play. Nothing's ever simple :)

SGF-Lon commented 6 years ago

Supra... that matches what I'm seeing. I have been able to get it to reconnect after restarting, but then after a while it goes unresponsive again. No change in behaviour when restarting wifi or the webserver. All other devices are fine. So that tells me that something in the underlying TCP/IP code doesn't sit well with the T1 device.

SupraJames commented 6 years ago

Yeah, I was wondering about your issue. It seems to be something fundamental with the IP stack on ESP, and I actually just saw the same thing on my Basic module, so it's not limited to T1 / ESP8285 devices either.

What router are you using, out of interest? And can you verify using Wireshark or similar that ARP requests are going unanswered when you encounter the issue?

I'm using a TP-LINK Archer C7 router with stock firmware.

SupraJames commented 6 years ago

My workaround is to run a Python script which answers the ARP request on behalf of the broken device:

Use at your own risk :)

https://gist.github.com/SupraJames/779475fefb6dfe7af315a68f03fe63dd
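For readers who want to see the idea without opening the gist, here is a minimal sketch of the core step: turning a captured ARP request into a reply on behalf of a known device. It is not the gist's code. The names `MAC_TABLE`, `build_proxy_reply` and `helper_mac`, the 192.168.1.x prefix, and the MAC addresses are all made up for illustration, and the sketch only builds the frame; capturing and sending are left out.

```python
import socket
import struct

# Hypothetical table: IP address of each affected device -> its real MAC.
MAC_TABLE = {"192.168.1.152": "5c:cf:7f:00:00:01"}

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1
ARP_REPLY = 2
ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28-byte ARP payload for Ethernet/IPv4


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_proxy_reply(frame, mac_table, helper_mac):
    """Build an ARP reply for `frame`, or return None if it is not an
    Ethernet/IPv4 ARP request for an IP listed in `mac_table`.

    `helper_mac` is the MAC of the machine running this code; it is used
    as the Ethernet source so switches do not relearn the device's port.
    """
    if len(frame) < 42:
        return None
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack(
        ARP_FORMAT, frame[14:42]
    )
    if (htype, ptype, hlen, plen, oper) != (1, 0x0800, 6, 4, ARP_REQUEST):
        return None
    device_mac = mac_table.get(socket.inet_ntoa(tpa))
    if device_mac is None:
        return None
    device = mac_to_bytes(device_mac)
    ethernet = struct.pack("!6s6sH", sha, mac_to_bytes(helper_mac), ETHERTYPE_ARP)
    # Sender fields carry the device's MAC/IP; target fields echo the asker.
    arp = struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, ARP_REPLY, device, tpa, sha, spa)
    return ethernet + arp
```

On Linux the resulting frame could be written to the wire through an `AF_PACKET` raw socket, which requires root. Whether the Ethernet source should be the helper's MAC or the device's MAC is a design choice; this sketch assumes the former.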

mateuszdrab commented 6 years ago

Hey guys, just found this. Suffering the same issue with random Sonoffs going offline for an unspecified amount of time. Offline means unpingable. In my case MQTT also works fine. I will work on workarounds later... perhaps increasing the ARP cache timeout on my pfSense will do it, because the Sonoffs are on their own subnet.

mateuszdrab commented 6 years ago

@SupraJames Just modded your script to allow macDict to be populated from a JSON array (I have about 30 ESP devices) and to bind better on dual-NIC systems (the original script didn't seem to send the replies on the wire)... I can see MAC responses handed out now in the packet capture. Leaving it for the night, let's see if it stops my ping alerts from SCOM.
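Populating the table from JSON might look roughly like the sketch below. The modified script was not posted, so the JSON layout (an array of objects with `ip` and `mac` keys) and the function name `load_mac_dict` are assumptions for illustration only.

```python
import json


def load_mac_dict(text):
    """Parse a JSON array like [{"ip": "...", "mac": "..."}] into
    an {ip: mac} dictionary. The key names are hypothetical."""
    mac_dict = {}
    for entry in json.loads(text):
        # Normalise MAC case so lookups and comparisons are consistent.
        mac_dict[entry["ip"]] = entry["mac"].lower()
    return mac_dict
```

Keeping the device list in a separate JSON file means the script itself does not need editing each time a device is added.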

mateuszdrab commented 6 years ago

Yay! I can confirm that running this script as a systemd service stops my ping alert issues.

Now, I hope that Espressif actually investigate it. I've also discovered a bunch of other issues with the ESP8266 networking:

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem.