Aircoookie / WLED

Control WS2812B and many more types of digital RGB LEDs with an ESP8266 or ESP32 over WiFi!
https://kno.wled.ge
MIT License
14.64k stars 3.15k forks

MDNS/Zeroconfig Stops Working #2009

Closed d8ahazard closed 2 years ago

d8ahazard commented 3 years ago

Describe the bug After being on for "some time", devices stop responding to MDNS, and will not respond again until being rebooted. Web UI and UDP control still works.

To Reproduce Boot device, connect device to wifi network.

Let it sit for a few days.

Try MDNS discovery. No response.

Reboot. Discovery works again.

Expected behavior Consistent replies to MDNS queries, even after weeks of uptime.

WLED version Wemos D1 Mini 0.12.0 Downloaded from releases on github.

Additional context Anything else you'd like to say about the problem?

Thank you for your help!

huggy-d1 commented 3 years ago

Some WLED users have had success by periodically scheduling some always-on (hopefully low-power) device to ping WLED's IP address.

This type of problem is difficult to reproduce unless someone has the same networking hardware, distance from wifi router or AP, or obstacles between WLED's ESP and router/AP.

If you are interested, there are some interesting developments over at https://quinled.info around using ESP32 & ethernet instead of wifi.

If you have a home automation system, you could schedule a ping to that WLED IP address and look for the response. If there is no response after some number of pings, have it cycle power to WLED for a few seconds so it automatically restarts. If you have confidence it will last at least 2 days between power cycles, you could install a timer switch and schedule it to lose power every day at noon and power back up ~10 minutes later.
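The ping-and-power-cycle idea above boils down to counting consecutive failures. A minimal sketch of that counting logic in plain C++, assuming a threshold chosen by the user; `PingWatchdog` and `record` are hypothetical names, and the actual ping and relay switching are left to whatever home automation system is in use:

```cpp
#include <cassert>

// Hypothetical helper (not part of WLED): counts consecutive failed pings
// and reports when the caller should power-cycle the device.
class PingWatchdog {
public:
  explicit PingWatchdog(unsigned maxFailures) : maxFailures_(maxFailures) {
    assert(maxFailures > 0);  // a zero threshold would never make sense
  }

  // Record one ping result; returns true when a power cycle is due.
  bool record(bool pingSucceeded) {
    if (pingSucceeded) {
      failures_ = 0;  // any reply resets the streak
      return false;
    }
    if (++failures_ < maxFailures_) return false;
    failures_ = 0;  // start counting afresh after the power cycle
    return true;
  }

private:
  unsigned maxFailures_;
  unsigned failures_ = 0;
};
```

With `PingWatchdog(3)`, the third consecutive `record(false)` returns true, and any successful ping resets the count. This only detects an unreachable device; it would not notice a device that still answers pings but has stopped answering mDNS.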

blazoncek commented 3 years ago

I know MDNS is nice and all, but I would still like to know the use case in which this is an issue. WLED, since 0.12, has auto node discovery (available in the UI), or you can use the WLED iOS/Android app.

On the troubleshooting side, I haven't noticed this behaviour, though I've been running mine for 30+ days without a reboot.

d8ahazard commented 3 years ago

Some WLED users have had success by periodically scheduling some always-on (hopefully low-power) device to ping WLED's IP address.

This type of problem is difficult to reproduce unless someone has the same networking hardware, distance from wifi router or AP, or obstacles between WLED's ESP and router/AP.

If you are interested, there are some interesting developments over at https://quinled.info around using ESP32 & ethernet instead of wifi.

If you have a home automation system, you could schedule a ping to that WLED IP address and look for the response. If there is no response after some number of pings, have it cycle power to WLED for a few seconds so it automatically restarts. If you have confidence it will last at least 2 days between power cycles, you could install a timer switch and schedule it to lose power every day at noon and power back up ~10 minutes later.

I totally understand what you're saying about being difficult to diagnose...but it sounds like "the team" is already aware of the issue...and that it's just hard to reproduce?

While not impossible for me...setting up "something" that has to regularly ping my WLED devices and tell them to reboot if their discovery service has failed sounds...hacky. If we know that MDNS can sometimes go stupid intermittently without much other rhyme or reason...why not do something in WLED itself to periodically restart the MDNS advertisement?

Seems like - whether or not it's a regular failure - a regular restart of the MDNS service every day or two would be the "right" fix...short of finding what's killing/freezing/breaking it.

I know MDNS is nice and all, but I would still like to know the use case in which this is an issue. WLED, since 0.12, has auto node discovery (available in the UI), or you can use the WLED iOS/Android app.

On the troubleshooting side, I haven't noticed this behaviour, though I've been running mine for 30+ days without a reboot.

I have an app that streams color data to one or more WLED devices via UDP over the network...I use MDNS to discover the WLED instances.

And...my app also has a companion mobile app that is just a modded version of the WLED mobile app...and AFAIK, all that is using for discovery is MDNS. I would assume that any "node-to-node" discovery would also leverage MDNS to find other devices.

If not - then what else can I do to discover WLED devices on the LAN? If there's another way to discover them all, I'm all for it.

d8ahazard commented 3 years ago

After a brief glance through the code, it looks like some kind of loop/timer would just need to be set up in wled.cpp that runs MDNS.begin(... every once in a while...

d8ahazard commented 3 years ago

Continuing my thought...I would think a loop like this would work to keep restarting the service...

void WLED::handleMdns()
{
  // Restart the mDNS responder once every 24 hours
  if (millis() - lastMdnsConnect > 86400000UL) {
    if (strlen(cmDNS) > 0) {
      MDNS.end();
      #ifndef WLED_DISABLE_OTA
        if (!aOtaEnabled) { // ArduinoOTA begins mDNS for us if enabled
          MDNS.begin(cmDNS);
        }
      #else
        MDNS.begin(cmDNS);
      #endif
      DEBUG_PRINTLN(F("mDNS (re)started"));
      MDNS.addService("http", "tcp", 80);
      MDNS.addService("wled", "tcp", 80);
      MDNS.addServiceTxt("wled", "tcp", "mac", escapedMac.c_str());
    }
    // Always reset the timer so the restart does not repeat on every loop pass
    lastMdnsConnect = millis();
  }
}

And then also add lastMdnsConnect = millis(); at line 404 in wled.cpp
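One detail worth checking in a timer like this is `millis()` rollover, since the 32-bit counter wraps after roughly 49.7 days of uptime. A small host-side sketch (the helper name `intervalElapsed` is hypothetical) showing why the `now - last` form used above stays correct across the wrap, because unsigned subtraction is modular:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true once at least intervalMs milliseconds have
// passed since lastMs. Unsigned 32-bit subtraction wraps modulo 2^32, so
// the difference is still the true elapsed time after the counter rolls over.
bool intervalElapsed(std::uint32_t nowMs, std::uint32_t lastMs,
                     std::uint32_t intervalMs) {
  assert(intervalMs > 0);  // a zero interval would fire on every call
  return static_cast<std::uint32_t>(nowMs - lastMs) >= intervalMs;
}
```

For example, with `lastMs = 0xFFFFFF00` and `nowMs = 0x100` the computed difference is 512 ms, even though `nowMs` is numerically smaller than `lastMs`. Comparing `millis() > lastMdnsConnect + interval` instead would not have this property.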

However, looking more at the code, I'm curious why MDNS.update(); is only called under #ifdef ESP8266 at line 98. My C-foo is not the greatest, but shouldn't this be called regularly?

blazoncek commented 3 years ago

[Screenshot 2021-06-03 at 20 26 42] [Screenshot 2021-06-03 at 20 27 03]

23 days of uptime and still discoverable. It must be something in your network...

d8ahazard commented 3 years ago

[Screenshot 2021-06-03 at 20 26 42] [Screenshot 2021-06-03 at 20 27 03]

23 days of uptime and still discoverable. It must be something in your network...

I see what you mean. _wled._tcp is still MDNS discovery, however.

And, when I currently look on my app at the same thing, I'm only seeing three of the six or seven devices I have, and none have been unplugged.

While it may be something with my network, I submitted this issue because another of the testers for my software verified the same thing. Several devices of his, long-running, can't be discovered by my app or "Bonjour Browser" or commandline in Linux. Rebooting them fixes the issue.

So, I still submit that a small loop that just restarts the MDNS service would probably be a good way to fix this...

d8ahazard commented 3 years ago

One more update - since my last post, I've not rebooted any devices, and presently, none of them are discoverable via MDNS, neither with my application nor using the node list in the WLED app...

Aircoookie commented 3 years ago

@d8ahazard sorry for the issue! There should be no need to restart the MDNS service periodically, as it is re-started after each WiFi reconnect (so losing WiFi temporarily shouldn't be a problem) and MDNS.update() is called in every loop 🤔

What are you using to discover the devices? If you are using the WLED app, you could check whether the board is discovered with a different app (I like Service Browser for Android).

d8ahazard commented 3 years ago

@d8ahazard sorry for the issue! There should be no need to restart the MDNS service periodically, as it is re-started after each WiFi reconnect (so losing WiFi temporarily shouldn't be a problem) and MDNS.update() is called in every loop 🤔

What are you using to discover the devices? If you are using the WLED app, you could check whether the board is discovered with a different app (I like Service Browser for Android).

My primary method of discovery is using the "Makaretu.Dns.Multicast" library for .net. It's a pretty basic setup, and works fine for other devices that use MDNS (Nanoleaf, Hue) using the exact same methodology to discover them...I'm just changing the service/domain name I'm searching for.

But, to further troubleshoot the issue, I also use "Bonjour Browser" on windows, and have tried the "avahi-browse -a" command on RasPi. When the issue occurs, none of these methods will return the devices that are acting up.

One thought - when the devices are "Live"/streaming - does the MDNS responder still work? I'm suddenly wondering if they can't be discovered when streaming over UDP...which is primarily how I use them with my application...

Aircoookie commented 3 years ago

You very likely found the issue here!

Realtime/streaming mode disables some stuff that either would not work with it active (rendering built-in effects) or might cause performance problems (ArduinoOTA, among others).

I thought of mDNS and put it outside the block that is disabled in realtime mode. However, when ArduinoOTA is enabled, WLED does not interface with the MDNS library directly; instead I let ArduinoOTA do it for me (it relies on mDNS as well and adds its _arduino._tcp service type). When ArduinoOTA is disabled in realtime mode, mDNS goes with it. I will try to find a good solution for this :)

As a workaround, you can disable ArduinoOTA in Security settings and reboot, which will make WLED call MDNS.begin() directly and thus potentially avoid the problem.

Keyes commented 3 years ago

Hey there, I have pretty much the same issue: I have a Raspberry Pi connecting to WLED via UDP to connect it to Apple HomeKit, and it all works pretty much fine, but after a day and a half: boom, gone. WLED vanishes from all bonjour/mDNS lists, isn't accessible via *.local address in the network, and needs to be rebooted manually or via IP address. :(

d8ahazard commented 3 years ago

Also, wanted to add that I disabled ArduinoOTA on all of my devices, and this issue still persists...

Keyes commented 3 years ago

Hey there, I have pretty much the same issue: I have a Raspberry Pi connecting to WLED via UDP to connect it to Apple HomeKit, and it all works pretty much fine, but after a day and a half: boom, gone. WLED vanishes from all bonjour/mDNS lists, isn't accessible via *.local address in the network, and needs to be rebooted manually or via IP address. :(

I have now fixed both this and another issue by using npm bonjour - yes, a pretty old package, but it still works well with Node.js 14. The Node app now looks up the WLED IP from its hostname, and uses that until restart.

FYI: The other issue was that the LEDs randomly turned off for a few seconds and on again for another few seconds, which was pretty annoying when you're using them as main room illumination - my idea was that maybe because of the mDNS issue some UDP packets were lost - which apparently was exactly the case. Now the LEDs work like a charm :)

pbolduc commented 3 years ago

This is a similar issue: https://github.com/espressif/arduino-esp32/issues/4406. If you look at the last comment:

At least in my case, using Arduino-ESP32 2.0.0-alpha1 with a Fedora 32 x86_64 workstation, I have verified (using Wireshark) that MDNS queries made using avahi-daemon from the Fedora machine are actually answered by the ESP32 device, but avahi-daemon rejects the response because it is considered "invalid" due to having an echoed question record in the response packet. The end result is that the device "drops out" from being locatable via MDNS. I have opened a bug report https://github.com/lathiat/avahi/issues/348.
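For anyone inspecting a capture for the same symptom, the condition described in that comment (a response packet that still carries a question record) can be read straight from the fixed 12-byte DNS header. A rough sketch, assuming the raw UDP payload is already in a buffer; `responseEchoesQuestion` is a hypothetical name, and this is not what avahi-daemon itself runs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The DNS header is 12 bytes: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT,
// each a 16-bit big-endian field.
constexpr std::size_t kDnsHeaderSize = 12;

// True if the packet is a response (QR bit set) whose question section is
// not empty, i.e. the question was echoed back in the reply.
bool responseEchoesQuestion(const std::uint8_t* packet, std::size_t length) {
  assert(packet != nullptr);
  if (length < kDnsHeaderSize) return false;  // too short to be a DNS message
  const bool isResponse = (packet[2] & 0x80) != 0;  // QR is the top flag bit
  const unsigned questionCount =
      (static_cast<unsigned>(packet[4]) << 8) | packet[5];  // QDCOUNT
  return isResponse && questionCount > 0;
}
```

A header with flags `0x84 0x00` and QDCOUNT 1 returns true; the same header with QDCOUNT 0, or a plain query with the QR bit clear, returns false.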

blazoncek commented 3 years ago

Nice find.

d8ahazard commented 3 years ago

So, looks like this was fixed in version 4.2? Does that mean that just building a new version of WLED using the latest MDNS libraries would fix this? That was published in July...

https://github.com/espressif/esp-idf/issues/7124

satrik commented 3 years ago

Hi everyone, I think I've found a fix (or at least a workaround). It seems that calling MDNS.begin() multiple times always needs an MDNS.end() in between, and as ArduinoOTA.h does its own mDNS stuff, I've just suppressed the service from there.
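A minimal sketch of that pairing rule, not the actual change in the fork: a thin wrapper that guarantees end() runs between two begin() calls. It assumes a responder type exposing `bool begin(const char*)` and `void end()`; `PairedMdns`, `start` and `stop` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical wrapper: works with any responder type that exposes
// begin(const char*) returning bool and end(). It makes sure end() is
// called before the responder is begun a second time.
template <typename Responder>
class PairedMdns {
public:
  explicit PairedMdns(Responder& responder) : responder_(responder) {}

  // (Re)start the responder; returns whatever begin() reported.
  bool start(const char* hostname) {
    assert(hostname != nullptr);
    if (running_) {
      responder_.end();  // close the previous session first
      running_ = false;
    }
    running_ = responder_.begin(hostname);
    return running_;
  }

  // Stop the responder if it is running; safe to call repeatedly.
  void stop() {
    if (running_) {
      responder_.end();
      running_ = false;
    }
  }

private:
  Responder& responder_;
  bool running_ = false;
};
```

On a device this would be used as `PairedMdns<decltype(MDNS)> mdns(MDNS);` followed by `mdns.start(cmDNS);` after each reconnect, assuming nothing else (such as ArduinoOTA) begins the same responder behind its back.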

d8ahazard commented 3 years ago

Hi everyone, I think I've found a fix (or at least a workaround). It seems that calling MDNS.begin() multiple times always needs an MDNS.end() in between, and as ArduinoOTA.h does its own mDNS stuff, I've just suppressed the service from there.

Just curious - did you try building with the latest espressif libraries? Looks like a fix was published already.

satrik commented 3 years ago

@d8ahazard WLED uses Arduino libraries, this has nothing to do with the ESP-IDF

d8ahazard commented 3 years ago

@d8ahazard WLED uses Arduino libraries, this has nothing to do with the ESP-IDF

Scroll up, look at the issue mentioned by @pbolduc. It's specifically an issue with the MDNS library used by...arduino/esp32.

Maybe I'm completely wrong, but I have a very strong feeling that these are the same issue.

satrik commented 3 years ago

The mentioned issue is for ESP32 and for the ESPmDNS library. So this may cause issues for ESP32 users, but definitely not for you, because Wemos D1 Mini = ESP8266 = ESP8266mDNS library.

You could try my fork and see if it also works for you.

satrik commented 2 years ago

Short update: I haven't had any MDNS issues since I started using my fork :)

blazoncek commented 2 years ago

Short update: I haven't had any MDNS issues since I started using my fork :)

Master or dev?

satrik commented 2 years ago

Master and I've only changed this and this

blazoncek commented 2 years ago

Yes, seen the change. Will check it this afternoon (even though I did not have issues), but the change makes sense and avoids conditional compile. If all goes well, make a PR (make a separate branch as explained in Wiki, squashing commits).

blazoncek commented 2 years ago

@satrik tested your code. Seems to be working fine without side effects. Please make PR if you wish.

blazoncek commented 2 years ago

The possible fix @satrik suggested has now been included in the latest beta release. Please update and test if you want, and reopen if the mDNS issue still persists.