esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

mDNS sudden death #7262

Closed jjsuwa closed 4 years ago

jjsuwa commented 4 years ago

MCVE: Simple WiFiSTA + mDNS, and LED blink as coalmine canary :)

    #include <ESP8266WiFi.h>
    #include <ESP8266mDNS.h>

    #define AP_SSID "YOUR-SSID"
    #define AP_PASSWORD "YOUR-PASSWORD"
    #define MDNS_HOSTNAME "WiFiSta"
    #define LED_PIN 4
    #define DIVIDER 10000

    void setup() {
      pinMode(LED_PIN, OUTPUT);
      WiFi.persistent(false);
      WiFi.mode(WIFI_OFF);
      Serial.begin(115200);
      delay(250);

      WiFi.mode(WIFI_STA);
      Serial.print(F("\n\n\n"
                     "WiFi(STA): SSID=\"" AP_SSID "\".\n"
                     "WiFi(STA): connecting"));
      WiFi.begin(F(AP_SSID), F(AP_PASSWORD));
      // poll every 500 ms until the station is connected, then start mDNS
      for (; ; delay(500)) {
        if (WiFi.status() == WL_CONNECTED) {
          Serial.printf_P(PSTR(", done.\n"
                               "WiFi(STA): IP address=%s/%s.\n"
                               "mDNS: hostname=\"" MDNS_HOSTNAME ".local\".\n"),
                          WiFi.localIP().toString().c_str(),
                          WiFi.subnetMask().toString().c_str());
          MDNS.begin(F(MDNS_HOSTNAME));
          break;
        }
        Serial.print('.');
      }
    }

    void loop() {
      MDNS.update();

      // visible canary: toggle the LED every DIVIDER passes through loop()
      static unsigned int counter = DIVIDER;
      if (--counter == 0) {
        counter = DIVIDER;
        digitalWrite(LED_PIN, digitalRead(LED_PIN) == 0);
      }
    }


**Symptom:**
- After the WiFi connection is established, the 1st mDNS response is almost always fine.
- Later queries, however, often go unanswered without any sign of an error, especially after some interval (a few minutes or more).
- **[Edited 1]** Once it has happened, it does not seem to recover.

**Additional Info:**
- Regardless of the above, the LED blink doesn't stop (`loop()` is alive).
- Likewise, ping by dot-decimal address still responds (both WiFi and IP echo are alive).
- Debug output (e.g. `CORE+WIFI+HTTP_UPDATE+UPDATER+OTA+OOM+MDNS`) gives no clue about this...

mikekgr commented 4 years ago

(Continuing the related discussion from https://gitter.im/esp8266/Arduino.) As @d-a-v suggested, I just finished testing "OTA-mDNS-SPIFFS", which ships as an mDNS library example. I noticed the same bad behavior: after the ESP8266 D1 mini starts, it shows up correctly in Bonjour for about 8 minutes, then it is lost and never comes back. This is the same as in my sketch, where I initially found the problem...

edit from maintainer: ref

d-a-v commented 4 years ago

@jjsuwa @mikekgr I have been running the two above tests without issues for several minutes (I'll let them run and update if something happens). It may not be an issue with mDNS but with NONOS-SDK FW.

Latest release 2.7.0 is using NONOS-SDK v2.2.1+100 (2019-07-03). You may try with "Legacy 2.2.1" which was previously shipped, or with more recent ones: 2.2.1(2019-11-22) is the latest. This list is available in arduino IDE menus when the generic board is chosen. The current default version was chosen based on user reports.

You may also add `WiFi.setSleepMode(WIFI_NONE_SLEEP);` (just in case / for the test / to be sure).

I am running the gitter sketch as-is, and the above one with this added code in the end of setup():

    auto hService = MDNS.addService(0, "itworks", "tcp", 58266);
    if (hService)
    {
        if ((!MDNS.addServiceTxt(hService, "readme", "0xdeep")))
        {
            MDNS.removeService(hService);
            hService = 0;
        }
    }

This code allows me to run this bash command on Linux with avahi:

edit: with cache flush

    #!/bin/bash
    srv=_itworks._tcp

    c=0
    while true; do
        echo ""
        date
        c=$((c+1))
        echo $c
        echo
        avahi-browse -t -r "$srv"
        sudo avahi-daemon --kill # flush mDNS cache, automatically restarted
        sleep 10
    done

(replace _itworks by _arduino for the OP example)

Both are running flawlessly for 1770 seconds (gitter) and 770 seconds (OP). edit: restarted with cache flush and still running after 2180s (resp. 1700s)

jjsuwa commented 4 years ago

I think 10 second intervals are not enough to reproduce the issue. (needs 1 min+)

And none of the suggestions above helps to resolve the issue.

Repetitive mDNS resolve one-liner for Windows Command Prompt: `for /L %a in (0,0,1) do @(ping -4 -n 1 target.local & timeout /nobreak 60)`

C:\Users\Administrator>for /L %a in (0,0,1) do @(ping -4 -n 1 WiFiSta.local & timeout /nobreak 60)

Pinging WiFiSta.local [192.168.2.20] with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=5ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 5ms, Average = 5ms

Waiting for  0 seconds, press CTRL+C to quit ...

Pinging WiFiSta.local [192.168.2.20] with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=50ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 50ms, Maximum = 50ms, Average = 50ms

Waiting for  0 seconds, press CTRL+C to quit ...

Pinging WiFiSta.local [192.168.2.20] with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=5ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 5ms, Average = 5ms

Waiting for  0 seconds, press CTRL+C to quit ...
Ping request could not find host WiFiSta.local. Please check the name and try again.

Waiting for  0 seconds, press CTRL+C to quit ...
Ping request could not find host WiFiSta.local. Please check the name and try again.

Waiting for  0 seconds, press CTRL+C to quit ...
Ping request could not find host WiFiSta.local. Please check the name and try again.

Waiting for 57 seconds, press CTRL+C to quit ...

However, dot-decimal-form ping is still working.

C:\Users\Administrator>ping 192.168.2.20

Pinging 192.168.2.20 with 32 bytes of data:
Reply from 192.168.2.20: bytes=32 time=4ms TTL=255
Reply from 192.168.2.20: bytes=32 time=1ms TTL=255
Reply from 192.168.2.20: bytes=32 time=2ms TTL=255
Reply from 192.168.2.20: bytes=32 time=3ms TTL=255

Ping statistics for 192.168.2.20:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 4ms, Average = 2ms

d-a-v commented 4 years ago

(interval edited, result edited) @jjsuwa

I modified my scan-from-linux script:

I let it run for a while. It has run 15 times at 200-second intervals for both of them (the script is provided in case someone can test or improve it): mdnscan.zip

Can you tell which is the last working core version?

jjsuwa commented 4 years ago

@d-a-v

> Can you tell which is the last working core version ?

Going back to release 2.6.3 (commit 3d128e5c785cbe2096a0def394554d1d8091601d), it seems fine. Both a Linux avahi service query and Windows Bonjour hostname resolution work well for me.

Advancing to #7025 (commit 7b0fa3554c24938ba83e2d1400d619532eff6448)... seems OK.

#7042 (commit a8515a7d6626cd1e969980bd4f1e8183d684b994)... OK,

#7216 (commit e5f4514847d3749c60b97a87bcda253c298a328a)... OK,

#7217 (commit 77b82a0c27eded5b9e1eabe2ffa39daa0cd12e29)... failed!

And then, going forward again to the latest... of course it reproduces the issue.
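
For anyone repeating this kind of search over a longer history, the stepping above can be automated as a bisection. The following is only a minimal sketch in plain C++, assuming the revisions are ordered, the first one works, the last one fails, and the behavior flips from working to failing exactly once. `firstBad` and the `works` callback are hypothetical names; in practice `works` would mean "flash this revision and check whether mDNS still answers after a few minutes".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the index of the first failing revision.
// Assumptions (not checked beyond the size): revisions[0] works,
// revisions.back() fails, and there is a single good-to-bad transition.
std::size_t firstBad(const std::vector<std::string>& revisions,
                     const std::function<bool(const std::string&)>& works) {
    assert(revisions.size() >= 2);
    std::size_t good = 0;                     // known-working index
    std::size_t bad = revisions.size() - 1;   // known-failing index
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (works(revisions[mid])) {
            good = mid;                       // failure was introduced later
        } else {
            bad = mid;                        // failure was introduced here or earlier
        }
    }
    return bad;
}
```

Called with the revisions tested above, in that order, and a `works` callback that fails for #7217 and for the latest commit, the function returns the index of #7217.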

d-a-v commented 4 years ago

Thanks. So #7217 would be the hidden-to-me issue ?

Well. We have planned another mDNS update that will allow a single instance to work for all interfaces. I guess it is time to try it now for a 2.7.1 bugfix release.

mikekgr commented 4 years ago

I can confirm that, in my case too, the proposed solution is working fine. Normal mDNS functionality is back. Many thanks to @jjsuwa, @d-a-v and all the wonderful people who work hard, and for free, on this ESP8266 Arduino Core. I am continuing the testing, but everything seems fine.

d-a-v commented 4 years ago

@BbIKTOP we are going to merge #7266 because #7217 causes issues.

I am anyway going to try a change that would hopefully fit with everyone.

devyte commented 4 years ago

PR #7266 is merged as a temporary workaround. That means that issue #7217 is now present again. This is being kept open to track a solution that meets both cases.

BbIKTOP commented 4 years ago

> @BbIKTOP we are going to merge #7266 because #7217 causes issues.
>
> I am anyway going to try a change that would hopefully fit with everyone.

Do you already understand how this is possible? I just cannot imagine.

d-a-v commented 4 years ago

I still haven't understood why it works with me/you/some and not others. Anyway having one instance per interface is quite a nonsense on such small architecture. I am trying something with multicast over all interfaces, with a single instance.

reaper7 commented 4 years ago

with this commit https://github.com/esp8266/Arduino/commit/bf718c39afe88f9fc2721a25028ca96665b006d7 programming via OTA is again possible (devices do not disappear after a while)

BbIKTOP commented 4 years ago

> I still haven't understood why it works with me/you/some and not others. Anyway having one instance per interface is quite a nonsense on such small architecture. I am trying something with multicast over all interfaces, with a single instance.

Yes, a single instance is what I have asked for since the very beginning. It's quite strange, though, that it can cause any problems. I'd like to understand how this is possible, but I cannot reproduce it.

jjsuwa commented 4 years ago

In my environments, when listening on `m_netif->ip_addr`, `MDNSResponder::_callProcess()` is never called back. When listening on `IP4_ADDR_ANY`, it is.
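
One possible reading of this observation, sketched as a simplified model in plain C++ (this is not lwIP's real code): assume the UDP layer hands a datagram to a socket only when the socket is bound to the wildcard address or to exactly the datagram's destination address. mDNS queries are sent to the multicast group 224.0.0.251, so under that assumption a socket bound to the station's own address would never see them. `Ip4`, `makeIp`, `socketReceives` and `checkBindingModel` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of UDP socket matching. NOT lwIP code;
// the matching rule in socketReceives() is an assumption for illustration.
using Ip4 = std::uint32_t;

constexpr Ip4 makeIp(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (Ip4(a) << 24) | (Ip4(b) << 16) | (Ip4(c) << 8) | Ip4(d);
}

constexpr Ip4 kAnyAddr   = 0;                       // plays the role of IP4_ADDR_ANY
constexpr Ip4 kMdnsGroup = makeIp(224, 0, 0, 251);  // mDNS IPv4 multicast group

// Assumed rule: deliver when the socket is bound to the wildcard address,
// or when the destination address equals the bound address.
constexpr bool socketReceives(Ip4 boundAddr, Ip4 destAddr) {
    return boundAddr == kAnyAddr || boundAddr == destAddr;
}

// Walks through the three cases relevant to this issue.
void checkBindingModel() {
    const Ip4 staAddr = makeIp(192, 168, 2, 20);    // station address from the ping log
    assert(!socketReceives(staAddr, kMdnsGroup));   // bound to own address: query missed
    assert(socketReceives(kAnyAddr, kMdnsGroup));   // bound to wildcard: query delivered
    assert(socketReceives(staAddr, staAddr));       // unicast to own address still arrives
}
```

The model says nothing about why the same binding works in other setups; it only shows how a responder could stay reachable by unicast while missing every multicast query.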

The scenario I can imagine:

  1. `MDNS.begin(...)` announces itself to the other mDNS servers.
  2. The other ones can cache the presence of the ESP8266mDNS service for some short period.
  3. An mDNS client multicasts a request soon afterwards.
  4. ESP8266mDNS does not respond, but other mDNS servers (if any exist) do so instead.
  5. As time passes, the cached ESP8266mDNS info expires.
  6. Now, nobody knows...
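
The steps above can be sketched as a tiny model in plain C++. It is only an illustration of the assumed scenario, not a trace of real traffic: `CachedRecord`, `Answer`, `resolve` and `checkScenario` are hypothetical names, and the 120-second lifetime is an assumed example value.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the scenario; names and numbers are illustrative only.
struct CachedRecord {
    std::string host;
    unsigned expiresAt;  // seconds after the initial announcement
};

enum class Answer { FromResponder, FromCache, None };

// responderHearsQueries: whether the ESP8266 responder receives multicast queries.
// now: seconds elapsed since the initial announcement.
Answer resolve(const CachedRecord& cache, bool responderHearsQueries, unsigned now) {
    if (responderHearsQueries) {
        return Answer::FromResponder;  // healthy case: the device answers itself
    }
    if (now < cache.expiresAt) {
        return Answer::FromCache;      // steps 2 to 4: someone else still remembers
    }
    return Answer::None;               // steps 5 and 6: the record has expired
}

// Replays the scenario with a responder that never hears queries.
void checkScenario() {
    const CachedRecord cache{"WiFiSta.local", 120};  // assumed 120 s lifetime
    assert(resolve(cache, false, 10) == Answer::FromCache);   // early query still resolves
    assert(resolve(cache, false, 180) == Answer::None);       // later query fails for good
    assert(resolve(cache, true, 180) == Answer::FromResponder);
}
```

This matches the reported symptom in shape: the first lookups succeed, then resolution stops and does not recover, while the device itself keeps running.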

devyte commented 4 years ago

As discussed internally, this issue no longer applies because the troublesome commit was backed out. The original issue that resulted in that troublesome commit is #7213, and it has been reopened. Tracking will continue there. Closing.