esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
15.99k stars 13.33k forks source link

NTP failure when populated via DHCP #7562

Closed freddiN closed 3 years ago

freddiN commented 4 years ago

Basic Infos

Platform

Settings in IDE

Problem Description

No matter if I use ESPurna, Tasmota or any other firmware to flash my Sonoff POW R2, I could not get NTP to work.

After spending hours to at least finding a workaround and visiting all kinds of forums (most of them mention this problem), I narrowed it down to the following:

It seems that the FRITZ!Box NTP setting results in the DHCP response containing NTP information, which the sonoff cannot handle for some reason: It does not use the FRITZ!Box NTP, and it ignores any local setting!

So I came up with two workarounds:

  1. switch of "use Fritzbox as NTP server for the network"
  2. don't use DHCP

Method one instantly works: My Sonoff still gets an ip address via DHCP and uses the locally configured NTP (de.pool.ntp.org in my case) to fetch time infos. I suspect method two would work as well, but I haven't tested it yet because method one already worked.

I'm sorry to say that I have never compiled a firmware, however I am able to capture tcpdumps or any further information that might help in a possible solution for this.

d-a-v commented 4 years ago

however I am able to capture tcpdumps

It would be great to have the capture of the dhcp answer supposed to be containing the NTP server list.

I'm sorry to say that I have never compiled a firmware

It's as simple as starting the arduino IDE, installing the latest version of the core, loading an example and press Flash. It is described in the board manager procedure described in the documentation. There is an example called NTP-TZ-DST supposed to show how to use NTP with this core. This example can also print the list of configured NTP servers.

freddiN commented 4 years ago

The one WITHOUT NTP from the router:

Sending packet:
  op=BOOTREQUEST chaddr=A4:68:82:7C:92:E2 hops=0 xid=A5E0D7FD secs=0 flags=8000
  ciaddr=0.0.0.0 yiaddr=0.0.0.0 siaddr=0.0.0.0 giaddr=0.0.0.0 sname= file=
  1 options:
     53 (DHCP Message Type): discover
Received packet from 192.168.178.1:67:
  op=BOOTREPLY chaddr=A4:68:82:7C:92:E2 hops=0 xid=A5E0D7FD secs=0 flags=8000
  ciaddr=0.0.0.0 yiaddr=192.168.178.30 siaddr=192.168.178.1 giaddr=0.0.0.0 sname= file=
  11 options:
     53 (DHCP Message Type): offer
     54 (Server Identifier): 192.168.178.1
     51 (IP Address Lease Time): 864000 (1 week and 3 days)
     58 (Renewal (T1) Time Value): 432000 (5 days)
     59 (Rebinding (T2) Time Value): 756000 (1 week, 1 day, and 18 hours)
      1 (Subnet Mask): 255.255.255.0
      3 (Router Option): 192.168.178.1
      6 (Domain Name Server Option): 192.168.178.1
     15 (Domain Name): fritz.box
     28 (Broadcast Address Option): 192.168.178.255
    158 (Unknown): 04 C0 A8 B2 01

The one after activating that feature again:

Sending packet:
  op=BOOTREQUEST chaddr=03:7A:6C:EA:FB:C5 hops=0 xid=4722ED1D secs=0 flags=8000
  ciaddr=0.0.0.0 yiaddr=0.0.0.0 siaddr=0.0.0.0 giaddr=0.0.0.0 sname= file=
  1 options:
     53 (DHCP Message Type): discover
Received packet from 192.168.178.1:67:
  op=BOOTREPLY chaddr=03:7A:6C:EA:FB:C5 hops=0 xid=4722ED1D secs=0 flags=8000
  ciaddr=0.0.0.0 yiaddr=192.168.178.31 siaddr=192.168.178.1 giaddr=0.0.0.0 sname= file=
  12 options:
     53 (DHCP Message Type): offer
     54 (Server Identifier): 192.168.178.1
     51 (IP Address Lease Time): 864000 (1 week and 3 days)
     58 (Renewal (T1) Time Value): 432000 (5 days)
     59 (Rebinding (T2) Time Value): 756000 (1 week, 1 day, and 18 hours)
      1 (Subnet Mask): 255.255.255.0
      3 (Router Option): 192.168.178.1
      6 (Domain Name Server Option): 192.168.178.1
     15 (Domain Name): fritz.box
     28 (Broadcast Address Option): 192.168.178.255
     42 (Network Time Protocol Servers Option): 192.168.178.1
    158 (Unknown): 04 C0 A8 B2 01

I started up wireshark to see if I can see any special info in the DHCP response there:

Internet Protocol Version 4, Src: 192.168.178.1, Dst: 255.255.255.255
User Datagram Protocol, Src Port: 67, Dst Port: 68
Dynamic Host Configuration Protocol (Offer)
    Message type: Boot Reply (2)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0x3ead8799
    Seconds elapsed: 0
    Bootp flags: 0x8000, Broadcast flag (Broadcast)
        1... .... .... .... = Broadcast flag: Broadcast
        .000 0000 0000 0000 = Reserved flags: 0x0000
    Client IP address: 0.0.0.0
    Your (client) IP address: 192.168.178.32
    Next server IP address: 192.168.178.1
    Relay agent IP address: 0.0.0.0
    Client MAC address: 26:ca:38:ab:f1:d6 (26:ca:38:ab:f1:d6)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Offer)
        Length: 1
        DHCP: Offer (2)
    Option: (54) DHCP Server Identifier (192.168.178.1)
        Length: 4
        DHCP Server Identifier: 192.168.178.1
    Option: (51) IP Address Lease Time
        Length: 4
        IP Address Lease Time: (864000s) 10 days
    Option: (58) Renewal Time Value
        Length: 4
        Renewal Time Value: (432000s) 5 days
    Option: (59) Rebinding Time Value
        Length: 4
        Rebinding Time Value: (756000s) 8 days, 18 hours
    Option: (1) Subnet Mask (255.255.255.0)
        Length: 4
        Subnet Mask: 255.255.255.0
    Option: (3) Router
        Length: 4
        Router: 192.168.178.1
    Option: (6) Domain Name Server
        Length: 4
        Domain Name Server: 192.168.178.1
    Option: (15) Domain Name
        Length: 9
        Domain Name: fritz.box
    Option: (28) Broadcast Address (192.168.178.255)
        Length: 4
        Broadcast Address: 192.168.178.255
    Option: (42) Network Time Protocol Servers
        Length: 4
        Network Time Protocol Server: 192.168.178.1
    Option: (158) PCP Server
        Length: 5
        PCP server list
            List-Length: 4
    Option: (255) End
        Option End: 255
    Padding: 000000000000000000000000000000000000000000000000…

Option 42 seems to include just the NTP server, nothing fancy as far as I can see.

d-a-v commented 4 years ago

It seems that the informations are given but it does work in your case.

    Option: (42) Network Time Protocol Servers
        Length: 4
        Network Time Protocol Server: 192.168.178.1

We need to see the binary content of the dhcp reply (udp port 67 or 68) and the subsequent ntp requests (udp port 123).

If you are able to build a network example, you can try with latest master or beta version, and add this to the test sketch:

#include <Netdump.h>
NetCapture::Netdump nd;

in setup():

  Serial.begin(115200);
  nd.printDump(Serial, NetCapture::Packet::PacketDetail::FULL, [](NetCapture::Packet p) { return p.isUDP() && (p.hasPort(67) || p.hasPort(68) || p.hasPort(123)); });
Jason2866 commented 4 years ago

This problem with the Fritzbox arrives now and then in Tasmota Discord. The strange thing for many Fritzbox users it works for other it does not. So it seems the Fritzbox is doing something different. It is not a general problem! We have tested working setups with Ubiquity, mikrotik, OpenWrt, bintec-elmeg, LANCOM, CISCO, TP-Link, Google

5chufti commented 4 years ago

might be a problem of timing? In tasmota they define and apply 3 sntp servers, see https://github.com/arendst/Tasmota/blob/development/tasmota/my_user_config.h#L227 https://github.com/arendst/Tasmota/blob/development/tasmota/support_rtc.ino#L488 so maybe the sntp server sent via dhcpd does not (allways) "activate"?

Jason2866 commented 4 years ago

@5chufti no, if DHCP is activated none of the in Tasmota configured NTP servers is used. DHCP overrides. Since this issue only affects Fritzbox DHCP NTP it is probably not a timing issue. As i wrote before we have only Fritzbox users stepping in Tasmota Discord with NTP problems All other routers work fine with DHCP providing NTP to Tasmota. Imho a ticket for the Fritzbox should be opened at AVM As @d-a-v wrote the info is needed:

We need to see the binary content of the dhcp reply (udp port 67 or 68) and the subsequent ntp requests (udp port 123)

5chufti commented 4 years ago

ok, couldn't find this override in the documentation.

ascillato commented 4 years ago

@freddiN

Hi, your fritzbox is telling to the NTP arduino core that your NTP server is 192.168.178.1 that is your fritzbox. So, if the NTP server is not responding, the NTP can't be synced.

Have you tried to set a fixed IP and set the NTP server as 192.168.178.1?

Seems that fritzbox is not having its internal NTP server working. May be it is needed to set another config in your NTP of your Fritzbox?

For example, in Mikrotik, by default, they don't have a NTP server (it is a module that you need to download and install). So, by default my router don't have a NTP server, but I can enable in my router's DHCP Config to inform any server I want as NTP server source. (I have my own NTP server with a GPS chip on a nodemcu with Tasmota and works very reliable)

Jason2866 commented 4 years ago

@freddiN According to this https://avm.de/service/fritzbox/fritzbox-7590/wissensdatenbank/publication/show/336_Zeitsynchronisation-NTP-fur-FRITZ-Box-und-Netzwerkgerate-einrichten/ image

NTP Server needs to be activated.

freddiN commented 4 years ago

@Jason2866 That is what I translated to "use FRITZ!Box as NTP server for the network" in my original posting at the top. I do have this activated. And I can also see it in the DHCP response (see above, the option 42), so I would assume that the response is correct.

Also, I know that the Fritz! ntp server is working, because when I do NOT use DHCP and set up fixed ip addresses on my tasmota, the ntp sync works. It might use the default servers though ... one more scenario to look into. My raspberry pi definitely syncs with the fritz ntp server.

I am running tests every evening, setting and unsetting (=0.0.0.0) IPs, flashing and resetting tasmota, wiresharking the traffic for different scenarios, activating/deactivating the ntp server in the fritzbox, setting up Arduino with ESP core installed and so on. I agree that the best way to go forward is me/other fritzbox-users being able to provide the binary logs as explained by @d-a-v above, however that is also quite difficult for people who are unfamiliar with this.

If AVM turns out to be the problem I am more than happy to open a ticket with them.

I am still trying to reliably reproduce this and getting the binary logs.

Jason2866 commented 4 years ago

If you setup fixed ip addresses in Tasmota the NTP settings from Tasmota are used. The Fritzbox NTP is in this case in noy way involved! You can check your Friztbox timeserver if you switch of DHCP and provide in Tasmota as NTP time server the IP address of your fritzbox. If the correct time appears it is a weird bug. If not the announce from the Fritzbox does not work

freddiN commented 4 years ago

Another series of of tests, first I set it up like this:

fritz ntp server active, reset tasmota config, logged in via access point, configured wifi -> reboot, got dhcp IP deactivated DHCP in fritzbox set different static ip address in tasmota console (just to ensure I do not get something from dhcp), dns, gateway, and subnetmask set ntpserver 1, 2 and 3 to fritzbox:

00:01:03 CMD: ntpserver
00:01:03 RSL: stat/tasmota_5081B7/RESULT = {"NtpServer1":"192.168.178.1","NtpServer2":"192.168.178.1","NtpServer3":"192.168.178.1"}

rebooted the ESP and logged into tasmota console via static ip address. so far so good. And now it gets weird: I get an ntp sync around 50% of the time after reboots.

I just kept rebooting without changing anything else, sometimes I got no sync:

00:00:00 CFG: aus Flash geladen am FB, zählen 25
00:00:00 Projekt tasmota Tasmota Version 8.4.0(tasmota)-2_7_2_1
00:00:00 WIF: verbinden mit AP1 vader2 Channel 11 BSSId 2C:91:AB:F5:9D:64 in Modus 11N als tasmota_5081B7-0439...
00:00:02 WIF: verbunden
00:00:03 HTP: Web-Server aktiv bei tasmota_5081B7-0439 mit IP-Adresse 192.168.178.150
00:00:04 RSL: tele/tasmota_5081B7/INFO1 = {"Module":"Sonoff Basic","Version":"8.4.0(tasmota)","FallbackTopic":"cmnd/DVES_5081B7_fb/","GroupTopic":"cmnd/tasmotas/"}
00:00:04 RSL: tele/tasmota_5081B7/INFO2 = {"WebServerMode":"Admin","Hostname":"tasmota_5081B7-0439","IPAddress":"192.168.178.150"}
00:00:04 RSL: tele/tasmota_5081B7/INFO3 = {"RestartReason":"Software/System restart"}
00:00:04 RSL: stat/tasmota_5081B7/RESULT = {"POWER":"OFF"}
00:00:04 RSL: stat/tasmota_5081B7/POWER = OFF
00:00:08 RSL: tele/tasmota_5081B7/STATE = {"Time":"1970-01-01T00:00:08","Uptime":"0T00:00:10","UptimeSec":10,"Heap":28,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":0,"POWER":"OFF","Wifi":{"AP":1,"SSId":"vader2","BSSId":"2C:91:AB:F5:9D:64","Channel":6,"RSSI":54,"Signal":-73,"LinkCount":1,"Downtime":"0T00:00:04"}}

and sometimes I did:

00:00:00 CFG: aus Flash geladen am F6, zählen 30
00:00:00 Projekt tasmota Tasmota Version 8.4.0(tasmota)-2_7_2_1
00:00:00 WIF: verbinden mit AP1 vader2 Channel 6 BSSId 2C:91:AB:F5:9D:64 in Modus 11N als tasmota_5081B7-0439...
00:00:01 WIF: verbunden
00:00:01 HTP: Web-Server aktiv bei tasmota_5081B7-0439 mit IP-Adresse 192.168.178.150
00:08:05 RSL: tele/tasmota_5081B7/INFO1 = {"Module":"Sonoff Basic","Version":"8.4.0(tasmota)","FallbackTopic":"cmnd/DVES_5081B7_fb/","GroupTopic":"cmnd/tasmotas/"}
00:08:05 RSL: tele/tasmota_5081B7/INFO2 = {"WebServerMode":"Admin","Hostname":"tasmota_5081B7-0439","IPAddress":"192.168.178.150"}
00:08:05 RSL: tele/tasmota_5081B7/INFO3 = {"RestartReason":"Software/System restart"}
00:08:05 RSL: stat/tasmota_5081B7/RESULT = {"POWER":"OFF"}
00:08:05 RSL: stat/tasmota_5081B7/POWER = OFF
00:08:10 RSL: tele/tasmota_5081B7/STATE = {"Time":"2020-09-03T00:08:10","Uptime":"0T00:00:09","UptimeSec":9,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":0,"POWER":"OFF","Wifi":{"AP":1,"SSId":"vader2","BSSId":"2C:91:AB:F5:9D:64","Channel":6,"RSSI":56,"Signal":-72,"LinkCount":1,"Downtime":"0T00:00:03"}}

I started packet sniffing until l was able to create one wireshark log where after the first reboot I got synced and after the second reboot I did not. ntp request 1: image ntp response 1: image

ntp request 2 (16 seconds after the first one): image ntp response 2: image

If you put them side by side: the requests are completely identical. the responses differ slightly in the timestamps (since they are 16 seconds apart) and root dispersion.

To make this clear: I did not change anything, I just kep rebooting: Sometimes I got a sync, sometimes I did not.

So, on with the next step: set all ntpservers to ntppool and deactivate the ntp server in the fritzbox:

00:34:14 CMD: ntpserver
00:34:14 RSL: stat/tasmota_5081B7/RESULT = {"NtpServer1":"de.pool.ntp.org","NtpServer2":"de.pool.ntp.org","NtpServer3":"de.pool.ntp.org"}

I started the sniffer and started playing the rebooting game again, and it looked the same: Sometimes I would get a sync, sometimes I did not.

00:00:00 CFG: aus Flash geladen am FB, zählen 41
00:00:00 Projekt tasmota Tasmota Version 8.4.0(tasmota)-2_7_2_1
00:00:00 WIF: verbinden mit AP1 vader2 Channel 1 BSSId 2C:91:AB:F5:9D:64 in Modus 11N als tasmota_5081B7-0439...
00:00:01 WIF: verbunden
00:00:01 HTP: Web-Server aktiv bei tasmota_5081B7-0439 mit IP-Adresse 192.168.178.150
00:00:02 RSL: tele/tasmota_5081B7/INFO1 = {"Module":"Sonoff Basic","Version":"8.4.0(tasmota)","FallbackTopic":"cmnd/DVES_5081B7_fb/","GroupTopic":"cmnd/tasmotas/"}
00:00:02 RSL: tele/tasmota_5081B7/INFO2 = {"WebServerMode":"Admin","Hostname":"tasmota_5081B7-0439","IPAddress":"192.168.178.150"}
00:00:02 RSL: tele/tasmota_5081B7/INFO3 = {"RestartReason":"Software/System restart"}
00:00:02 RSL: stat/tasmota_5081B7/RESULT = {"POWER":"OFF"}
00:00:02 RSL: stat/tasmota_5081B7/POWER = OFF
00:00:07 RSL: tele/tasmota_5081B7/STATE = {"Time":"1970-01-01T00:00:07","Uptime":"0T00:00:09","UptimeSec":9,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":0,"POWER":"OFF","Wifi":{"AP":1,"SSId":"vader2","BSSId":"2C:91:AB:F5:9D:64","Channel":1,"RSSI":46,"Signal":-77,"LinkCount":1,"Downtime":"0T00:00:03"}}

(reboot)

00:00:00 CFG: aus Flash geladen am FA, zählen 42
00:00:00 Projekt tasmota Tasmota Version 8.4.0(tasmota)-2_7_2_1
00:00:00 WIF: verbinden mit AP1 vader2 Channel 1 BSSId 2C:91:AB:F5:9D:64 in Modus 11N als tasmota_5081B7-0439...
00:00:01 WIF: verbunden
00:00:01 HTP: Web-Server aktiv bei tasmota_5081B7-0439 mit IP-Adresse 192.168.178.150
00:37:03 RSL: tele/tasmota_5081B7/INFO1 = {"Module":"Sonoff Basic","Version":"8.4.0(tasmota)","FallbackTopic":"cmnd/DVES_5081B7_fb/","GroupTopic":"cmnd/tasmotas/"}
00:37:03 RSL: tele/tasmota_5081B7/INFO2 = {"WebServerMode":"Admin","Hostname":"tasmota_5081B7-0439","IPAddress":"192.168.178.150"}
00:37:03 RSL: tele/tasmota_5081B7/INFO3 = {"RestartReason":"Software/System restart"}
00:37:03 RSL: stat/tasmota_5081B7/RESULT = {"POWER":"OFF"}
00:37:03 RSL: stat/tasmota_5081B7/POWER = OFF
00:37:08 RSL: tele/tasmota_5081B7/STATE = {"Time":"2020-09-03T00:37:08","Uptime":"0T00:00:09","UptimeSec":9,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":0,"POWER":"OFF","Wifi":{"AP":1,"SSId":"vader2","BSSId":"2C:91:AB:F5:9D:64","Channel":1,"RSSI":58,"Signal":-71,"LinkCount":1,"Downtime":"0T00:00:03"}}

The sniffer showed that the ntp communication was with a different host: image But still, even after using pool.ntp.org, I sometimes do not get a sync.

If this is not a timing issue/race condition shortly after the wifi connect, I dont know what it is ...

Jason2866 commented 4 years ago

You have a network issue. Try with a Smartphone as Accesspoint for the Tasmota device(s) I am sure the issue will disappear.

freddiN commented 4 years ago

Ok, here we go. Phone within the fritzbox wifi, I set up wifi tethering and reset tasmota, so now it looks like this: ESP --> phone --> fritzbox --> internet

my phone set up a 192.168.43.x network for the tethering, I scanned the network from my phone and found the tasmota at 192.168.43.42. I was able to connect there from my phone and started rebooting and watching the ntp syncs. The first 3 reboots all had a working ntp sync, then this happened: Screenshot_2020-09-03-21-32-24-780_net onecook browser No sync. The Fritz ntp server is deactivated and I have reset Tasmota, so it probably goes for one of the defaults from https://github.com/arendst/Tasmota/blob/development/tasmota/my_user_config.h#L227 Damnit.

So, last remaining test: Taking my phone out of my fritzbox wifi, connecting to LTE and having the ESP connected via wifi tethering, so its ESP --> phone --> internet now. Then I started rebooting. I am at my 25th reboot now and did not have a single failed ntp sync: As soon as the fritzbox is out, it works.

So it all comes back to this: I need to reproduce it again from within the fritz wifi network and have

Serial.begin(115200);
nd.printDump(Serial, NetCapture::Packet::PacketDetail::FULL, [](NetCapture::Packet p) { return p.isUDP() && (p.hasPort(67) || p.hasPort(68) || p.hasPort(123)); });

working for a successful and unsuccessful ntp sync case.

I'll just hope you will be able to do sonmething useful with those logs, because I am THAT close to set up my raspi to trigger tasmota-reboots until it syncs ...

devyte commented 4 years ago

Don't give up, it seems like you're on the right path. You're the only one who can reproduce the problem, so only you can investigate and provide relevant info.

Jason2866 commented 4 years ago

It seems to be the Fritzbox. The logs will help the AVM people Maybe you try this too https://www.ip-phone-forum.de/threads/fritzbox-7390-zeitserver-antwortet-nicht-trotz-erreichbarkeit.303126/ Try a different NTP server as the preconfigured 0.europe.pool.ntp.org in my tests ping took a long time. For example ntp1.t-online.de is fast and reliable pool.ntp.org seems fast too...

dirkmueller commented 4 years ago

lwip2's sntp implementation has this comment:

 * You need to increase MEMP_NUM_SYS_TIMEOUT by one if you use SNTP!

I don't see the esp8266 core doing that anywhere. what am I missing?

I agree that there must be a race somewhere, either with the timers or with something else. the ntp sync is not always reliable.

d-a-v commented 4 years ago

MEMP if for MEMORY POOL, when an implementation choses to not use malloc(), that we use. So it is likely that we are not concerned. I however can't find any reference to this, where is that comment ?

dirkmueller commented 4 years ago

I however can't find any reference to this, where is that comment ?

it is in master of lwip (yes I know thats newer than what esp8266 embeds, but the implementation didnd't change between 2.1.2 and master):

http://git.savannah.nongnu.org/cgit/lwip.git/commit/?id=4b10b4dab3e8f66ee730d6d48e9f494c45dbd967