Open duduita opened 1 month ago
@duduita thank you for finding and reporting this issue!
@wengzhe did you see that?
Hi @duduita, since 12.1.0 a cached DNS entry becomes invalid once the TTL returned by the DNS server has expired, and a new query is sent the next time that domain name is looked up. In my local environment the TTL for 0.pool.ntp.org is always less than 100 s (normally ~20 s). Maybe your DNS server returns non-responsive IP addresses with a longer TTL, which would cause this problem.
There is a hack that may force the domains to be resolved again: set CONFIG_NETDB_DNSCLIENT_LIFESEC to a shorter value, e.g. 5 seconds. After 5 seconds the cache entry becomes invalid, and the domain is resolved again the next time you look it up.
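As a minimal sketch, assuming the option is set through the board's defconfig (it can also be changed through menuconfig), the 5-second example above would look like this. The value is only the one suggested in this comment; a shorter lifetime also means more DNS queries.

```
CONFIG_NETDB_DNSCLIENT_LIFESEC=5
```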
During NTP server querying through ntpclient.c, different NTP server domain names (e.g., 0.uk.pool.ntp.org, 1.uk.pool.ntp.org) might resolve to the same set of IP addresses due to DNS caching. This can lead to repeated queries to the same non-responsive IP addresses, so the client fails to obtain the correct time.
For example, these are some logs that I added to ntpclient.c in order to understand why NTP was failing:
To mitigate this issue, one option is to flush the DNS cache after cycling through all configured NTP servers, so that subsequent DNS resolutions can return new and possibly responsive IP addresses, which increases the likelihood of successful time synchronization. However, I cannot manipulate the DNS cache from user space unless I create an API for it.
Overall, do you have a workaround or a hack that I can use in order to solve this NTP issue? Or at least to force a new IP resolution for an NTP hostname after some failures?
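One user-space option that does not need access to the DNS cache is to remember which addresses already failed and skip them when the next hostname resolves to the same set. The block below is only a sketch of that idea, not code from ntpclient.c: `struct failed_addrs`, `failed_addrs_init`, `failed_addrs_add`, `failed_addrs_contains`, `pick_untried`, and `MAX_FAILED_ADDRS` are all hypothetical names, and addresses are assumed to be IPv4 values as stored in `sin_addr.s_addr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for this sketch, not an existing NTP client setting. */
#define MAX_FAILED_ADDRS 8

/* Small ring of IPv4 addresses that did not answer an NTP request. */
struct failed_addrs
{
  uint32_t addr[MAX_FAILED_ADDRS];
  size_t count; /* Number of valid entries */
  size_t next;  /* Slot overwritten by the next insertion */
};

void failed_addrs_init(struct failed_addrs *list)
{
  assert(list != NULL);
  list->count = 0;
  list->next = 0;
}

bool failed_addrs_contains(const struct failed_addrs *list, uint32_t addr)
{
  assert(list != NULL);
  for (size_t i = 0; i < list->count; i++)
    {
      if (list->addr[i] == addr)
        {
          return true;
        }
    }

  return false;
}

/* Record a non-responsive address; the oldest entry is evicted when full. */
void failed_addrs_add(struct failed_addrs *list, uint32_t addr)
{
  assert(list != NULL);
  if (failed_addrs_contains(list, addr))
    {
      return;
    }

  list->addr[list->next] = addr;
  list->next = (list->next + 1) % MAX_FAILED_ADDRS;
  if (list->count < MAX_FAILED_ADDRS)
    {
      list->count++;
    }
}

/* Return the index of the first candidate that has not failed yet,
 * or -1 if every candidate is already on the failed list.
 */
int pick_untried(const struct failed_addrs *list,
                 const uint32_t *candidates, size_t ncandidates)
{
  assert(list != NULL && candidates != NULL);
  for (size_t i = 0; i < ncandidates; i++)
    {
      if (!failed_addrs_contains(list, candidates[i]))
        {
          return (int)i;
        }
    }

  return -1;
}
```

A caller would fill `candidates` from the addresses returned for each configured hostname, call `failed_addrs_add()` after a timeout, and treat a return of -1 from `pick_untried()` as the signal to wait for the cache entry to expire (or to move on to the next hostname) before retrying. The ring evicts old entries so that an address which was only temporarily unreachable is eventually tried again.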