coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

DNS resolution fails whilst booting (even after network-online.target) #1325

Open jamime opened 9 years ago

jamime commented 9 years ago

I have several fleet units that all fail after a reboot because DNS resolution fails. In the example below, the service works correctly if it is started once the system is running. However, if it is loaded on a cluster of size 1 and the machine is rebooted, it does not work as expected.

CoreOS alpha (766.0.0) fleetd version 0.10.2


[Unit]
Description=Network Online Bug
After=network-online.target
Requires=network-online.target

[Service]
ExecStart=/usr/bin/whois example.com

Rebooted

The service will fail to resolve the DNS record

Aug 11 00:53:47 personal systemd[1]: Started Network Online Bug.
Aug 11 00:53:47 personal whois[776]: getaddrinfo(whois.verisign-grs.com): Temporary failure in name resolution
Aug 11 00:53:47 personal systemd[1]: Starting Network Online Bug...
Aug 11 00:53:47 personal systemd[1]: bug.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 11 00:53:47 personal systemd[1]: bug.service: Unit entered failed state.
Aug 11 00:53:47 personal systemd[1]: bug.service: Failed with result 'exit-code'.

Started Manually

Running fleetctl stop bug && fleetctl start bug will produce the expected results

Aug 11 00:55:38 personal whois[915]: For more information on Whois status codes, please visit
Aug 11 00:55:38 personal whois[915]: https://www.icann.org/resources/pages/epp-status-codes-2014-06-16-en.
Aug 11 00:55:38 personal whois[915]: % IANA WHOIS server
Aug 11 00:55:38 personal whois[915]: % for more information on IANA, visit http://www.iana.org
Aug 11 00:55:38 personal whois[915]: % This query returned 1 object
Aug 11 00:55:38 personal whois[915]: domain: EXAMPLE.COM
Aug 11 00:55:38 personal whois[915]: organisation: Internet Assigned Numbers Authority
Aug 11 00:55:38 personal whois[915]: created: 1992-01-01
Aug 11 00:55:38 personal whois[915]: source: IANA

wuqixuan commented 9 years ago

@jamime, I guess the reason may be the following: if you manually run "fleetctl stop bug && fleetctl start bug", the system is already initialized, so the environment and the conditions the service needs are probably ready.
But during a reboot, other parts of the environment may not be ready yet, which means network-online.target may not be enough, so you get the error.

So I guess this has nothing to do with fleet; it is caused by the system environment. Can you check and give more information?

fujohnwang commented 7 years ago

network-online.target can't guarantee that DNS is ready at reboot. I encountered the same issue with another service at reboot; my solution is to set Restart=on-failure and retry every 5 seconds until DNS is ready.
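
For reference, a minimal sketch of what that workaround could look like when applied to the bug.service unit from the original report. This is an assumption about how the retry would be written, not the commenter's actual unit: Restart= and RestartSec= are standard systemd service options, and the 5-second delay is taken from the comment above.

```ini
[Unit]
Description=Network Online Bug
After=network-online.target
Requires=network-online.target

[Service]
ExecStart=/usr/bin/whois example.com
# Assumed workaround: restart the service whenever the process exits
# with a non-zero status (for example, when name resolution fails).
Restart=on-failure
# Wait 5 seconds between attempts so DNS has time to become available.
RestartSec=5
```

With this in place the first attempt may still fail at boot, but systemd keeps retrying until whois exits successfully, after which no further restarts are triggered.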