ansible-collections / ansible-consul

:satellite: Ansible role for Hashicorp Consul clusters
https://galaxy.ansible.com/ansible-community/consul/
BSD 2-Clause "Simplified" License

Retry restarting service on Windows #482

Closed nre-ableton closed 2 years ago

nre-ableton commented 2 years ago

In some cases, Windows restarts the service before Consul has finished shutting down cleanly, so the new process tries to bind a port that the old one still holds. When this happens, the following lines appear in the Consul log:

[INFO]  agent.client: shutting down client
[ERROR] agent: Error starting agent: error="Failed to start Consul
client: Failed to start lan serf: Failed to create memberlist: Could not
set up network transport: failed to obtain an address: Failed to start
TCP listener on "10.50.0.45" port 8301: listen tcp 10.50.0.45:8301:
bind: Only one usage of each socket address (protocol/network
address/port) is normally permitted."
[INFO]  agent: Exit code: code=1

The log timestamps also show that all three messages occur at the exact same time, which suggests a race condition between the old process releasing the port and the new process binding to it. A single retry one second later is usually enough for the port to be released, but to be safe we retry up to three times.
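
The retry described above could look roughly like the following Ansible task. This is a minimal sketch and not necessarily the change made in this PR: the service name `consul`, the task name, and the registered variable `consul_restart_result` are assumptions for illustration, and it assumes the restart task itself reports failure when Consul exits immediately.

```yaml
# Hypothetical sketch: restart the Windows service, retrying if the restart fails.
- name: Restart consul on Windows (with retries)
  ansible.windows.win_service:
    name: consul                # assumed service name
    state: restarted
  register: consul_restart_result
  # Re-run the task until it succeeds, up to three retries one second apart.
  until: consul_restart_result is succeeded
  retries: 3
  delay: 1
```

With `delay: 1`, each retry waits one second, which matches the expectation that the old process releases port 8301 shortly after the first failed attempt.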