autopilotpattern / mysql

Implementation of the autopilot pattern for MySQL
Mozilla Public License 2.0

Dirty Consul state #24

Closed lguminski closed 8 years ago

lguminski commented 8 years ago

After running for a long time, the leftover "mysql-backup-run" checks caused the Consul service to be reported as unhealthy. That had an unexpected effect: I had Consul configured as the DNS resolver, so when the application tried to reach Consul at http://consul.service.consul, the name consul.service.consul could no longer be resolved.

Traceback (most recent call last):
  File "/bin/triton-mysql.py", line 1041, in <module>
    on_start()
  File "/bin/triton-mysql.py", line 284, in on_start
    primary_result = get_primary_node()
  File "/bin/triton-mysql.py", line 957, in get_primary_node
    raise ex
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='consul.service.consul', port=8500): Max retries exceeded with url: /v1/kv/mysql-primary?token=aecf93b0-1b6a-4794-8c0a-8fc6624c80b9 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fb50ba990d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

The recursion was quite funny, but note that other names resolved through Consul were affected as well.
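A minimal sketch of why stale checks could stop consul.service.consul from resolving, assuming the leftover mysql-backup-run checks were registered as node-level checks (no ServiceID) on the same agent node that serves the consul service. Consul's DNS interface leaves out nodes with critical health checks, and a node-level check counts against every service on that node. The toy model below only encodes that filtering rule; the Node and Check classes, healthy_addresses, and the address 10.0.0.5 are hypothetical and are not Consul code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Check:
    check_id: str
    status: str                       # "passing", "warning" or "critical"
    service_id: Optional[str] = None  # None means a node-level check


@dataclass
class Node:
    name: str
    address: str
    services: List[str]
    checks: List[Check] = field(default_factory=list)


def healthy_addresses(nodes: List[Node], service: str) -> List[str]:
    """Addresses a simplified DNS lookup for `service` would return.

    Hypothetical model: a node is skipped if any node-level check is
    critical, or if a check bound to this service is critical.
    """
    answers = []
    for node in nodes:
        if service not in node.services:
            continue
        relevant = [c for c in node.checks
                    if c.service_id is None or c.service_id == service]
        if any(c.status == "critical" for c in relevant):
            continue
        answers.append(node.address)
    return answers


node = Node("consul-1", "10.0.0.5", ["consul"])
print(healthy_addresses([node], "consul"))  # ['10.0.0.5']
node.checks.append(Check("mysql-backup-run", "critical"))
print(healthy_addresses([node], "consul"))  # []
```

In this model only node-level checks and checks bound to the queried service matter; a critical check bound to some other service would not hide consul.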

tgross commented 8 years ago

This might be an issue with which Consul API we're using to register the check. We call consul.agent.check.register in the Python client, which corresponds to the /v1/agent/check/register endpoint. In ContainerPilot we use Consul's agent API for service/check registration because of its anti-entropy property, but it may be the wrong choice in this case.
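For reference, a sketch of the raw agent endpoints behind consul.agent.check.register, built with only the standard library so the request shape is visible. Assumptions: the agent listens on http://localhost:8500, the check is a TTL check with a 60s TTL, and register_ttl_check, deregister_check and _url are hypothetical helpers, not part of python-consul or this repo. The requests are only constructed here, not sent.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical agent address; adjust for your deployment.
AGENT = "http://localhost:8500"


def _url(path: str, token: Optional[str] = None) -> str:
    """Join the agent address and path, adding an optional ACL token."""
    url = AGENT + path
    if token:
        # Consul accepts the ACL token as a ?token= query parameter.
        url += "?" + urlencode({"token": token})
    return url


def register_ttl_check(check_id: str, ttl: str,
                       token: Optional[str] = None) -> Request:
    """Build (but do not send) PUT /v1/agent/check/register for a TTL check."""
    body = json.dumps({"ID": check_id, "Name": check_id, "TTL": ttl}).encode()
    return Request(_url("/v1/agent/check/register", token), data=body,
                   method="PUT", headers={"Content-Type": "application/json"})


def deregister_check(check_id: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) PUT /v1/agent/check/deregister/<check_id>."""
    path = "/v1/agent/check/deregister/" + quote(check_id, safe="")
    return Request(_url(path, token), method="PUT")


# Usage pattern (hypothetical; urlopen needs a reachable agent):
#   from urllib.request import urlopen
#   urlopen(register_ttl_check("mysql-backup-run", "60s"))
#   try:
#       run_backup()  # hypothetical backup routine
#   finally:
#       urlopen(deregister_check("mysql-backup-run"))
```

A TTL check that stops being refreshed turns critical and stays registered on the agent until it is deregistered. Wrapping the backup run in try/finally so the deregistration is always sent is one possible way to avoid leftover checks; it is not necessarily what the project adopted.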

tgross commented 8 years ago

This may be associated with https://github.com/joyent/containerpilot/issues/162

tgross commented 8 years ago

Going to close this given we haven't had any feedback on it.