crewjam / etcd-aws

tools for building a robust etcd cluster in AWS
BSD 2-Clause "Simplified" License

Only one node starts etcd-aws #7

Open pboguk opened 8 years ago

pboguk commented 8 years ago

Hi,

I'm facing the following situation: after the cluster finishes creating, only one node runs the etcd-aws service. On the other two I see `Failed Units: 1 etcd-aws.service`.

```
$ journalctl -xe
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: [etcd.service etcd2.service] are inactive
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: Unlocking old locks failed: [etcd.service etcd2.service] are inactive. Retrying in 5m0s.
```

The etcd-aws service (and its Docker container) only starts working if I start it by hand (as root, by running `systemctl start etcd-aws`).

To recap: only one node starts etcd-aws after the CloudFormation deployment; on the other two the etcd-aws service has to be started by hand.

Any suggestions?

pboguk commented 8 years ago

I added `RestartSec=10` to /etc/systemd/system/etcd-aws.service and the problem seems to be gone.
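
For reference, a minimal sketch of what that change might look like in the unit's `[Service]` section (only the relevant settings are shown; `Restart=on-failure` is an assumption here, since `RestartSec` only has an effect when some `Restart=` policy is set):

```ini
# Sketch of the relevant [Service] settings only; the real etcd-aws unit
# also defines ExecStart etc., which are left unchanged.
[Service]
# Assumption: a restart policy is (or needs to be) set, otherwise
# RestartSec has no effect.
Restart=on-failure
# Wait 10 seconds before systemd retries a failed start, giving networking
# and AWS metadata time to become available.
RestartSec=10
```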

pieterlange commented 8 years ago

I noticed similar behavior, and adding a restart delay (I set it to 30 seconds) did indeed seem to fix it reliably. I must've spun up over 30 nodes today without a repeat of this problem.

It's more likely that some dependency is still missing or not yet started when this service is started. I'll test some more later with an explicit dependency on the docker service.
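
If that theory holds, one way to express it would be a systemd ordering/dependency on Docker. A sketch of what that could look like in the unit's `[Unit]` section (docker.service as the missing dependency is an assumption pending the test mentioned above, not a confirmed fix):

```ini
# Hypothetical addition to etcd-aws.service to test the dependency theory.
[Unit]
# Only start etcd-aws after the Docker daemon is up...
After=docker.service
# ...and pull docker.service in if it isn't already scheduled to start.
Requires=docker.service
```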