pboguk opened this issue 8 years ago
I added RestartSec=10 to /etc/systemd/system/etcd-aws.service and the problem seems to be gone.
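For reference, a minimal sketch of that workaround as a systemd drop-in (the drop-in path and the Restart= line are assumptions on my part; the comment above only mentions adding RestartSec=10 to the unit file directly):

```ini
# /etc/systemd/system/etcd-aws.service.d/10-restart.conf  (hypothetical drop-in path)
[Service]
# Retry the unit if it fails, waiting 10 seconds between attempts.
Restart=on-failure
RestartSec=10
```

After adding this, `sudo systemctl daemon-reload` is needed so systemd picks up the change.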
I noticed similar behavior; delaying the restart long enough (I set it to 30 seconds) did indeed seem to fix it reliably. I must have spun up over 30 nodes today without a repeat of this problem.
It's more likely that some dependency is still missing or not yet started when this service starts. I'll test some more with a dependency on the docker service later.
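If a missing dependency really is the cause, one way to test that theory is an explicit ordering dependency on Docker. This is only a sketch, not something taken from the actual etcd-aws unit file:

```ini
# Hypothetical addition to etcd-aws.service: start only once docker.service is up.
[Unit]
# Requires= pulls docker.service in as a hard dependency;
# After= orders etcd-aws so it starts only after Docker has started.
Requires=docker.service
After=docker.service
```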
Hi,
I'm facing the following situation: after the cluster finishes creating, only one node runs the etcd-aws service. On the other two I see Failed Units: 1 (etcd-aws.service).
journalctl -xe shows:
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: [etcd.service etcd2.service] are inactive
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: Unlocking old locks failed: [etcd.service etcd2.service] are inactive. Retrying in 5m0s.
The etcd-aws service (and its docker container) only starts to work if I start it by hand (as root, by running systemctl start etcd-aws).
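For completeness, the manual workaround plus a couple of commands to see why the unit failed (the unit name is taken from the report above; the output will of course differ per node):

```sh
# Check why etcd-aws.service ended up in the failed state.
sudo systemctl status etcd-aws.service
sudo journalctl -u etcd-aws.service --no-pager

# Start it by hand, as is currently needed on two of the three nodes.
sudo systemctl start etcd-aws.service
```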
To recap: only one node starts etcd-aws after the CF deployment; on the other two I have to start the etcd-aws service by hand.
Any suggestions?