Open waddles opened 7 years ago
Until we find a fix, I have made a workaround script in /usr/local/bin/fleetctl
#!/bin/bash
/usr/bin/fleetctl --driver=etcd --endpoints=http://127.0.0.1:2379 "$@"
While testing that, though, I dropped the -heartbeat-interval 600 -election-timeout 6000 options from the etcd cluster, and one of the peers reported a clock sync error > 1s, so I resynced the clocks with ntp. Only 1 submission out of several hundred has failed since then, so clock sync may play a role in it too.
I have built a new environment with a cluster of 3 etcd servers and a build server running etcd in proxy mode:
In the cluster, starting with a clean registry, I run etcd (tried several versions including 2.3.3) using relevant options on all 3 servers:
And wait a few seconds for it to become healthy:
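For anyone reproducing this, the "wait until healthy" step could be scripted instead of eyeballed. The following is only a rough sketch, not what I actually ran: `wait_until` is a hypothetical helper name, and it assumes the health check command exits 0 once the cluster is healthy.

```shell
#!/bin/bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# exits 0, or gives up once TIMEOUT_SECONDS have elapsed.
wait_until() {
  local timeout=$1
  shift
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1   # still failing when the timeout expired
    fi
    sleep 1
  done
  return 0       # COMMAND succeeded
}
```

Usage would be something like `wait_until 30 etcdctl cluster-health && echo healthy`, assuming the etcdctl shipped with etcd 2.x returns a non-zero exit status while the cluster is unhealthy.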
Then I try to submit a unit from my build server using fleetctl:
As above, it accepted the unit at first, then on the 2nd try it went into a loop reporting response code 500 from the cluster, even though the keys were correctly created in etcd. See https://gist.github.com/waddles/0e121d46c0499eaaef9685eef06120f0 for more info.
HOWEVER
I see quite a different output and no errors when I specify an endpoint in the cluster to send to (otherwise it is the same command):
--endpoint=http://127.0.0.1:2379
to force it to use the local etcd proxy, and that also succeeds reliably.