Closed ravilr closed 8 years ago
thanks so much for reporting this!
TODO: as part of this ticket, update the test plan docs to validate scheduler HA isn't broken --> #751
For our use case of running one scheduler instance on each of three different VMs/hosts, using os.Hostname() as the etcd election key's value has been working fine.
contrib/mesos/pkg/scheduler/service/service.go
- log.Infof("registering for election at %v with id %v", path, eid.GetValue())
- go election.Notify(election.NewEtcdMasterElector(etcdClient), path, eid.GetValue(), srv, nil)
+ hostname, err := os.Hostname()
+ if err != nil {
+ log.Fatalf("Failed to get hostname: %v", err)
+ }
+ log.Infof("registering for election at %v with id %v", path, hostname)
+ go election.Notify(election.NewEtcdMasterElector(etcdClient), path, hostname, srv, nil)
@jdef @s-urbaniak
In the case of multiple k8sm scheduler instances, all of them are being registered for master election with the same 'id', leading to a death spiral of all scheduler instances.
I0211 23:02:18.638476 25929 service.go:586] registering for election at /mesos/k8sm/framework/Kubernetes/leader with id 14732d8d4c8e1382_k8sm-executor
Previously, each scheduler instance was getting its own uid (with the same executor group), and that uid was being used in master election: https://github.com/mesosphere/kubernetes/blob/v0.7.0-v1.1.1/contrib/mesos/pkg/scheduler/service/service.go#L562
But this seems to have changed since the PR below: https://github.com/kubernetes/kubernetes/pull/15775
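To illustrate why a shared id is a problem, here is a toy model. It does not use the real etcd elector; it assumes only that an instance treats itself as master when the value stored under the election key equals its own id. The `candidate` type, `selfDeclaredMasters`, and the `sched-N` / `host-x` names are hypothetical; the shared id is the one from the log line above.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for one scheduler instance.
type candidate struct {
	name string // label for the instance, for display only
	id   string // id the instance registered for election
}

// selfDeclaredMasters returns the names of all instances whose own id equals
// the elected id, i.e. the instances that would believe they are master
// under the assumption stated above.
func selfDeclaredMasters(electedID string, cs []candidate) []string {
	var out []string
	for _, c := range cs {
		if c.id == electedID {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	// All instances register with the same executor-derived id.
	shared := []candidate{
		{"sched-1", "14732d8d4c8e1382_k8sm-executor"},
		{"sched-2", "14732d8d4c8e1382_k8sm-executor"},
		{"sched-3", "14732d8d4c8e1382_k8sm-executor"},
	}
	// Each instance registers with its own (hypothetical) hostname.
	unique := []candidate{
		{"sched-1", "host-a"},
		{"sched-2", "host-b"},
		{"sched-3", "host-c"},
	}

	fmt.Println(selfDeclaredMasters(shared[0].id, shared))
	fmt.Println(selfDeclaredMasters(unique[0].id, unique))
}
```

With the shared id, every instance matches the elected value; with per-host ids, only one does.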