d2iq-archive / kubernetes-mesos

A Kubernetes Framework for Apache Mesos

HA for k8sm-controller-manager #457

Open ravilr opened 9 years ago

ravilr commented 9 years ago

@jdef Is there a recommendation for running redundant k8sm controller managers in an HA setup, similar to the scheduler: https://github.com/kubernetes/kubernetes/blob/master/contrib/mesos/docs/ha.md

jdef commented 9 years ago

not yet - that type of work is currently on hold.

going forward we're going to be refactoring the scheduler HA to communicate w/ the apiserver instead of w/ etcd directly. see https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/high-availability.md

ravilr commented 8 years ago

Leader election integration with the controller-manager component has landed upstream (https://github.com/kubernetes/kubernetes/pull/19621). Would love to see the k8sm-scheduler also updated to use the same leaderelection client recipe.
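
For anyone picking this up, here is a minimal sketch of the leaderelection client recipe mentioned above, written against the packaging it later settled into in client-go (at the time of that PR the code lived inside the main kubernetes repo). The lock name, namespace, and the work done in the callbacks are illustrative assumptions, not taken from k8sm:

```go
// Sketch: run controller loops only while holding a leader lease.
// Lock name/namespace and callback bodies are illustrative assumptions.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // identity of this candidate replica

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "k8sm-controller-manager", // illustrative lock name
			Namespace: "kube-system",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long a lease is valid before takeover
		RenewDeadline: 10 * time.Second, // leader must renew within this window
		RetryPeriod:   2 * time.Second,  // how often candidates retry acquisition
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// start the controller loops only once this replica holds the lease
			},
			OnStoppedLeading: func() {
				// lost the lease: stop work (typically exit so a restart rejoins the election)
				os.Exit(0)
			},
		},
	})
}
```

With this pattern every replica runs the same binary; the standbys simply block in the election until the current leader fails to renew its lease.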

jdef commented 8 years ago

see use case: https://github.com/mesosphere/kubernetes-mesos/issues/493#issuecomment-179524159

salmanbukhari commented 8 years ago

Is there any update on this issue?

jdef commented 8 years ago

not from the mesosphere team. perhaps someone in the community has started hacking on this?

salmanbukhari commented 8 years ago

Oh okay, I was looking at the document regarding the Mesos HA cold-standby mode. According to that document, the scheduler has a strong dependency on Nginx. What happens if Nginx goes down? Can you answer, or refer me to the right person who can?

jdef commented 8 years ago

my read on this is that the nginx instructions are enough to demonstrate a PoC for cold-standby mode. You could probably replace nginx with your choice of an HA load balancer (though, as the docs say, there are some additional protocol requirements for some kubectl commands).

/cc @huang195
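
For illustration, a minimal sketch of such a front end using nginx's stream module as a plain TCP proxy in front of two apiservers. The addresses, ports, and the backup marker are assumptions for the example, not values from the k8sm docs:

```nginx
# /etc/nginx/nginx.conf (fragment) -- addresses and ports are illustrative
stream {
    upstream k8sm_apiservers {
        server 10.0.0.11:8888;          # active apiserver
        server 10.0.0.12:8888 backup;   # cold standby
    }

    server {
        listen 8888;                    # the one stable address the schedulers advertise
        proxy_pass k8sm_apiservers;
    }
}
```

Swapping nginx for haproxy or another TCP load balancer amounts to reproducing the same single stable listen address in that tool's configuration.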

salmanbukhari commented 8 years ago

Actually, I was concerned because of these statements in the doc: "It is critically important to point --advertised-address to Nginx so all the schedulers would be assigned the same executor ID... they would generate different executor IDs (in the case of different IPs)". So the scheduler needs the address of a load balancer that stays up and keeps the same IP even in the case of a failure.

jdef commented 8 years ago

Right, so basically you don't want any parameters/environment variables sent from the k8sm scheduler to the k8sm executor process to change. Network addresses are part of that. If using a resolvable DNS name solves the problem for you (because it will always resolve to some LB to reach the API server) then that should work just fine. But now you've added a DNS dependency. If you're fine with that, great. Does this make sense?
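
To make that concrete, a hedged example of what the relevant part of a scheduler invocation could look like. Only --advertised-address comes from the docs quoted above; the other flags, host names, and ports are assumptions for illustration:

```sh
# Hypothetical invocation -- flag values and host names are illustrative assumptions.
# The key point is that --advertised-address is a stable name/IP (LB or DNS), not the
# scheduler host itself, so executor configuration never changes across failovers.
km scheduler \
  --mesos-master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --api-servers=https://k8sm-api.example.com:8888 \
  --advertised-address=k8sm-api.example.com:8888
```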

salmanbukhari commented 8 years ago

Yes, thank you for the explanation. But it will make the system more complex for me: in my case I am automating Kubernetes on Mesos with high availability on AWS, and Nginx will be running on the same nodes as the masters. I will try with an Elastic IP, but if that doesn't work then I will have to do it with DNS. Is this the same for Kubernetes HA without Mesos? There is no such point regarding --advertised-address mentioned on their website.

jdef commented 8 years ago

I'd have to review the k8s HA docs. In stock k8s, kubelet and kube-proxy (which run on all the slave/agent/node/whatever hosts) need to be able to find the API server somehow, whether that's via an IP or a DNS name. k8s bootstrapping has been an ongoing issue for a while now, and I'm not sure how far they've gotten for HA setups. You could review the salt scripts to see how they do it for dev setups on GCE, but that sounds like a much different case than the one you're trying to solve.

k8sm certainly has some different configuration requirements (and edge cases) due to (a) the nature of running on a Mesos cluster, and (b) how we approached configuration for the components. There have been some exciting DC/OS developments lately that could probably help with some of the sharp edges related to service discovery and running k8sm components on DC/OS. If you're interested in that kind of thing, there's a group of people congregating here: https://dcos-community.slack.com/messages/kubernetes/

huang195 commented 8 years ago

@salmanbukhari you will need multiple load balancers (e.g., nginx, haproxy, etc.) in front of the apiservers, so that you don't have a single point of failure. You can then use a single floating IP, managed by something like Keepalived, which moves that IP to another load balancer when the currently active one fails.
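
For reference, a minimal sketch of the Keepalived side of that setup; the interface name, virtual router ID, priorities, and the floating IP itself are illustrative assumptions:

```conf
# /etc/keepalived/keepalived.conf on the primary load balancer (fragment)
# Interface, router id, priorities, and VIP below are illustrative assumptions.
vrrp_instance apiserver_vip {
    state MASTER            # the standby LB uses "state BACKUP"
    interface eth0
    virtual_router_id 51
    priority 150            # standby uses a lower priority, e.g. 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100          # the floating IP that --advertised-address points at
    }
}
```

In practice you would also add a vrrp_script/track_script health check so the floating IP moves when the load balancer process itself dies, not only when the whole host does.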