Change directory structure to a directory per container manager

iemejia commented 7 years ago

I hadn't seen you have created a new repo for the examples. Nice. Here is my first contribution, feel free to accept or not.

iemejia commented 7 years ago

R: @patricklucas take a look.

iemejia commented 7 years ago

Patrick if you agree we can work now on making flink also an official chart of kubernetes :) https://github.com/kubernetes/charts I mean you have done almost everything so it is totally your credit, but don't hesitate to tell me if I can help with something.

Vince-Cercury commented 6 years ago

any update on "making flink also an official chart of kubernetes"?

Is there someone maintaining this? I've been using Flink with our own kubernetes resouces files, based on the documentation. I would prefer using a HELM chart. Trying this chart to set it up HA. No luck yet

iemejia commented 6 years ago

Excellent question @VinceMD. I would really love to work on this now. @patricklucas any chance we can do this with your code?

Vince-Cercury commented 6 years ago

@iemejia I now understand better the HA setup. I could not make it work with this Helm (see https://github.com/docker-flink/examples/issues/4) but I think I got it working with my own manifest files and docker images.

Also consider that document https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077 which outlines future changes to Flink to make it leverage the power containers and provisioning job managers and task manager just and only when needed.

With current versions, there are two main ways to set it up HA with Kubernetes and Zookeeper:

Two or more active job managers and a zookeeper cluster. This replicates what you would do in a more traditional infrastructure. As shown in the diagram https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/jobmanager_high_availability.html If the first jobmanager is not responsive, a new leader is elected. The Flink client and tasks managers are communicating to Zookeeper to find out the details of the new leader (they don't use Job manager RPC address, they first go through Zookeeper, which tells them what is the RPC address of the Flink job manager leader. You need two or more Kubernetes services. One for each job manager. We don't want to use a single service with load balancing. That won't work with Fink.
A single job manager and a zookeeper cluster. The job manager is stateless, so if Kubernetes is able to detect quickly that the Job manager is not healthy (LivelinessProb is critical here), it will kill it and start a new one. The new one will connect to Zookeeper, complain that there is no Leader for a while and that even though Job manager is the same hostname, the leader ID stored in Zookeeper is not the same as the new ID just generated. But it will eventually become the new leader and try to recover jobs.

Benefits of option 1: you don't need to wait for the container to be restarted. It should quickly pivot to the other job manager Cons of option 1: two job managers consume more resources. One of them is doing nothing. You have to setup two Kubernetes services, one for each job managers. You need to inject the full hostname of the job managers in conf/masters file in each container so that the UI can redirect you to the right URL of the leader job manager. So basically we can't use a single service and 2 pods behind the service. It has to be 2 services and 2 pods. Zookeeper sends you to one or the other. There is not actual load balancer. The clients as well as the tasks manager need to be talking to the right leader, via the right exposed service.

Benefit of option 2: cheaper to run, simpler to configure (only 1 k8s service). Cons of option 2: without a well defined Liveliness Prob for the job manager (I don't know yet if there is a health check service available for job manager), it may remain in an unhealthy state. So it will never be restarted by Kubernetes. Another issue, even if job manager is indeed restarted, it may take time to pull the container image or something can happen during the boot process. It could further delay the job recovery or might never happen if the pod fail to boot for any reason. Having active/active with leader election in option 1, somewhat mitigates that risk.

I think, after writing this, I'm inclined to use option 2 for production. Although we could have a helm chart that offers both options. We can set up option 1 by creating first job manager deployment and first service (jobmanager-deployment-1 and jobmanager-svc-1). If option 2 is desired, just need to apply jobmanager-deployment-2 and jobmanager-svc-2 which will automatically register itself to the zookeeper cluster and become the active backup.

iemejia commented 6 years ago

@VinceMD will you be up to contribute actively to work on a helm chart ? We can then ask the community and try to make it an official helm chart as we did with the docker image. We can setup a repo for this here and start working on this based on @patricklucas's implementation.

Vince-Cercury commented 6 years ago

Yes I would need some help with https://github.com/docker-flink/examples/issues/4 I'm planning on using a HELM chart for FLink and Zookeeper soon. I'm not expert with Flink or HELM but would like to see a community supported project.

We need to be mindful of FLIP-6 so to not do too much work for nothing if the design changes considerably.

Vince-Cercury commented 6 years ago

After further testing I believe a single Job manager is sufficient in HA mode. I feel more confident it works with the LivenessProb https://github.com/docker-flink/examples/pull/8

patricklucas commented 6 years ago

Hi folks, thanks for the continued discussion about this. The last few months have been pretty crazy and I let this work fall by the wayside.

Regarding HA-mode: the general consensus has been that the best approach to Flink HA-mode when using a resource manager is to use a single jobmanager. Granted, as you've noticed, this leaves a few things unanswered, such as the best way to do a liveness probe, but I think what you have in #8 is a decent start.

As far as this PR, @iemejia, if you fix up the conflicts and address the one comment I made (just replace that file you moved with a link to the new location) I think we can merge this and move on.

Vince-Cercury commented 6 years ago

@patricklucas agree. After implementing the liveness prob, I also decided to use a single Jobmanager.

iemejia commented 6 years ago

Sure, I am pretty busy at this moment but I will try to work on this next week and ping you guys

docker-flink / examples

Change directory structure to a directory per container manager #1