Define a new CRD to submit job to session cluster

functicons commented 5 years ago

Currently after a session cluster is up and running, we have to submit jobs to the cluster through Flink's API endpoint, which usually means the user has to install Flink CLI on their local machine.

We could consider adding a new CRD and controller for job submission, after a job CR is created the controller gets notified and submit the job to the cluster automatically, it also polls the job status and update the CR. We might also migrate the current job submission to job cluster to the approach as well.

elanv commented 4 years ago

I think it's good to separate FlinkCluster's spec.job into separate CRDs. If you create separate CRDs such as FlinkJob and BeamJob, and add controllers for each, it seems to be a much more extensible structure.

In particular, Beam support - #64, #174 - will increase complexity if the current FlinkCluster CRD & controller supports it. I hope this issue is addressed to keep the operator simple and extensible.

functicons commented 4 years ago

Previously, I thought about job CRD even for job clusters, but that means users have to deal with 2 CRDs, but I want to keep things simple, that's why I embedded jobSpec in the FlinkCluster CRD.

elanv commented 4 years ago

I agree to keep it simple. In the current implementation, handling flink job with the FlinkCluster spec seems simple. However, as time goes by, I'm a little concerned that the current structure will be more complex as the community wants more functionality like Beam support. The seamless job update I suggested would have been less expensive to add functionality if there was a separate FlinkJob. I don't think it's urgent to seperate FlinkJob from the FlinkCluster now.

However, I don't think two CRDs will be burdensome for users. Just like the k8s apps/deployment spec includes a pod template. If the FlinkJob spec includes the FlinkCluster spec, the user who wants the job cluster will only need to handle the FlinkJob CR when creating the job cluster. In this case, you will not have to link FlinkJob to FlinkCluster by creating FlinkJob and FlinkCluster respectively. For example:

apiVersion: flinkoperator.k8s.io/v1beta1
kind: FlinkJob
metadata:
  name: flink-job
spec:
  jarFile: ./examples/streaming/WordCount.jar
  className: org.apache.flink.streaming.examples.wordcount.WordCount
  flinkClusterTemplate:
    image:
      name: flink:1.8.1
    jobManager:
    taskManager:
      replicas: 2
    flinkProperties:

functicons commented 4 years ago

With job CRD, how do you model Beam job?

elanv commented 4 years ago

When there was a branch in the reconcile logic between the Flink job and the Beam job, I was concerned about the complexity of supporting one CRD. If a beam job is supported via FlinkRunner without a job server, it seem there is no need to separate the BeamJob CRD at this time. In the future, it may be necessary to separate FlinkJob CRDs from FlinkCluster if they become feature rich, but I don't think it is urgent as mentioned in the previous comments.

GoogleCloudPlatform / flink-on-k8s-operator

Define a new CRD to submit job to session cluster #49