GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Improve Flink job status tracking #349

Closed: elanv closed this issue 3 years ago

elanv commented 3 years ago

We determine the status of the Flink job through the wait_for_job loop in the job submitter, so the job pod stays alive after submission just to track job status. When I measured resource usage, each job pod used approximately 200m CPU. That is a lot for a pod whose only remaining task is status tracking. It would be nice to find a more efficient way to do this.

functicons commented 3 years ago

Do you have any ideas for how it could be improved?

elanv commented 3 years ago

We already call the Flink REST API in each reconcile loop to get the Flink job ID. What do you think about also tracking the Flink job status in the reconcile loop via the REST API, and letting the job submitter pod finish as soon as the job submit command has run?
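
To make the proposal concrete, here is a rough sketch of what one status check per reconcile pass could look like. It is written in Python for brevity and is not the operator's code. It assumes the JobManager REST endpoint `GET /jobs/<job-id>` returns a JSON body with a `state` field, and it treats FINISHED, FAILED and CANCELED as the terminal states. The function names (`parse_job_state`, `is_terminal`, `fetch_job_state`, `reconcile_once`) are hypothetical.

```python
import json
import urllib.request

# Assumption: these are the states after which a Flink job will not run again.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def parse_job_state(body: bytes) -> str:
    """Extract the "state" field from a GET /jobs/<job-id> response body."""
    return json.loads(body)["state"]


def is_terminal(state: str) -> bool:
    """Return True if the job has reached a state it will not leave."""
    return state in TERMINAL_STATES


def fetch_job_state(base_url: str, job_id: str, timeout: float = 5.0) -> str:
    """Ask the JobManager REST API for the current state of one job."""
    url = f"{base_url.rstrip('/')}/jobs/{job_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_job_state(resp.read())


def reconcile_once(base_url: str, job_id: str, fetch=fetch_job_state):
    """One status check, meant to be called from each reconcile pass.

    Returns (state, done). `fetch` is injectable so the logic can be
    exercised without a running cluster.
    """
    state = fetch(base_url, job_id)
    return state, is_terminal(state)
```

With something like this in the reconcile loop, the submitter pod would not need to keep polling: it could exit once the submit command returns, and the operator would pick up the job state on its next pass.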

elanv commented 3 years ago

Please also take a look at this comment:

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/294#issuecomment-709815077

functicons commented 3 years ago

Thanks for your ideas! Let's discuss it in #294.