apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
https://streampark.apache.org/
Apache License 2.0

[Feature] flink cluster new status tracking #2425

Open xujiangfeng001 opened 1 year ago

xujiangfeng001 commented 1 year ago

Search before asking

Description

After discussion with the developers, we decided that the new status tracking for Flink clusters will cover the following points:

  1. Heartbeat detection: add heartbeat detection for clusters. Each mode (yarn-session, remote, k8s-session) has its own way of obtaining the REST URL, polling it, and reading the cluster status.
  2. Status update: add a status field to the cluster to record its state (running, shutdown, lost).
  3. Failure alarm & failover:
     a. If a cluster shutdown or loss is detected, an alarm is sent.
     b. If jobs are running on that cluster, each job would raise its own alarm, producing a batch of alarms. These job alarms must be suppressed.
     c. The cluster itself does not fail over.
     d. Jobs running on the cluster would trigger their failover mechanism. That failover must be interrupted and the job status set to lost.
  4. Changes to operation logic:
     a. Cluster deletion: allowed only when the cluster is not bound to any job and is stopped.
     b. Cluster stop: allowed only when no jobs are bound to the cluster, or none of the bound jobs is running.
     c. Job start: in remote, k8s-session, or yarn-session mode, the job can start only after confirming that its bound cluster is running.
     d. Job addition: in remote, k8s-session, or yarn-session mode, the cluster drop-down list filters out clusters that are not started, leaving only running clusters.
     e. Job modification: if the original mode is remote, k8s-session, or yarn-session and the bound cluster is not running, the job cannot be saved. The user must select another mode or switch to a running cluster.
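
To make points 1 to 3 concrete, here is a minimal sketch of how one heartbeat result could drive the cluster status, the single cluster-level alarm, and the job status. All class and method names (`TrackedCluster`, `TrackedJob`, `ClusterHeartbeat`, `onHeartbeat`) are hypothetical and are not StreamPark APIs. The rule that an unreachable cluster counts as `SHUTDOWN` when a stop was requested and as `LOST` otherwise is an assumption; the issue does not define it.

```java
import java.util.ArrayList;
import java.util.List;

/** Cluster states listed in the issue. */
enum ClusterState { RUNNING, SHUTDOWN, LOST }

/** Simplified job states, just enough for this sketch. */
enum JobState { RUNNING, STOPPED, LOST }

/** Hypothetical job record bound to a session cluster. */
final class TrackedJob {
    final String name;
    JobState state;

    TrackedJob(String name, JobState state) {
        this.name = name;
        this.state = state;
    }
}

/** Hypothetical cluster record holding the new status field. */
final class TrackedCluster {
    final String name;
    ClusterState state = ClusterState.RUNNING;
    final List<TrackedJob> jobs = new ArrayList<>();

    TrackedCluster(String name) {
        this.name = name;
    }
}

final class ClusterHeartbeat {
    /** Alarms are collected in memory here; a real system would send them out. */
    final List<String> alarms = new ArrayList<>();

    /**
     * Applies one heartbeat result to the cluster.
     *
     * @param reachable     whether the REST poll got an answer
     * @param stopRequested whether a stop was requested (assumed way to tell SHUTDOWN from LOST)
     */
    void onHeartbeat(TrackedCluster cluster, boolean reachable, boolean stopRequested) {
        ClusterState next;
        if (reachable) {
            next = ClusterState.RUNNING;
        } else {
            next = stopRequested ? ClusterState.SHUTDOWN : ClusterState.LOST;
        }
        if (next == cluster.state) {
            return; // no transition, so no new alarm
        }
        cluster.state = next;
        if (next == ClusterState.RUNNING) {
            return; // recovery raises no alarm in this sketch
        }
        // One alarm for the cluster replaces the batch of per-job alarms.
        alarms.add("cluster " + cluster.name + " is " + next);
        // Running jobs are marked lost directly; no per-job alarm or failover is triggered.
        for (TrackedJob job : cluster.jobs) {
            if (job.state == JobState.RUNNING) {
                job.state = JobState.LOST;
            }
        }
    }
}
```

The alarm fires only on a state transition, so repeated failed polls do not re-send it.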

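The operation rules in point 4 can be read as a set of simple guard checks. The sketch below is one possible reading, not the project's implementation: `ExecMode`, `ClusterStatus`, `ClusterInfo`, and `ClusterGuards` are hypothetical names, `YARN_APPLICATION` is included only as an example of a mode that needs no pre-existing cluster, and treating "stopped" as "not running" for deletion is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Execution modes; the first three are the cluster-bound modes named in the issue. */
enum ExecMode {
    REMOTE, YARN_SESSION, K8S_SESSION, YARN_APPLICATION;

    /** True for modes that run on a pre-existing cluster. */
    boolean needsCluster() {
        return this != YARN_APPLICATION;
    }
}

enum ClusterStatus { RUNNING, SHUTDOWN, LOST }

/** Hypothetical read-only view of a cluster and its bound jobs. */
final class ClusterInfo {
    final String name;
    final ClusterStatus status;
    final int boundJobs;   // jobs bound to this cluster
    final int runningJobs; // subset of boundJobs that is currently running

    ClusterInfo(String name, ClusterStatus status, int boundJobs, int runningJobs) {
        this.name = name;
        this.status = status;
        this.boundJobs = boundJobs;
        this.runningJobs = runningJobs;
    }
}

final class ClusterGuards {
    private ClusterGuards() {}

    /** 4a: delete only when no job is bound and the cluster is not running (assumed meaning of "stopped"). */
    static boolean canDelete(ClusterInfo c) {
        return c.boundJobs == 0 && c.status != ClusterStatus.RUNNING;
    }

    /** 4b: stop only when no bound job is running (covers the "no jobs bound" case too). */
    static boolean canStop(ClusterInfo c) {
        return c.runningJobs == 0;
    }

    /** 4c and 4e: a cluster-bound job may start, or be saved, only on a running cluster. */
    static boolean canStartOrSaveJob(ExecMode mode, ClusterInfo bound) {
        if (!mode.needsCluster()) {
            return true;
        }
        return bound != null && bound.status == ClusterStatus.RUNNING;
    }

    /** 4d: clusters offered in the selection drop-down when adding a job. */
    static List<ClusterInfo> selectable(List<ClusterInfo> all) {
        return all.stream()
                .filter(c -> c.status == ClusterStatus.RUNNING)
                .collect(Collectors.toList());
    }
}
```

Keeping the checks in one place means the UI filter (4d) and the server-side validation (4c, 4e) cannot drift apart.
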
The work is split into the following tasks:

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

RocMarshal commented 1 year ago

Good job~ 👍