argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

App Health on FlinkDeployment stale on progressing #12993

Open myskaludek opened 1 year ago

myskaludek commented 1 year ago

Describe the bug

App Health stays stuck on Progressing even though the FlinkDeployment is managed by the operator and running properly. The problem is in the custom health check: https://github.com/argoproj/argo-cd/blob/e05298b9c6ab8610104271fa8491f019fee3c587/resource_customizations/flink.apache.org/FlinkDeployment/health.lua#L4

The Apache Flink Kubernetes Operator does not report status.reconciliationStatus.success; instead it reports status.jobManagerDeploymentStatus: READY.

To Reproduce

Deploy a FlinkDeployment with Apache Flink Kubernetes Operator v1.2.0 or above.

Expected behavior

Health is Healthy when the FlinkDeployment is running.

Screenshots

Version

argocd: v2.6.6+unknown
  BuildDate: 2023-03-17T11:36:14Z
  GitCommit: 
  GitTreeState: 
  GitTag: 2.6.6
  GoVersion: go1.20.2
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.6.3+e05298b
  BuildDate: 2023-02-27T14:40:19Z
  GitCommit: e05298b9c6ab8610104271fa8491f019fee3c587
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v4.5.5 2022-05-20T20:25:40Z
  Helm Version: v3.10.3+g835b733
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.19.1

Logs

zezaeoh commented 1 year ago

We can work around this by defining a custom health check override in the argocd-cm ConfigMap (configs.cm in the Helm values) :)

# helm values.yaml
configs:
  cm:
    resource.customizations: |
      flink.apache.org/FlinkDeployment:
        health.lua: |
          health_status = {}

          if obj.status ~= nil and obj.status.jobManagerDeploymentStatus ~= nil then
            if obj.status.jobManagerDeploymentStatus == "READY" then
              health_status.status = "Healthy"
              return health_status
            end

            if obj.status.jobManagerDeploymentStatus == "DEPLOYED_NOT_READY" or obj.status.jobManagerDeploymentStatus == "DEPLOYING" then
              health_status.status = "Progressing"
              health_status.message = "Waiting for deploying"
              return health_status
            end

            if obj.status.jobManagerDeploymentStatus == "ERROR" then
              health_status.status = "Degraded"
              if obj.status.reconciliationStatus ~= nil then
                health_status.message = obj.status.reconciliationStatus.error
              end
              return health_status
            end
          end

          health_status.status = "Progressing"
          health_status.message = "Waiting for Flink operator"
          return health_status
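
To sanity-check the mapping in the override above without a cluster, here is a minimal sketch that mirrors its decision logic in Python. It is an illustration only: Argo CD evaluates the Lua script itself, the function name `flink_health` is hypothetical, and the sketch reads `reconciliationStatus` defensively as its own assumption about objects that may lack that field.

```python
from typing import Optional


def flink_health(status: Optional[dict]) -> dict:
    """Illustrative Python mirror of the Lua override; not Argo CD code."""
    # Mirrors: obj.status ~= nil and obj.status.jobManagerDeploymentStatus ~= nil
    state = (status or {}).get("jobManagerDeploymentStatus")

    if state == "READY":
        return {"status": "Healthy"}

    if state in ("DEPLOYED_NOT_READY", "DEPLOYING"):
        return {"status": "Progressing", "message": "Waiting for deploying"}

    if state == "ERROR":
        # Assumption: reconciliationStatus may be missing, so fall back to {}.
        reconciliation = status.get("reconciliationStatus") or {}
        return {"status": "Degraded", "message": reconciliation.get("error")}

    # No status yet, or a state the override does not handle explicitly.
    return {"status": "Progressing", "message": "Waiting for Flink operator"}


if __name__ == "__main__":
    # Hand-written sample status objects, not captured from a real cluster.
    samples = [
        None,
        {"jobManagerDeploymentStatus": "DEPLOYING"},
        {"jobManagerDeploymentStatus": "READY"},
        {"jobManagerDeploymentStatus": "ERROR",
         "reconciliationStatus": {"error": "example error"}},
    ]
    for sample in samples:
        print(sample, "->", flink_health(sample))
```

The fall-through branch matters: any value of `jobManagerDeploymentStatus` that the override does not name is reported as Progressing rather than Degraded, which matches the last three lines of the Lua script.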

myskaludek commented 1 year ago

I added the customization to the ConfigMap and it works as expected, but it would be nice to have a proper health check upstream :-)

crenshaw-dev commented 1 year ago

Reopening in case someone wants to pick up the PR and resolve unit test issues.

dlemfh commented 11 months ago

I think https://github.com/argoproj/argo-cd/pull/15065 should close this issue as well.