Azure / orkestra

Orkestra is a cloud-native release orchestration and lifecycle management (LCM) platform for the fine-grained orchestration of interdependent Helm charts and their dependencies.
https://azure.github.io/orkestra

Workflow moves ahead even if certain HelmReleases are not yet Ready #232

Closed nitishm closed 3 years ago

nitishm commented 3 years ago

Describe the bug: The Workflow keeps moving forward even if some HelmRelease objects generated by the workflow nodes are in READY: Unknown status.

│ fed-grafana                              fg-pod-grafana                                    True                          Release reconciliation succeeded                       2m2s                         │
│ fed-grafana                              fg-pod-istio-gw                                   Unknown                       Reconciliation in progress                             2m2s                         │
│ fed-prometheus                           fp-pod-prometheus                                 Unknown                       Reconciliation in progress                             2m2s                         │
│ fed-prometheus                           fp-pod-maint                                      True                          Release reconciliation succeeded                       2m36s                        │
│ fed-grafana                              fg-pod-maint                                      True                          Release reconciliation succeeded                       2m37s                        │
│ fed-grafana                              fed-grafana                                       True                          Release reconciliation succeeded                       3m2s                         │
│ fed-prometheus                           fed-prometheus                                    True                          Release reconciliation succeeded                       3m2s                         │
│ fed-kubedb-operator                      fko-pod-kubedb-operator                           True                          Release reconciliation succeeded                       3m20s                        │
│ fed-kubedb-operator                      fko-pod-maint                                     True                          Release reconciliation succeeded                       3m30s                        │
│ fed-kubedb-operator                      fed-kubedb-operator                               True                          Release reconciliation succeeded                       3m39s                        │
│ fed-paas-helpers                         fed-paas-helpers                                  False                         install retries exhausted                              3m50s                        │
│ fed-paas-helpers                         fph-pod-jaeger-agent                              True                          Release reconciliation succeeded                       3m56s                        │
│ fed-paas-helpers                         fph-pod-cert-ctrl                                 True                          Release reconciliation succeeded                       3m57s                        │
│ fed-paas-helpers                         fph-pod-mongo-cfg-op                              True                          Release reconciliation succeeded                       3m57s                        │
│ fed-service-reg                          fsr-pod-service-reg                               True                          Release reconciliation succeeded                       3m58s                        │
│ fed-paas-helpers                         fph-pod-capture-mgr                               True                          Release reconciliation succeeded                       4m9s                         │
│ fed-paas-helpers                         fph-pod-capture-ss                                True                          Release reconciliation succeeded                       4m9s                         │
│ fed-paas-helpers                         fph-pod-grafana-operator                          True                          Release reconciliation succeeded                       4m9s                         │
│ fed-paas-helpers                         fph-pod-istio-config                              True                          Release reconciliation succeeded                       4m9s                         │
│ fed-paas-helpers                         fph-pod-kargo                                     Unknown                       Reconciliation in progress                             4m9s                         │
│ fed-paas-helpers                         fph-pod-cert-manager                              True                          Release reconciliation succeeded                       4m10s                        │
│ fed-paas-helpers                         fph-pod-elastalert-cfg                            True                          Release reconciliation succeeded                       4m10s                        │
│ fed-paas-helpers                         fph-pod-etcd-monitor                              True                          Release reconciliation succeeded                       4m10s

To Reproduce: The issue occurs intermittently with a private ApplicationGroup.

Expected behavior: The executor should wait for the HelmRelease to be READY: True before returning success.
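To make the expected behavior concrete, here is a minimal sketch of a wait loop that reports success only once the Ready status is `True`. It is not Orkestra's executor code: `wait_until_ready`, `get_ready_status`, the timeout values, and the choice to treat `False` as an immediate failure are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_ready(
    get_ready_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a (hypothetical) status getter until it reports "True".

    get_ready_status is assumed to return the Ready condition status of one
    HelmRelease as a string: "True", "False", or "Unknown".
    """
    deadline = clock() + timeout_s
    while True:
        status = get_ready_status()
        if status == "True":
            return True   # only a Ready=True release counts as success
        if status == "False":
            return False  # assumption: a failed release ends the wait
        if clock() >= deadline:
            return False  # still "Unknown" when the timeout expired
        sleep(interval_s)  # "Unknown" means in progress, so keep waiting
```

The key point is that `Unknown` never maps to success; it either resolves to `True` or `False`, or the wait times out.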

Screenshots: (image: Screen Shot 2021-05-13 at 2 35 56 PM)

Additional context: @jonathan-innis - we think it might be related to the Ready condition toggling (from Ready to Unknown) because of a bug in helm-controller.

nitishm commented 3 years ago

@jonathan-innis What does the executor check for in the HelmRelease before deeming it a success or failure?

jonathan-innis commented 3 years ago

It checks whether the status of the HelmRelease is in a Current state. This should occur only when the Ready condition is True and the ReadyReason is succeeded, so it's possible that the HelmRelease enters this state briefly and then leaves it. We would need a repro to really understand what's happening here.
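One way to guard against a HelmRelease that is Ready only briefly is to require the condition to hold across several consecutive observations. The sketch below illustrates that idea only; it is not the executor's actual check. The dictionary shape, the function names, the `required` count, and the `"ReconciliationSucceeded"` reason string are assumptions.

```python
from typing import Iterable, Mapping


def is_ready(
    condition: Mapping[str, str],
    succeeded_reason: str = "ReconciliationSucceeded",  # assumed reason string
) -> bool:
    """True when a Ready condition has status "True" and the success reason."""
    return (
        condition.get("status") == "True"
        and condition.get("reason") == succeeded_reason
    )


def is_stably_ready(
    observations: Iterable[Mapping[str, str]],
    required: int = 3,
    succeeded_reason: str = "ReconciliationSucceeded",
) -> bool:
    """True when the last `required` observations were all ready.

    observations is a hypothetical, time-ordered series of Ready conditions
    read from the same HelmRelease on successive polls.
    """
    streak = 0
    for condition in observations:
        # Any non-ready observation resets the streak, so a release that
        # flips from True back to Unknown is not reported as ready.
        streak = streak + 1 if is_ready(condition, succeeded_reason) else 0
    return streak >= required
```

With this check, a sequence such as ready, unknown, ready is rejected, at the cost of delaying success by a few polling intervals.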

nitishm commented 3 years ago

Let's sync up during the week and I can show you live when it happens, using our own custom ApplicationGroup.