argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
Add option to enable logging of failed deployments/syncs to history #6264

Open keithchong opened 3 years ago

keithchong commented 3 years ago

Summary

Although the current history and rollback page shows all the completed deployments so that you can roll back to one of them, it would also be beneficial if the history included the "failed" attempts.

Definition of an unsuccessful deployment (which can be expanded upon or modified). This is what I have observed:

I'd like the above to be available in the history (after a sync is initiated), but they can be filtered out for the rollback use case, for obvious reasons.

Perhaps each saved failed attempt does not need to take up much space. At a minimum, we already have the start time, which is saved here as deployStartedAt: https://github.com/argoproj/argo-cd/blob/a1419c227656854f0e11280ec4a673e5e7f985d8/pkg/apis/application/v1alpha1/types.go#L934
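To make the idea concrete, here is a minimal sketch of a history that keeps failed attempts but filters them out for rollback. It is written in Python purely for illustration; `SyncAttempt`, its fields, and `rollback_candidates` are hypothetical names, not existing Argo CD types, and only `deploy_started_at` is modelled on the `deployStartedAt` field mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class SyncAttempt:
    """Hypothetical history entry; not an existing Argo CD type."""
    revision: str
    deploy_started_at: datetime  # modelled on the deployStartedAt field
    succeeded: bool
    message: str = ""  # short failure reason, kept small to limit storage


def rollback_candidates(history: List[SyncAttempt]) -> List[SyncAttempt]:
    """Return only the successful attempts, in their original order."""
    return [attempt for attempt in history if attempt.succeeded]


# Example history with one failed attempt between two successful ones.
history = [
    SyncAttempt("abc123", datetime(2021, 5, 1, tzinfo=timezone.utc), True),
    SyncAttempt("def456", datetime(2021, 5, 2, tzinfo=timezone.utc), False, "hook failed"),
    SyncAttempt("789aaa", datetime(2021, 5, 3, tzinfo=timezone.utc), True),
]
candidates = rollback_candidates(history)
```

The full `history` list would back the statistics use case, while `candidates` is what a rollback view would offer.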

Motivation

It will be useful to track failures as well as successes for statistical purposes.

Proposal

TBD

jessesuen commented 3 years ago

We might want to accomplish this feature via the SyncJob CRD (https://github.com/argoproj/argo-cd/issues/1283), since the sync record could be kept with the object itself.

keithchong commented 3 years ago

Thanks @jessesuen

jannfis commented 3 years ago

We might want to accomplish this feature via the SyncJob CRD (#1283), since the sync record could be kept with the object itself.

Hm. Since #1283 covers only manual syncs, how would we record information from automatic syncs?

jannfis commented 3 years ago

I was wondering, can we just emit events for those things?

We already emit events for certain things, so we could look into whether we can also emit events on failed syncs, recording the most basic information in them.

Someone interested in the history and state of syncs could just retrieve the events and read the required information from them.
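As a rough illustration of what "the most basic information" could look like, the sketch below builds a dictionary using standard Kubernetes core/v1 Event fields (`type`, `reason`, `message`, `involvedObject`, `firstTimestamp`). The helper `failed_sync_event` and the reason string `SyncFailed` are assumptions made for this example, not something Argo CD is known to emit.

```python
from datetime import datetime, timezone


def failed_sync_event(app_name: str, namespace: str, revision: str,
                      failure: str, when: datetime) -> dict:
    """Build an Event-shaped dict describing one failed sync attempt."""
    return {
        "apiVersion": "v1",
        "kind": "Event",
        "metadata": {"generateName": f"{app_name}.", "namespace": namespace},
        "type": "Warning",
        "reason": "SyncFailed",  # hypothetical reason string
        "message": f"sync to {revision} failed: {failure}",
        "involvedObject": {
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Application",
            "name": app_name,
            "namespace": namespace,
        },
        "firstTimestamp": when.isoformat(),
    }


event = failed_sync_event(
    "guestbook", "argocd", "def456", "hook failed",
    datetime(2021, 5, 2, tzinfo=timezone.utc),
)
```

A consumer could then filter events by `involvedObject` and `reason` to reconstruct the list of failed attempts for one application.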

keithchong commented 3 years ago

Note that the history or events can't be transient; they need to be available on demand at any time. The info probably needs to be stored somewhere.

keithchong commented 3 years ago

One additional requirement for this enhancement is to provide the deployment history for the last 30 days, at a minimum. In theory, if you implement it for 30 days and a client requests only 14 days of history, supporting that shouldn't take much additional work.
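The point about 30 versus 14 days can be sketched as a single filter over the stored history with the window as a parameter. This is only an illustration in Python; `within_window` and the `(timestamp, revision)` entry shape are hypothetical, and it assumes entries are retained for at least the largest window requested.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Hypothetical entry shape: (start time of the attempt, revision).
Entry = Tuple[datetime, str]


def within_window(entries: List[Entry], now: datetime, days: int = 30) -> List[Entry]:
    """Return the entries that started within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [entry for entry in entries if entry[0] >= cutoff]


now = datetime(2021, 6, 1, tzinfo=timezone.utc)
entries = [
    (datetime(2021, 4, 15, tzinfo=timezone.utc), "abc123"),
    (datetime(2021, 5, 10, tzinfo=timezone.utc), "def456"),
    (datetime(2021, 5, 25, tzinfo=timezone.utc), "789aaa"),
]
last_30_days = within_window(entries, now)
last_14_days = within_window(entries, now, days=14)
```

With the data stored for 30 days, a 14-day request is the same query with a later cutoff.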