argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Add context to "Watch failed" #14134

Open jsoref opened 1 year ago

jsoref commented 1 year ago

Describe the bug

https://cloud-native.slack.com/archives/C01TSERG0KZ/p1687186844752359?thread_ts=1687185656.068269&cid=C01TSERG0KZ

We see a lot of:

1 retrywatcher.go:130] "Watch failed" err="context canceled"

To Reproduce

Dunno

Expected behavior

Log messages should give enough context for someone reading them to understand what's going on, i.e. "I tried to do x and a watch failed".

Version

{
    "Version": "v2.7.4+a33baa3.dirty",
    "BuildDate": "2023-06-05T19:00:34Z",
    "GitCommit": "a33baa301fe61b899dc8bbad9e554efbc77e0991",
    "GitTreeState": "dirty",
    "GoVersion": "go1.19.6",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.0.1 2023-03-14T01:32:48Z",
    "HelmVersion": "v3.11.2+g912ebc1",
    "KubectlVersion": "v0.24.2",
    "JsonnetVersion": "v0.19.1"
}

jenting commented 1 year ago

Similar to the issue that was addressed in a later version.

weslers commented 11 months ago

Having the same issue and running v2.8.4+c279299

woehrl01 commented 9 months ago

Same with v2.10.0+2175939

black-snow commented 6 months ago

Seeing this on v2.11.0+d3f33c0, without a clue what's happening.

crdrost commented 6 months ago

Note that this line is also one of those that aren't in the right format; see #5715 for that.

andrii-korotkov-verkada commented 2 days ago

Seems like logs for watches have improved a lot; see https://github.com/argoproj/gitops-engine/blob/847cfc9f8b200e96a70b591a68b9fb385cf2ce56/pkg/cache/cluster.go#L607-L735. I see logs like:

Failed to watch Pod on <address>: Resyncing Pod on <address> due to timeout, retrying in 1s
Start watch Pod on <address>

Those are normal during operation, though perhaps they should be logged at debug level, not info. What do you think?

jsoref commented 2 days ago

I'm pretty sure I still hit this. I really want someone to help me get the linked PR merged into kubernetes. I do not have the energy to fight the kubernetes project's bots/processes.

andrii-korotkov-verkada commented 2 days ago

I've checked for some changes, and it looks like they added better handling of several error types a few months ago, e.g. https://github.com/kubernetes/kubernetes/blame/475ee33f698334e5b00c58d3bef4083840ec12c5/staging/src/k8s.io/client-go/tools/watch/retrywatcher.go#L133.

jsoref commented 2 days ago

I think my PR ended up w/ merge conflicts and I ran out of energy. But that project is really draining. And not remotely supportive.

andrii-korotkov-verkada commented 2 days ago

I see :(