seankhliao opened this issue 3 years ago
The inconsistent log output is causing issues with our EFK stack. When we try to parse the log messages as logfmt, the intermittent go (klog-style) messages confuse fluentbit: it treats various words in the message as new fields of their own, causing an explosion in the corresponding Elasticsearch index mapping.
E.g., the log message:
Trace[2017359215]: ---"Objects listed" error:Get "https://10.1.128.1:443/apis/argoproj.io/v1alpha1/namespaces/argocd/applications?resourceVersion=327200165": dial tcp 10.1.128.1:443: i/o timeout 30001ms (21:05:51.063)
generates the following eight nonsensical fields:
log_processed.--- true
log_processed.: true
log_processed.(21:05:51_063) true
log_processed.10_1_128_1:443: true
log_processed.30001ms true
log_processed.dial true
log_processed.error:Get true
log_processed.https://10_1_128_1:443/apis/argoproj_io/v1alpha1/namespaces/argocd/applications?resourceVersion 327200165
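For contrast, here is a rough sketch (illustrative only: this is not output any Argo CD component produces today, and the field names are ones I made up) of how the same trace event could be emitted through logrus, the logger argocd already uses, so that fluentbit would see well-defined keys instead of inventing them:

```go
package main

import (
	"errors"

	log "github.com/sirupsen/logrus"
)

func main() {
	// Hypothetical: the same "Objects listed" trace event, emitted via logrus
	// so it comes out as a single parseable logfmt record instead of klog's
	// free-form text.
	log.SetFormatter(&log.TextFormatter{FullTimestamp: true})
	log.WithFields(log.Fields{
		"error":       errors.New(`Get "https://10.1.128.1:443/apis/argoproj.io/v1alpha1/namespaces/argocd/applications?resourceVersion=327200165": dial tcp 10.1.128.1:443: i/o timeout`),
		"duration_ms": 30001,
	}).Error("Objects listed")
}
```

With the TextFormatter this comes out as one well-formed logfmt line; swapping in the JSONFormatter would give the json equivalent.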
Not sure how to reliably generate errors in client-go
I generally see go messages when the control plane is down, so temporarily taking down the control plane might be a way to generate them.
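If taking down a real control plane is impractical, here is a rough sketch of one way to provoke the messages locally (an assumption on my part, I haven't verified it matches the code paths argocd actually hits): point a client-go informer at an unreachable API server and let its reflector retry.

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Deliberately unreachable API server so the reflector's list/watch calls
	// time out and client-go logs retry errors through klog on stderr.
	cfg := &rest.Config{
		Host:    "https://10.255.255.1:443",
		Timeout: 2 * time.Second,
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	factory := informers.NewSharedInformerFactory(clientset, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	stop := make(chan struct{})
	go podInformer.Run(stop)

	// Give the reflector time to fail and retry a few times, producing the
	// glog-style lines that trip up the logfmt parser.
	time.Sleep(30 * time.Second)
	close(stop)
}
```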
Setting the log format to JSON does not affect the go messages, and therefore can't be used to solve the issue.
I had to rewrite this from scratch off company time, so I haven't tested whether it works, but here's a workaround that parses and regexes the mixed output, in case you're writing your own stream processors in Go:
https://gist.github.com/crdrost/ce08b2449d438a2c3b18fe64cad39095
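The same idea in miniature, as a sketch (mine, not taken from the gist, and the regex only approximates klog's header format): detect glog/klog-style lines and rewrite them into logfmt before they reach fluentbit.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// klogHeader approximates klog's line prefix: severity letter, MMDD, time,
// PID, file:line, then "] " and the message.
var klogHeader = regexp.MustCompile(
	`^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+)\s+(\S+:\d+)\]\s*(.*)$`)

var levels = map[string]string{"I": "info", "W": "warning", "E": "error", "F": "fatal"}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		if m := klogHeader.FindStringSubmatch(line); m != nil {
			// Rewrite the klog line as a single well-formed logfmt record.
			fmt.Printf("level=%s caller=%q msg=%q\n", levels[m[1]], m[5], m[6])
			continue
		}
		// Pass through lines that are already logfmt/json.
		fmt.Println(line)
	}
}
```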
I agree though that the stderr output should ideally dump as level=ERROR messages in logfmt.
Implementers might also be interested in a tidbit I saw while looking into the problem: the code linked below injects Logrus as a gRPC v2 logger, which I think would fix the "Failed to obtain reader, failed to marshal fields to JSON" error?
https://github.com/grpc-ecosystem/go-grpc-middleware/blob/v1.4.0/logging/logrus/grpclogger.go#L13
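If I'm reading that file right, wiring it up would look roughly like this (a sketch only; I haven't checked where argocd-server actually builds its logrus logger):

```go
package main

import (
	grpc_logrus "github.com/grpc-ecosystem/go-grpc-middleware/logging/logrus"
	"github.com/sirupsen/logrus"
)

func main() {
	logger := logrus.New()
	logger.SetFormatter(&logrus.JSONFormatter{})

	// Replace grpc-go's default stderr logger with logrus, so gRPC-internal
	// messages come out in the same structured format as everything else.
	grpc_logrus.ReplaceGrpcLogger(logrus.NewEntry(logger))

	// ... set up and serve the gRPC server as usual ...
}
```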
Obviously the retrywatcher.go:130 message also needs to be hunted down, probably separately, and I don't have any insight into the "Objects listed" error. My team has also seen messages like:
time="2024-05-09T22:29:00Z" level=info msg="received streaming call /application.ApplicationService/WatchResourceTree" grpc.method=WatchResourceTree grpc.request.claims="<omitted b/c vaguely sensitive-looking>" grpc.request.content="applicationName:\"home\" appNamespace:\"argocd\" " grpc.service=application.ApplicationService grpc.start_time="2024-05-09T22:29:00Z" span.kind=server system=grpc
2024/05/16 21:35:20 proto: tag has too few fields: "-"
time="2024-05-09T22:29:03Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchResourceTree grpc.service=application.ApplicationService grpc.start_time="2024-05-09T22:26:25Z" grpc.time_ms=158234.83 span.kind=server system=grpc
and, presumably from breaking a Terminal connection (since that's where websockets are used):
time="2024-05-09T22:30:24Z" level=error msg="read message err: websocket: close 1006 (abnormal closure): unexpected EOF"
E0509 22:30:24.821155 7 v2.go:105] websocket: close 1006 (abnormal closure): unexpected EOF
Note that those two lines also look like a weird duplicate of the same event, as far as I can tell.
Checklist:
- argocd version

Describe the bug
k8s.io/client-go outputs logs through k8s.io/klog{,/v2}, which is a different format (glog style) than argocd (logfmt or json), making it annoying to parse; example below. Something else outputs plaintext.
To Reproduce
Not sure how to reliably generate errors in client-go
Expected behavior
logs output in consistent structured format
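One possible direction (a sketch only; I haven't checked how argocd initializes its loggers, and the bridge below maps everything to info level for simplicity) would be to point klog at the same structured logger via klog.SetLogger, so that client-go's output stops bypassing it:

```go
package main

import (
	"github.com/go-logr/logr/funcr"
	log "github.com/sirupsen/logrus"
	"k8s.io/klog/v2"
)

func main() {
	// Bridge klog (used by client-go) into logrus via a minimal logr.Logger,
	// so reflector/watch messages come out as logfmt instead of glog-style
	// text. Everything is mapped to info level in this simplified sketch.
	bridge := funcr.New(func(prefix, args string) {
		log.WithField("source", "klog").Info(args)
	}, funcr.Options{})
	klog.SetLogger(bridge)

	// Example: this klog call now flows through logrus.
	klog.Info("klog output now routed through the structured logger")
}
```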
Version

Logs

From argocd-application-controller:

From argocd-server: