argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

inconsistent log output format #5715

Open seankhliao opened 3 years ago

seankhliao commented 3 years ago


Describe the bug

k8s.io/client-go outputs logs through k8s.io/klog{,/v2}, which uses a different format (glog style) than Argo CD (logfmt or JSON), making the combined output annoying to parse; example below.

Something else outputs plain text (see the argocd-server logs below).
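For reference, klog/v2 does expose hooks for redirecting its output, so in principle client-go's lines could be routed through logrus and come out as logfmt/JSON like everything else. A minimal, untested sketch (assuming klog.LogToStderr and klog.SetOutput behave as documented; the severity mapping is illustrative only):

package main

import (
    "bytes"

    "github.com/sirupsen/logrus"
    "k8s.io/klog/v2"
)

// klogWriter re-logs klog lines via logrus, mapping the leading severity
// character of the glog header (I/W/E/F) to a logrus level.
type klogWriter struct{}

func (klogWriter) Write(p []byte) (int, error) {
    msg := string(bytes.TrimRight(p, "\n"))
    level := logrus.InfoLevel
    if len(msg) > 0 {
        switch msg[0] {
        case 'W':
            level = logrus.WarnLevel
        case 'E':
            level = logrus.ErrorLevel
        case 'F':
            level = logrus.FatalLevel // logged at fatal level, but no os.Exit here
        }
    }
    logrus.StandardLogger().Log(level, msg)
    return len(p), nil
}

func init() {
    klog.LogToStderr(false) // stop klog from writing directly to stderr
    klog.SetOutput(klogWriter{})
}

func main() {
    klog.Error("example klog error, now emitted through logrus")
}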

To Reproduce

Not sure how to reliably generate errors in client-go

Expected behavior

Logs are output in a consistent, structured format.

Version

argocd: v1.8.7+eb3d1fb
  BuildDate: 2021-03-03T07:12:02Z
  GitCommit: eb3d1fb84b9b77cdffd70b14c4f949f1c64a9416
  GitTreeState: clean
  GoVersion: go1.14.12
  Compiler: gc
  Platform: linux/amd64

Logs

From argocd-application-controller

time="2021-03-04T09:41:42Z" level=info msg="Ignore status for CustomResourceDefinitions"
time="2021-03-04T09:41:42Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: xxx)" application=xxx
E0304 09:41:42.208565       1 retrywatcher.go:130] Watch failed: Get "https://10.250.0.1:443/apis/status.gatekeeper.sh/v1beta1/constraintpodstatuses?allowWatchBookmarks=true&resourceVersion=497293846&watch=true": context canceled
E0304 09:41:42.211088       1 retrywatcher.go:130] Watch failed: Get "https://10.250.0.1:443/apis/networking.k8s.io/v1/networkpolicies?allowWatchBookmarks=true&resourceVersion=497293845&watch=true": context canceled
E0304 09:41:42.211871       1 retrywatcher.go:130] Watch failed: Get "https://10.250.0.1:443/apis/storage.k8s.io/v1/storageclasses?allowWatchBookmarks=true&resourceVersion=497293833&watch=true": context canceled
E0304 09:41:42.214747       1 retrywatcher.go:130] Watch failed: Get "https://10.250.0.1:443/apis/velero.io/v1/restores?allowWatchBookmarks=true&resourceVersion=497293835&watch=true": context canceled
E0304 09:41:42.214779       1 retrywatcher.go:130] Watch failed: Get "https://10.250.0.1:443/apis/acme.cert-manager.io/v1/orders?allowWatchBookmarks=true&resourceVersion=497293881&watch=true": context canceled
time="2021-03-04T09:41:54Z" level=info msg="Skipping auto-sync: application status is Unknown" application=xxx
time="2021-03-04T09:41:54Z" level=info msg="Update successful" application=xxx

From argocd-server

Failed to obtain reader, failed to marshal fields to JSON, json: error calling MarshalJSON for type *grpc.jsonpbMarshalleble: unexpected end of JSON input
{"error":"rpc error: code = Unauthenticated desc = invalid session: oidc: token is expired (Token Expiry: 2021-03-03 14:34:10 +0000 UTC)","grpc.code":"Unauthenticated","grpc.method":"List","grpc.service":"cluster.ClusterService","grpc.start_time":"2021-03-04T09:29:11Z","grpc.time_ms":1.379,"level":"info","msg":"finished unary call with code Unauthenticated","span.kind":"server","system":"grpc","time":"2021-03-04T09:29:11Z"}
Failed to obtain reader, failed to marshal fields to JSON, json: error calling MarshalJSON for type *grpc.jsonpbMarshalleble: invalid character '\x00' in string literal
lanmarti commented 1 year ago

The inconsistent log output is causing issues with our EFK stack. When trying to parse the log messages as logfmt, the intermittent go messages confuse fluentbit: it detects various words in the go message as new fields of their own, causing an explosion in the corresponding Elasticsearch index mapping. E.g. the log message

Trace[2017359215]: ---"Objects listed" error:Get "https://10.1.128.1:443/apis/argoproj.io/v1alpha1/namespaces/argocd/applications?resourceVersion=327200165": dial tcp 10.1.128.1:443: i/o timeout 30001ms (21:05:51.063)

generates the following 8 nonsensical fields:

log_processed.--- true
log_processed.: true
log_processed.(21:05:51_063) true
log_processed.10_1_128_1:443: true
log_processed.30001ms true
log_processed.dial true
log_processed.error:Get true
log_processed.https://10_1_128_1:443/apis/argoproj_io/v1alpha1/namespaces/argocd/applications?resourceVersion 327200165

Not sure how to reliably generate errors in client-go

I generally see go messages when the control plane is down, so temporarily taking down the control plane might be a way to generate them.

Setting the log format to JSON does not affect the go messages, and therefore can't be used to solve the issue.

crdrost commented 5 months ago

I had to rewrite this from scratch off company time, so I haven't tested whether it works, but here's a workaround for parsing with regexes if you're writing your own stream processors in Go:

https://gist.github.com/crdrost/ce08b2449d438a2c3b18fe64cad39095

I agree though that the stderr output should ideally dump as level=ERROR messages in logfmt.
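For anyone who can't use the gist directly, here is an illustrative sketch of the kind of regex such a processor could use (not the gist's contents; the package and function names are made up for illustration): recognize the klog/glog header, re-wrap those lines as logfmt, and pass everything else through untouched.

package logfilter

import (
    "regexp"
    "strconv"
)

// Matches lines like:
//   E0304 09:41:42.208565       1 retrywatcher.go:130] Watch failed: ...
// capture groups: severity, MMDD, time, pid, file, line, message
var klogLine = regexp.MustCompile(`^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+)\s+(\S+):(\d+)\] (.*)$`)

var severity = map[string]string{"I": "info", "W": "warning", "E": "error", "F": "fatal"}

// ToLogfmt converts a klog-style line into a single logfmt record; anything
// that doesn't look like a klog line is returned unchanged.
func ToLogfmt(line string) string {
    m := klogLine.FindStringSubmatch(line)
    if m == nil {
        return line
    }
    return "level=" + severity[m[1]] +
        " source=" + m[5] + ":" + m[6] +
        " msg=" + strconv.Quote(m[7])
}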

crdrost commented 5 months ago

Implementors might also be interested in this tidbit I saw while looking into the problem: this injects Logrus as a gRPC v2 logger, which I think would fix the "Failed to obtain reader, failed to marshal fields to JSON" error?

https://github.com/grpc-ecosystem/go-grpc-middleware/blob/v1.4.0/logging/logrus/grpclogger.go#L13
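A minimal sketch of what wiring that in might look like, assuming go-grpc-middleware v1's grpc_logrus.ReplaceGrpcLogger; it would need to run early, before any gRPC activity, so grpc-go's own messages are emitted through logrus:

package main

import (
    grpc_logrus "github.com/grpc-ecosystem/go-grpc-middleware/logging/logrus"
    "github.com/sirupsen/logrus"
)

func main() {
    logger := logrus.New()
    logger.SetFormatter(&logrus.JSONFormatter{}) // or a text formatter for logfmt-style output
    // Replaces grpc-go's global (v2) logger with a logrus-backed one.
    grpc_logrus.ReplaceGrpcLogger(logrus.NewEntry(logger))
    // ... set up gRPC servers/clients after this point ...
}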

Obviously the retrywatcher.go:130 output also needs to be hunted down, probably separately, and I don't have any insight into the "Objects listed" error. My team has also seen messages like

time="2024-05-09T22:29:00Z" level=info msg="received streaming call /application.ApplicationService/WatchResourceTree" grpc.method=WatchResourceTree grpc.request.claims="<omitted b/c vaguely sensitive-looking>" grpc.request.content="applicationName:\"home\" appNamespace:\"argocd\" " grpc.service=application.ApplicationService grpc.start_time="2024-05-09T22:29:00Z" span.kind=server system=grpc
2024/05/16 21:35:20 proto: tag has too few fields: "-"
time="2024-05-09T22:29:03Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchResourceTree grpc.service=application.ApplicationService grpc.start_time="2024-05-09T22:26:25Z" grpc.time_ms=158234.83 span.kind=server system=grpc

and, presumably triggered by breaking a Terminal connection (since that's where websockets are used),

time="2024-05-09T22:30:24Z" level=error msg="read message err: websocket: close 1006 (abnormal closure): unexpected EOF"
E0509 22:30:24.821155       7 v2.go:105] websocket: close 1006 (abnormal closure): unexpected EOF

Note that, as far as I can tell, this is also a weird duplicated log line.