Open containerkid opened 2 years ago
This is also happening on my GKE Autopilot cluster, version 1.22.12-gke.2300.
I just ran into this EXACT same issue (the liveness and readiness HTTP probes were both returning status code 500). When I checked my 'agent' pod logs, I could see 403s while trying to hit 'agent-http-intake.logs.datadoghq.com'. This led me down the Google rabbit hole, and I found this article: https://docs.datadoghq.com/getting_started/site/ which turned out to explain my issue. Since my portal URL was pointing to us5.datadoghq.com, I had to change the datadog.site value in my values.yaml to match.
Set the clusterName and site in values.yaml, then upgrade using Helm.
Ex: helm upgrade
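For anyone looking for the concrete change, here is a minimal sketch of the relevant part of values.yaml, assuming the official Datadog Helm chart. The cluster name is a made-up placeholder, and the site shown is just the one mentioned in this thread; use whichever site matches the URL you log in to.

```yaml
# values.yaml (fragment only, not a complete configuration)
datadog:
  # Hypothetical cluster name; replace with your own.
  clusterName: my-gke-cluster
  # Must match the Datadog site your account lives on.
  # us5.datadoghq.com is the value reported in this thread; if the site
  # does not match your account, the Agent sends to the wrong intake
  # endpoint (the 403s seen above were against datadoghq.com).
  site: us5.datadoghq.com
```

The change would then be applied with something like `helm upgrade <release-name> datadog/datadog -f values.yaml`, where the release name and chart reference are placeholders that depend on how the chart was installed.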
Hi ddaktn, did you solve the issue? The liveness probe always fails on my side with connection refused on ports 5555 and 8126.
Did anybody figure this out? Still happening on GKE Autopilot 😟
Yes, changing it to point to the right URL did resolve my issue.
Describe what happened: I am trying to install Datadog on my GKE cluster. After updating values.yaml, I ran the helm install command. Some pods are running, but others are stuck at 2/3 ready. When I described one of those pods, I saw this error: "Readiness probe failed: HTTP probe failed with statuscode: 500"
Describe what you expected:
I expect all the pods to be in Running and Ready status.
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc): Google Cloud GKE version 1.23.8-gke.1900