[BUG] Facing Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2 error for datadog cluster agent

pochavan commented 6 months ago

Agent Environment

Cluster Agent version: 7.52.0
Datadog Agent version: 7.50.3

Describe what happened:

I installed the Datadog Operator and Agent on my Kubernetes cluster by following "Deploy an Agent with the Operator". My DatadogAgent YAML file is as follows:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    clusterName: microservice-demo-app
    registry: public.ecr.aws/datadog
    site: api.datadoghq.com
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
  features:
    logCollection:
      enabled: true
      containerCollectAll: true
    orchestratorExplorer:
      enabled: true
  override:
    clusterAgent:
      image:
        name: gcr.io/datadoghq/cluster-agent:latest
    nodeAgent:
      image:
        name: gcr.io/datadoghq/agent:latest

The Agent installed successfully, but I am not able to see any data in the Kubernetes Explorer section (https://app.datadoghq.com/orchestration/overview/cluster).

I am getting the following errors in the datadog-cluster-agent pod:

2024-03-22 13:10:56 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/secret/controller.go:78 in Run) | Starting secrets controller for datadog/webhook-certificate
2024-03-22 13:10:56 UTC | CLUSTER | INFO | (client-go@v0.28.6/tools/leaderelection/leaderelection.go:260 in func1) | successfully acquired lease datadog/datadog-leader-election
2024-03-22 13:10:56 UTC | CLUSTER | INFO | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection_engine.go:152 in func1) | New leader "datadog-cluster-agent-67c4bc5bb-ss2vx"
2024-03-22 13:10:56 UTC | CLUSTER | INFO | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection_engine.go:158 in func2) | Started leading as "datadog-cluster-agent-67c4bc5bb-ss2vx"...
2024-03-22 13:10:56 UTC | CLUSTER | INFO | (comp/core/workloadmeta/collectors/internal/kubeapiserver/kubeapiserver.go:138 in startReadiness) | All (2) K8S reflectors synced to workloadmeta
2024-03-22 13:10:57 UTC | CLUSTER | INFO | (pkg/collector/worker/check_logger.go:40 in CheckStarted) | check:orchestrator | Running check...
2024-03-22 13:10:57 UTC | CLUSTER | INFO | (pkg/collector/worker/check_logger.go:40 in CheckStarted) | check:kubernetes_apiserver | Running check...
2024-03-22 13:10:57 UTC | CLUSTER | INFO | (client-go@v0.28.6/rest/request.go:697 in Infof) | Waited for 1.071776711s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/persistentvolumes?limit=500&resourceVersion=0
2024-03-22 13:10:58 UTC | CLUSTER | INFO | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection.go:218 in EnsureLeaderElectionRuns) | Leader election running, current leader is "datadog-cluster-agent-67c4bc5bb-ss2vx"
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:191 in process) | Error while processing transaction: error "404 Not Found" while sending transaction to "https://orchestrator.api.datadoghq.com/api/v2/orch", rescheduling it: "{\"errors\":[\"Not found\"]}"
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2/orch': retrying later
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2/orch': retrying later
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2/orch': retrying later
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2/orch': retrying later
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2/orch': retrying later
2024-03-22 13:10:58 UTC | CLUSTER | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://orchestrator.api.datadoghq.com/api/v2/orch': retrying later
2024-03-22
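
One detail worth checking in the log above: the failing hostname is `orchestrator.` followed by exactly the `site` value from the manifest (`api.datadoghq.com`). The sketch below only illustrates that observation. The URL pattern in `orchestrator_endpoint` is an assumption inferred from this log, not taken from the Agent source, and both function names are hypothetical.

```python
from urllib.parse import urlsplit

# Endpoint copied verbatim from the ERROR lines in the log.
FAILING_URL = "https://orchestrator.api.datadoghq.com/api/v2/orch"


def orchestrator_endpoint(site: str) -> str:
    """Build the endpoint for a given site value.

    Assumption: the endpoint has the form
    https://orchestrator.<site>/api/v2/orch (inferred from the log only).
    """
    return f"https://orchestrator.{site}/api/v2/orch"


def site_from_endpoint(url: str) -> str:
    """Recover the site value implied by an endpoint URL."""
    host = urlsplit(url).hostname or ""
    prefix = "orchestrator."
    # Strip the assumed "orchestrator." prefix if it is present.
    return host[len(prefix):] if host.startswith(prefix) else host


if __name__ == "__main__":
    # The site implied by the failing URL matches spec.global.site.
    print(site_from_endpoint(FAILING_URL))
    # The manifest's site value reproduces the failing URL exactly.
    print(orchestrator_endpoint("api.datadoghq.com") == FAILING_URL)
    # A site value without the "api." prefix yields a different hostname.
    print(orchestrator_endpoint("datadoghq.com"))
```

Datadog's documented `site` value for the US1 region is `datadoghq.com`, without an `api.` prefix, so the prefix in the manifest may be what produces a hostname that answers 404. This is a guess from the log, not a confirmed diagnosis.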

Describe what you expected: I should be able to see all data, such as pods, in the Kubernetes Explorer section. The logs should also be clean.

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc):

evgeniy4587 commented 3 weeks ago

I have the same issue in my datadog-cluster-agent pods. Are there any updates on it?

nic-avant commented 5 days ago

I have this issue with Cluster Agent 7.52.1 and 7.56.2. The sidecar is provisioned in all the namespaces we need, but I am very unsure what to do about this error.

DataDog / datadog-agent, issue #24000