DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

datadog - rke clusters - Error kube-api #7772

Open thiagoleitetr opened 3 years ago

thiagoleitetr commented 3 years ago
=== Running Kubernetes API Server availability diagnosis ===
[ERROR] error: temporary failure in apiserver, will retry later: check resources failed: event collection: "Get \"https://10.43.0.1:443/api/v1/events?limit=1&timeout=10s&timeoutSeconds=10\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", service collection: "Get \"https://10.43.0.1:443/api/v1/services?limit=1&timeout=10s&timeoutSeconds=10\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", pod collection: "Get \"https://10.43.0.1:443/api/v1/pods?limit=1&timeout=10s&timeoutSeconds=10\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", node collection: "Get \"https://10.43.0.1:443/api/v1/nodes?limit=1&timeout=10s&timeoutSeconds=10\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" - 1602540077111700154
===> FAIL

On-premises CentOS 7, Kubernetes 1.18.8, RKE:

  kubelet:                 
    image: ""          
    extra_args:      
      read-only-port: 10255
    extra_binds: []
    extra_env: []  
    win_extra_args: {}     
    win_extra_binds: []                                                                                                                               
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kube-api:                
    image: ""          
    extra_args:      
      v: 10                
    extra_binds: []
    extra_env: []  
    win_extra_args: {}     
    win_extra_binds: []                                                                                                                               
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: "30000-32767"
    pod_security_policy: false    
    always_pull_images: false
    secrets_encryption_config: null                                        
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
network:              
  plugin: calico       
  options: {}        
  mtu: 0                      
  node_selector: {}                                                        
  update_strategy: null
authentication:
  strategy: x509  
  sans: []              
  webhook: null    
addons: ""            
addons_include: []     
kubernetes_version: v1.18.8-rancher1-1                                     
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""      
ssh_agent_auth: false
authorization:             
  mode: rbac       
  options: {}      
ignore_docker_version: null
private_registries: []                                                                                                                                
ingress:             
  provider: nginx                                                          
  options:                                                                 
     use-forwarded-headers: "true"
     proxy-body-size: "0"    
  node_selector: {}                                                        
  extra_args: {}   
  dns_policy: ""                 
  extra_envs: []          
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
cluster_name: "rke"    

I'm able to list metrics from the cluster, but not from kube-api (pods, containers, deployments and namespaces).

I have a proxy in this env:

HTTP_PROXY: http://xxxx:80
HTTPS_PROXY: http://xxxxx:80
NO_PROXY: localhost 127.0.0.1 localaddress xxxx.com 10.204.202.142 cluster.local
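One thing that may be worth checking (a hypothesis, not something confirmed in this thread): the failing requests go to 10.43.0.1, which sits inside service_cluster_ip_range (10.43.0.0/16), and that address does not appear in the NO_PROXY list above. The sketch below is a simplified NO_PROXY matcher, not the Agent's actual proxy logic. The helper name `bypasses_proxy` is hypothetical, and whether the client in use honors CIDR entries in NO_PROXY is an assumption to verify for your Agent version.

```python
import ipaddress


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Simplified NO_PROXY check: exact host, domain suffix, or IP/CIDR match.

    Hypothetical helper for illustration only; real HTTP clients differ
    in how they parse and match NO_PROXY entries.
    """
    # Accept both comma- and space-separated lists.
    entries = no_proxy.replace(",", " ").split()
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal

    for entry in entries:
        if ip is not None:
            try:
                if ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a hostname, not an IP or CIDR
            continue
        name = entry.lstrip(".").lower()
        if host.lower() == name or host.lower().endswith("." + name):
            return True
    return False


# NO_PROXY value as posted in this issue.
NO_PROXY = "localhost 127.0.0.1 localaddress xxxx.com 10.204.202.142 cluster.local"

print(bypasses_proxy("10.43.0.1", NO_PROXY))
print(bypasses_proxy("10.43.0.1", NO_PROXY + " 10.43.0.0/16"))
```

Under this simplified matching, the first call reports that 10.43.0.1 would be sent through the proxy, and the second reports a bypass once the service range is added. If the real client behaves similarly, adding 10.43.0.1 (or the service CIDR, if supported) to NO_PROXY would be a cheap experiment.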

I know that communication with kube-api in RKE has some particularities: there is a local proxy (localhost:6443) on all k8s workers.
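To separate a plain network problem from a proxy problem, a direct TCP probe can help. This is a minimal sketch with a hypothetical helper `can_connect`. The two addresses are the ones mentioned in this issue, and it assumes the script is run from a place with the same network view as the Agent (for example, inside the Agent pod or on the worker node).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Service IP from the error message and the RKE local proxy port.
    for host, port in [("10.43.0.1", 443), ("127.0.0.1", 6443)]:
        print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

This probe ignores HTTP_PROXY/HTTPS_PROXY entirely, so if it succeeds while the Agent still times out, the proxy settings become the more likely suspect.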

Any ideas? What am I missing here, or is this not supported?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that this will not be automatically closed, but the notification will remind us to investigate why there's been inactivity.

If you would like this issue to remain open:

  1. Verify that you can still reproduce the issue in the latest version of the integration.
  2. Comment that the issue is still reproducible and include updated details if possible.

Thank you for participating in the Datadog open source community!