Closed: bamarch closed this issue 2 years ago
Some extra bit of context
===============
Agent (v7.35.2)
===============
Status date: 2022-05-23 13:33:27.242 UTC (1653312807242)
Agent start: 2022-05-23 13:19:43.72 UTC (1653311983720)
Pid: 31381
Go Version: go1.17.6
Python Version: 3.8.11
Build arch: amd64
Agent flavor: agent
Check Runners: 4
Log Level: DEBUG
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: 2.017ms
System time: 2022-05-23 13:33:27.242 UTC (1653312807242)
Host Info
=========
bootTime: 2022-05-17 21:24:26 UTC (1652822666000)
kernelArch: x86_64
kernelVersion: 5.4.0-1077-azure
os: linux
platform: ubuntu
platformFamily: debian
platformVersion: 21.10
procs: 223
uptime: 135h55m24s
virtualizationRole: host
virtualizationSystem: kvm
Hostnames
=========
cluster-name: aks-my-cluster-name
host_aliases: [f4ee4ced-9ddb-459c-b5c9-bf61221abfd9]
hostname: datadog-8xfg9
socket-fqdn: datadog-8xfg9
socket-hostname: datadog-8xfg9
host tags:
cluster_name:aks-my-cluster-name
kube_cluster_name:aks-my-cluster-name
hostname provider: os
unused hostname providers:
aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
azure: azure_hostname_style is set to 'os'
configuration/environment: hostname is empty
container: Unable to get hostname from container API
gce: unable to retrieve hostname from GCE: GCE metadata API error: status code 400 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
Metadata
========
agent_version: 7.35.2
cloud_provider: Azure
config_apm_dd_url:
config_dd_url:
config_logs_dd_url:
config_logs_socks5_proxy_address:
config_no_proxy: []
config_process_dd_url:
config_proxy_http:
config_proxy_https:
config_site:
feature_apm_enabled: false
feature_cspm_enabled: false
feature_cws_enabled: false
feature_logs_enabled: true
feature_networks_enabled: false
feature_networks_http_enabled: false
feature_networks_https_enabled: false
feature_otlp_enabled: false
feature_process_enabled: false
feature_processes_container_enabled: true
flavor: agent
hostname_source: os
install_method_installer_version: datadog-2.33.7
install_method_tool: helm
install_method_tool_version: Helm
logs_transport: HTTP
=========
APM Agent
=========
Status: Running
Pid: 31443
Uptime: 832 seconds
Mem alloc: 21,049,136 bytes
Hostname: datadog-8xfg9
Receiver: 0.0.0.0:8126
Endpoints:
https://trace.agent.us3.datadoghq.com
Receiver (previous minute)
==========================
From .NET 6.0.5 (.NET), client 2.1.0.0
Traces received: 73 (393,491 bytes)
Spans received: 264
From .NET 6.0.4 (.NET), client 2.4.4.0
Traces received: 21 (23,047 bytes)
Spans received: 42
Writer (previous minute)
========================
Traces: 0 payloads, 0 traces, 0 events, 0 bytes
Stats: 0 payloads, 0 stats buckets, 0 bytes
=========
Aggregator
=========
Checks Metric Sample: 136,468
Dogstatsd Metric Sample: 16,075
Event: 10
Events Flushed: 10
Number Of Flushes: 54
Series Flushed: 94,401
Service Check: 696
Service Checks Flushed: 741
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 16,074
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 1,798,394
Udp Packet Reading Errors: 0
Udp Packets: 7,430
Uds Bytes: 373,541
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 922
Unterminated Metric Errors: 0
=====================
Datadog Cluster Agent
=====================
- Datadog Cluster Agent endpoint detected: https://10.200.119.172:5005
Successfully connected to the Datadog Cluster Agent.
- Running: 1.19.0+commit.083a221
=============
Autodiscovery
=============
Enabled Features
================
containerd
cri
kubernetes
Hello,
Yes, that's because newer Agent versions are built with Go 1.17, which dropped support for certificates without a SAN entirely (previously we were relying on x509ignoreCN=0). We need to update the documentation to reflect that.
Unfortunately it means we need to force tlsVerify: false on AKS. There's not much we can do on our side; we already reported this to Azure, but it's still not fixed on their end.
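The Go 1.17 behavior described above can be reproduced in isolation. The sketch below (node name is hypothetical) builds a self-signed certificate that carries only a CommonName and no SAN entries, like the kubelet serving certificates on the affected AKS nodes, then verifies it against that hostname; compiled with go1.17 or later, verification fails because the legacy CommonName fallback was removed:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// verifyCNOnlyCert creates a self-signed certificate with only a CommonName
// (no SAN extension) and verifies it against that same hostname.
func verifyCNOnlyCert() error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "aks-node-vmss000002"}, // hypothetical node name
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(cert)
	// On go1.17+ this returns an error: hostname matching against the
	// CommonName was removed, and the GODEBUG x509ignoreCN=0 escape hatch
	// no longer exists.
	_, err = cert.Verify(x509.VerifyOptions{Roots: roots, DNSName: "aks-node-vmss000002"})
	return err
}

func main() {
	fmt.Println(verifyCNOnlyCert())
}
```

On go1.16 with GODEBUG=x509ignoreCN=0 the same verification succeeds, which is why Agent 7.34.0 (built with an older toolchain) was unaffected.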
Understood, thanks for getting back to me about this.
Will be great when AKS updates their certificates finally
Cheers!
Documentation at https://docs.datadoghq.com/agent/kubernetes/distributions/?tab=helm#AKS has been updated
Describe what happened:
We are using AKS and configured as per https://docs.datadoghq.com/agent/kubernetes/distributions/?tab=helm#AKS.
This is our config snippet
Upgrading from 2.27.8 to 2.32.6 and keeping the configuration the same results in a status failure for "Kubelet". We don't get logs in the Datadog app for this cluster anymore.
It looks like the agent was upgraded to 7.35.0; I've tested with 7.34.0 and this issue isn't present.
Describe what you expected:
The "Kubelet" status check would remain working and logs would continue being visible in the Datadog app
Steps to reproduce the issue:
1: Use AKS
2: Configure the helm chart as per the docs from https://docs.datadoghq.com/agent/kubernetes/distributions/?tab=helm#AKS for using the support added in https://github.com/DataDog/helm-charts/pull/195
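For context, the AKS-specific chart values from the linked docs and PR #195 look roughly like the following. This is a sketch reproduced from the documented approach, not our exact values file, and the hostCAPath location is the one the docs give for AKS nodes:

```yaml
datadog:
  kubelet:
    # Resolve the kubelet via the node name rather than an IP,
    # so TLS hostname verification can match the serving cert.
    host:
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    # CA bundle path on AKS nodes, per the Datadog AKS docs.
    hostCAPath: /etc/kubernetes/certs/kubeletserver.crt
```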
Additional environment details (Operating System, Cloud provider, etc):
AKS cluster using private networking with public DNS
The error message itself states "Error: Unable to detect the kubelet URL automatically: impossible to reach Kubelet with host: aks-usrhighmem02-12168730-vmss000002. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made"
Adding kubelet.tlsVerify: false to the chart values does fix the issue, so we aren't blocked and there is a workaround (i.e. accept the slightly weakened security posture).
I'm mainly wondering if support has dropped for the method of using hostCAPath to fix this issue, and whether the documentation could do with updating if that's true? Or something in the changelog to warn people upgrading. Otherwise maybe it is an unintentional regression in the underlying kubelet.py core integration.
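The workaround above, expressed as Helm chart values (a minimal sketch; the key layout assumes the datadog chart's kubelet.tlsVerify option):

```yaml
datadog:
  kubelet:
    # Disable TLS verification of the kubelet's serving certificate.
    # Needed on AKS while the node certs lack SAN entries, since Agents
    # built with go1.17+ no longer fall back to the CommonName.
    tlsVerify: false
```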