DataDog / datadog-operator

Kubernetes Operator for Datadog Resources
Apache License 2.0
302 stars, 104 forks

Allow configuration of network host monitoring via CRD #177

Closed andysnowden closed 1 year ago

andysnowden commented 3 years ago

Describe what happened: The operator deployed the datadog-agent to my Kubernetes nodes with the network monitoring flag enabled.

Describe what you expected: An option to disable this feature, as there is for logs, APM, etc.
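For illustration, a minimal sketch of how such a toggle could be translated into the environment variable that system-probe reads (DD_SYSTEM_PROBE_NETWORK_ENABLED, mentioned later in this thread). It assumes a hypothetical networkMonitoring.enabled field under spec.agent.systemProbe; that field name is invented here and is not an existing DatadogAgent CRD field. A missing field defaults to enabled, so current behavior would be unchanged.

```python
# Hypothetical sketch: the "networkMonitoring" field below is invented for
# illustration; it is not an existing DatadogAgent CRD field.


def system_probe_network_env(spec: dict) -> dict:
    """Build the env var entry an operator could inject into system-probe."""
    system_probe = spec.get("agent", {}).get("systemProbe", {})
    # Default to True so a missing field keeps the current behavior
    # (network monitoring on whenever system-probe is enabled).
    enabled = system_probe.get("networkMonitoring", {}).get("enabled", True)
    # Kubernetes env var values must be strings, so render the bool as "true"/"false".
    return {"name": "DD_SYSTEM_PROBE_NETWORK_ENABLED", "value": str(bool(enabled)).lower()}


spec = {"agent": {"systemProbe": {"enabled": True, "networkMonitoring": {"enabled": False}}}}
print(system_probe_network_env(spec))
# {'name': 'DD_SYSTEM_PROBE_NETWORK_ENABLED', 'value': 'false'}
```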

Steps to reproduce the issue:

  1. Install the operator as usual
  2. Deploy the agent with any of the example configs
  3. See network collection enabled

Additional environment details (Operating System, Cloud provider, etc.): Kubernetes 1.17.9 and 1.16.9; Datadog Agent v7.23.1; Cluster Agent 1.9.1+commit.2270e4d; Operator 0.3.1

clamoriniere commented 3 years ago

Hi @andysnowden,

Thank you for reporting this issue. We will work on a fix soon.

In the meantime, I suggest adding the environment variable DD_SYSTEM_PROBE_NETWORK_ENABLED set to "false" to deactivate the feature:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: foo
spec:
  credentials:
    # ...
  agent:
    # ...
    config:
      env:
      - name: DD_SYSTEM_PROBE_NETWORK_ENABLED
        value: "false"

Let me know if this workaround works for you.
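One detail worth checking in a workaround like this: in a Kubernetes container spec, an env entry's value must be a string, which is why "false" is quoted above. Unquoted, YAML parses false as a boolean, and the manifest would typically be rejected. A minimal sketch of that check on env entries already loaded into Python dicts; the function name is illustrative.

```python
def non_string_env_values(env: list[dict]) -> list[str]:
    """Return the names of env entries whose 'value' is set but is not a string."""
    bad = []
    for entry in env:
        # 'value' is optional (valueFrom may be used instead), so only check it if present.
        if "value" in entry and not isinstance(entry["value"], str):
            bad.append(entry["name"])
    return bad


quoted = [{"name": "DD_SYSTEM_PROBE_NETWORK_ENABLED", "value": "false"}]
unquoted = [{"name": "DD_SYSTEM_PROBE_NETWORK_ENABLED", "value": False}]  # what `value: false` parses to
print(non_string_env_values(quoted))    # []
print(non_string_env_values(unquoted))  # ['DD_SYSTEM_PROBE_NETWORK_ENABLED']
```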

andysnowden commented 3 years ago

I added that env var to both the agent and system-probe configs, and neither seems to have the desired effect. I'm still seeing the hosts show up in the network map and on our plan usage page.

apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  credentials:
    # ...
  agent:
    config:
      collectEvents: true
      leaderElection: true
      resources:
        limits:
          memory: 350Mi
        requests:
          memory: 350Mi
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
      env:
        - name: DD_SYSTEM_PROBE_NETWORK_ENABLED
          value: "false"
    image:
      name: "datadog/agent:latest"
    apm:
      enabled: false
    process:
      enabled: true
    log:
      enabled: false
    systemProbe:
      enabled: true
      bpfDebugEnabled: true
      env:
        - name: DD_SYSTEM_PROBE_NETWORK_ENABLED
          value: "false"
    security:
      compliance:
        enabled: true
      runtime:
        enabled: false
  clusterAgent:
    image:
      name: "datadog/cluster-agent:latest"
    replicas: 2
    config:
      externalMetrics:
        enabled: true
      admissionController:
        enabled: true
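One way to double-check a manifest like the one above is to walk the spec and list every definition of the variable with its path. A minimal sketch; the dict hand-copies only the relevant part of the manifest (parsing the YAML directly would need a third-party library such as PyYAML), and find_env is an illustrative helper.

```python
def find_env(node, name, path="spec"):
    """Yield (path, value) for every env entry called `name` anywhere under `node`."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "env" and isinstance(child, list):
                for entry in child:
                    if entry.get("name") == name:
                        yield f"{path}.env", entry.get("value")
            else:
                yield from find_env(child, name, f"{path}.{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from find_env(child, name, f"{path}[{i}]")


# Relevant subset of the DatadogAgent spec above.
spec = {
    "agent": {
        "config": {"env": [{"name": "DD_SYSTEM_PROBE_NETWORK_ENABLED", "value": "false"}]},
        "systemProbe": {
            "enabled": True,
            "env": [{"name": "DD_SYSTEM_PROBE_NETWORK_ENABLED", "value": "false"}],
        },
    }
}

for location, value in find_env(spec, "DD_SYSTEM_PROBE_NETWORK_ENABLED"):
    print(location, value)
# spec.agent.config.env false
# spec.agent.systemProbe.env false
```

This only confirms what the manifest requests; whether system-probe actually honors the variable still has to be checked in the running pods.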

Is there an agent command I can run to tell whether it's enabled or disabled?

clamoriniere commented 3 years ago

Which version of the Agent are you using? This new setting (DD_SYSTEM_PROBE_NETWORK_ENABLED) was introduced in 7.23.0, I think.
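Checking a version requirement like this is a simple tuple comparison. A sketch that takes the 7.23.0 minimum from the comment above (which is itself hedged) and assumes plain MAJOR.MINOR.PATCH strings, optionally prefixed with "v" or followed by a "+build" suffix; pre-release suffixes are not handled. The names are illustrative.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v7.23.1' or '1.9.1+commit.2270e4d' into a tuple of ints."""
    core = version.lstrip("v").split("+", 1)[0]  # drop a leading 'v' and any '+build' suffix
    return tuple(int(part) for part in core.split("."))


MIN_VERSION = parse_version("7.23.0")  # per the maintainer's recollection, not verified here


def supports_network_toggle(agent_version: str) -> bool:
    return parse_version(agent_version) >= MIN_VERSION


print(supports_network_toggle("v7.23.1"))  # True
print(supports_network_toggle("7.22.1"))   # False
```

With the reporter's v7.23.1 this returns True, so under that assumption the Agent version alone would not explain the variable being ignored.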

For the command: running "agent config" in the agent container should give you the current configuration. We don't have an equivalent command for the system-probe. You can also generate a flare bundle with "agent flare" to get all the information in a zip file.
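Because the output of agent config is plain YAML, it can be searched for system-probe or network settings without extra tooling. A minimal sketch using only the standard library and a crude line-based scan instead of a real YAML parser; the helper names and the substring heuristic are illustrative assumptions, not part of the Agent.

```python
def top_level_keys(dump: str) -> set[str]:
    """Collect top-level keys from a YAML dump such as the output of 'agent config'.

    Only unindented 'key: ...' lines count; nested keys and list items are skipped.
    """
    keys = set()
    for line in dump.splitlines():
        if line and not line[0].isspace() and not line.startswith("-") and ":" in line:
            keys.add(line.split(":", 1)[0])
    return keys


def network_related_keys(dump: str) -> list[str]:
    # Illustrative heuristic: match substrings rather than exact key names.
    return sorted(k for k in top_level_keys(dump) if "system_probe" in k or "network" in k)


sample = """\
cloud_foundry_garden:
  listen_network: unix
process_config:
  enabled: "false"
listeners:
- name: kubelet
"""
print(network_related_keys(sample))  # [] (listen_network is nested, so it is ignored)
print(network_related_keys("system_probe_config:\n  enabled: true\n"))  # hypothetical key: ['system_probe_config']
```

An empty result only means the main agent's view of its configuration has no such top-level keys; it says nothing about what the system-probe process itself is doing.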

andysnowden commented 3 years ago

I'm running v7.23.1

Here's the output of agent config:

ac_exclude: []
ac_include: []
ac_load_timeout: 30000
ad_config_poll_interval: 10
additional_checksd: /etc/datadog-agent/checks.d
additional_endpoints: {}
admission_controller:
  certificate:
    expiration_threshold: 720
    secret_name: webhook-certificate
    validity_bound: 8760
  enabled: false
  inject_config:
    enabled: true
    endpoint: /injectconfig
  inject_tags:
    enabled: true
    endpoint: /injecttags
  mutate_unlabelled: false
  pod_owners_cache_validity: 10
  port: 8000
  service_name: datadog-admission-controller
  webhook_name: datadog-webhook
aggregator_buffer_size: 100
aggregator_stop_timeout: 2
api_key: ***************************16e13
apm_config:
  apm_non_local_traffic: true
  enabled: true
  max_cpu_percent: 0
  max_memory: 0
  receiver_port: 8126
app_key: ""
auth_token_file_path: ""
autoconf_template_dir: /datadog/check_configs
autoconf_template_url_timeout: 5
bind_host: localhost
bosh_id: ""
c_core_dump: false
c_stacktrace_collection: false
cache_sync_timeout: 2
cf_os_hostname_aliasing: false
check_runners: 4
checks_tag_cardinality: low
clc_runner_enabled: false
clc_runner_host: ""
clc_runner_port: 5005
clc_runner_server_readheader_timeout: 10
clc_runner_server_write_timeout: 15
cloud_foundry: false
cloud_foundry_bbs:
  ca_file: ""
  cert_file: ""
  key_file: ""
  poll_interval: 15
  url: https://bbs.service.cf.internal:8889
cloud_foundry_garden:
  listen_address: /var/vcap/data/garden/garden.sock
  listen_network: unix
cloud_provider_metadata:
- aws
- gcp
- azure
- alibaba
cluster_agent:
  auth_token: ********
  cmd_port: 5005
  enabled: true
  kubernetes_service_name: datadog-cluster-agent
  tagging_fallback: false
  url: ""
cluster_checks:
  advanced_dispatching_enabled: false
  clc_runners_port: 5005
  cluster_tag_name: cluster_name
  enabled: false
  extra_tags: []
  node_expiration_timeout: 30
  warmup_duration: 30
cluster_name: ""
cmd.check.fullsketches: false
cmd_host: localhost
cmd_port: 5001
collect_ec2_tags: false
collect_gce_tags: true
collect_kubernetes_events: true
compliance_config:
  check_interval: 20m0s
  dir: /etc/datadog-agent/compliance.d
  enabled: false
  run_path: /opt/datadog-agent/run
conf_path: .
confd_path: /etc/datadog-agent/conf.d
config_providers:
- name: kubelet
  polling: true
- name: docker
  poll_interval: 1s
  polling: true
container_cgroup_prefix: ""
container_cgroup_root: /host/sys/fs/cgroup/
container_exclude: []
container_exclude_logs: []
container_exclude_metrics: []
container_include: []
container_include_logs: []
container_include_metrics: []
container_proc_root: /host/proc
containerd_namespace: k8s.io
cri_connection_timeout: 1
cri_query_timeout: 5
cri_socket_path: ""
default_integration_http_timeout: 9
disable_cluster_name_tag_key: false
disable_file_logging: false
disable_py3_validation: false
disable_unsafe_yaml: true
docker_env_as_tags: {}
docker_labels_as_tags: {}
docker_query_timeout: 5
dogstatsd_buffer_size: 8192
dogstatsd_disable_verbose_logs: false
dogstatsd_entity_id_precedence: false
dogstatsd_expiry_seconds: 300
dogstatsd_mapper_cache_size: 1000
dogstatsd_metrics_stats_enable: false
dogstatsd_non_local_traffic: false
dogstatsd_origin_detection: false
dogstatsd_packet_buffer_flush_timeout: 100ms
dogstatsd_packet_buffer_size: 32
dogstatsd_port: 8125
dogstatsd_queue_size: 1024
dogstatsd_so_rcvbuf: 0
dogstatsd_socket: ""
dogstatsd_stats_buffer: 10
dogstatsd_stats_enable: false
dogstatsd_stats_port: 5000
dogstatsd_string_interner_size: 4096
dogstatsd_tag_cardinality: low
dogstatsd_tags: []
dogstatsd_windows_pipe_name: ""
ec2_metadata_timeout: 300
ec2_metadata_token_lifetime: 21600
ec2_prefer_imdsv2: false
ec2_use_windows_prefix_detection: false
ecs_agent_container_name: ecs-agent
ecs_agent_url: ""
ecs_collect_resource_tags_ec2: false
eks_fargate: false
enable_events_stream_payload_serialization: true
enable_gohai: true
enable_metadata_collection: true
enable_payloads:
  events: true
  json_to_v1_intake: true
  series: true
  service_checks: true
  sketches: true
enable_service_checks_stream_payload_serialization: true
enable_stream_payload_serialization: true
exclude_gce_tags:
- kube-env
- kubelet-config
- containerd-configure-sh
- startup-script
- shutdown-script
- configure-sh
- sshKeys
- ssh-keys
- user-data
- cli-cert
- ipsec-cert
- ssl-cert
- google-container-manifest
- bosh_settings
- windows-startup-script-ps1
- common-psm1
- k8s-node-setup-psm1
- serial-port-logging-enable
- enable-oslogin
- disable-address-manager
- disable-legacy-endpoints
- windows-keys
- kubeconfig
exclude_pause_container: true
expvar_port: "5000"
external_metrics:
  aggregator: avg
external_metrics_provider:
  batch_window: 10
  bucket_size: 300
  config: {}
  enabled: false
  local_copy_refresh_rate: 30
  max_age: 120
  port: 443
  refresh_period: 30
  rollup: 30
  use_datadogmetric_crd: false
  wpa_controller: false
extra_config_providers: []
extra_listeners: []
flare_stripped_keys: []
force_tls_12: false
forwarder_apikey_validation_interval: 60
forwarder_backoff_base: 2
forwarder_backoff_factor: 2
forwarder_backoff_max: 64
forwarder_connection_reset_interval: 0
forwarder_num_workers: 1
forwarder_recovery_interval: 2
forwarder_recovery_reset: false
forwarder_retry_queue_max_size: 30
forwarder_stop_timeout: 2
forwarder_timeout: 20
gce_metadata_timeout: 1000
gce_send_project_id_tag: false
gui_port: -1
health_port: 5555
heroku_dyno: false
histogram_aggregates:
- max
- median
- avg
- count
histogram_copy_to_distribution: false
histogram_copy_to_distribution_prefix: ""
histogram_percentiles:
- "0.95"
hostname: ""
hostname_force_config_as_canonical: false
hostname_fqdn: false
hpa_configmap_name: datadog-custom-metrics
hpa_watcher_gc_period: 300
hpa_watcher_polling_freq: 10
inventories_enabled: true
inventories_max_interval: 600
inventories_min_interval: 300
iot_host: false
ipc_address: localhost
jmx_check_period: 15000
jmx_collection_timeout: 60
jmx_custom_jars: []
jmx_max_restarts: 3
jmx_reconnection_thread_pool_size: 3
jmx_reconnection_timeout: 60
jmx_restart_interval: 5
jmx_thread_pool_size: 3
jmx_use_cgroup_memory_limit: false
jmx_use_container_support: true
kube_resources_namespace: ""
kubelet_auth_token_path: ""
kubelet_cache_pods_duration: 5
kubelet_client_ca: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kubelet_client_crt: ""
kubelet_client_key: ""
kubelet_listener_polling_interval: 5
kubelet_tls_verify: true
kubelet_wait_on_missing_container: 0
kubernetes_apiserver_client_timeout: 10
kubernetes_apiserver_use_protobuf: false
kubernetes_collect_metadata_tags: true
kubernetes_event_collection_timeout: 100
kubernetes_http_kubelet_port: 10255
kubernetes_https_kubelet_port: 10250
kubernetes_informers_resync_period: 300
kubernetes_kubeconfig_path: ""
kubernetes_kubelet_host: 172.20.92.242
kubernetes_kubelet_nodename: ""
kubernetes_map_services_on_ip: false
kubernetes_metadata_tag_update_freq: 60
kubernetes_node_labels_as_tags: {}
kubernetes_pod_annotations_as_tags: "null"
kubernetes_pod_expiration_duration: 900
kubernetes_pod_labels_as_tags: "null"
leader_election: true
leader_lease_duration: "60"
listeners:
- name: kubelet
log_enabled: false
log_file: ""
log_file_max_rolls: 1
log_file_max_size: 10Mb
log_format_json: false
log_level: INFO
log_payloads: false
log_to_console: true
log_to_syslog: false
logging_frequency: 500
logs_config:
  batch_wait: 5
  close_timeout: 60
  compression_level: 6
  connection_reset_interval: 0
  container_collect_all: false
  dd_port: 10516
  dd_url_443: agent-443-intake.logs.datadoghq.com
  dev_mode_use_proto: true
  docker_client_read_timeout: 30
  frame_size: 9000
  k8s_container_use_file: true
  logs_no_ssl: false
  open_files_limit: 100
  run_path: /opt/datadog-agent/run
  socks5_proxy_address: ""
  stop_grace_period: 30
  tagger_warmup_duration: 0
  use_compression: true
  use_http: false
  use_port_443: false
  use_tcp: false
logs_enabled: false
memtrack_enabled: true
metadata_endpoints_max_hostname_size: 255
metrics_port: "5000"
orchestrator_explorer:
  container_scrubbing:
    enabled: true
  enabled: false
proc_root: /proc
process_config:
  enabled: "false"
procfs_path: /host/proc
profiling:
  enabled: false
python_version: "3"
python3_linter_timeout: 120
run_path: /opt/datadog-agent/run
runtime_security_config:
  debug: false
  enable_kernel_filters: true
  enabled: false
  event_server:
    burst: 40
    rate: 10
  policies:
    dir: /etc/datadog-agent/runtime-security.d
  run_path: /opt/datadog-agent/run
  socket: /opt/datadog-agent/run/runtime-security.sock
  syscall_monitor:
    enabled: false
secret_backend_arguments: []
secret_backend_command: ""
secret_backend_command_allow_group_exec_perm: false
secret_backend_output_max_size: 1048576
secret_backend_timeout: 5
security_agent:
  cmd_port: 5010
  expvar_port: 5011
  log_file: /var/log/datadog/security-agent.log
serializer_max_payload_size: 2621440
serializer_max_uncompressed_payload_size: 4194304
server_timeout: 15
skip_ssl_validation: false
snmp_traps_config:
  bind_host: localhost
  community_strings: []
  port: 162
  stop_timeout: 5
snmp_traps_enabled: false
statsd_forward_host: ""
statsd_forward_port: 0
statsd_metric_namespace: ""
statsd_metric_namespace_blacklist:
- datadog.agent
- datadog.dogstatsd
- datadog.process
- datadog.trace_agent
- datadog.tracer
- activemq
- activemq_58
- airflow
- cassandra
- confluent
- hazelcast
- hive
- ignite
- jboss
- jvm
- kafka
- presto
- sidekiq
- solr
- tomcat
- runtime
syslog_key: ""
syslog_pem: ""
syslog_rfc: false
syslog_tls_verify: true
syslog_uri: ""
tag_value_split_separator: {}
tags: []
telemetry:
  enabled: false
tracemalloc_blacklist: ""
tracemalloc_debug: false
tracemalloc_whitelist: ""
use_dogstatsd: true
use_v2_api:
  events: false
  series: false
  service_checks: false
windows_use_pythonpath: false

andysnowden commented 3 years ago

@clamoriniere do you have any other suggestions? Thanks.

celenechang commented 1 year ago

Hi @andysnowden, apologies that your last comments went unanswered. Since we have released new versions that should fix this issue, I will close it, but feel free to open a new issue if needed.