Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/
1.93k stars 297 forks source link

[BUG] AKS AMA Agend Insecure default configuration #3812

Open floriankoch opened 1 year ago

floriankoch commented 1 year ago

Describe the bug The AMA Agent on AKS per default collects the Environment Variable content for containers https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-agent-config#configmap-file-settings-overview

On Kubernetes (and so AKS) Secrets are stored in Environment Variables, so with the default AMA configuration, Secrets are exposed in Logfiles

Its even worse, if you find this documentation and disables this via the configmap, and make a mistake that happens to be a syntax error in the configuration (which is toml in yaml , no real chance to validate), the AMA Agent ONLY logs this problem and starts with the insecure default.

Expected behavior

Changing the insecure Default from true to false for collection Environment Variables Fail the AMA agent when there is a Syntax error in the user provide configuration

Environment:

ghost commented 1 year ago

@vishiy, @saaror would you be able to assist?

Issue Details
**Describe the bug** The AMA Agent on AKS per default collects the Environment Variable content for containers https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-agent-config#configmap-file-settings-overview On Kubernetes (and so AKS) Secrets are stored in Environment Variables, so with the default AMA configuration, Secrets are exposed in Logfiles Its even worse, if you find this documentation and disables this via the configmap, and make a mistake that happens to be a syntax error in the configuration (which is toml in yaml , no change to validate), the AMA Agent ONLY logs this problem and starts with the insecure default. **Expected behavior** Changing the insecure Default from true to false for collection Environment Variables Fail the AMA agent when there is a Syntax error in the user provide configuration **Environment:** - Kubernetes version [e.g. 1.25.6]
Author: floriankoch
Assignees: -
Labels: `bug`, `addon/container-insights`
Milestone: -
ganga1980 commented 1 year ago

Hi, @floriankoch, are you asking both, or just (2) ? (1) make the default from true to false (2) fail the agent if the config has errors

For (1), we cant change the default from true to false and this will impact the existing customers and certainly, we can expose this as part of the onboarding (through checkbox) so that customer can choose whether to collect or not.

For (2), The current behavior in the agent what we have is, if any of the config is incorrect and we take standard defaults and proceed. Agree with your feedback on failing the agent if the config is incorrect and we will evaluate this, and address this feedback in our subsequent agent releases. can you please share the snippet of the ConfigMap with syntax error to what's the syntax error you had in ConfigMap?

floriankoch commented 1 year ago

Hi @ganga1980,

from my point of view, both should be fixed

(1) Yes i see, its a breaking change - but i think config should be secure by default. Breaking changes are bad and should be avoided, in this case i think its worth

(2) Here is the configmap with the error i faced (see the extra ]] on the excluded namespace line)

apiVersion: v1
data:
  config-version: ver1
  log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          enabled = false
          exclude_namespaces = []]]
       [log_collection_settings.stderr]
          enabled = false
          exclude_namespaces = []
      #Security related - do not turn on
       [log_collection_settings.env_var]
          enabled = false
       [log_collection_settings.enrich_container_logs]
          enabled = false
       [log_collection_settings.collect_all_kube_events]
          enabled = false
  metric_collection_settings: |-
    [metric_collection_settings.collect_kube_system_pv_metrics]
      enabled = false
  prometheus-data-collection-settings: |-
    # Custom Prometheus metrics data collection settings
    [prometheus_data_collection_settings.cluster]
        kubernetes_services = [""]
        monitor_kubernetes_pods = false
        # monitor_kubernetes_pods_namespaces = ["default1"]
  schema-version: v1
kind: ConfigMap
metadata:
  annotations: {}
  labels:
    name: container-azm-ms-agentconfig
  name: container-azm-ms-agentconfig
  namespace: kube-system
ganga1980 commented 11 months ago

Hi, @floriankoch, We have plan to address (1) as part of next semester work and (2), we will improve the error messaging in our sept agent release.

floriankoch commented 11 months ago

Hi @ganga1980 thx

floriankoch commented 4 months ago

@ganga1980 anything news here?

floriankoch commented 2 months ago

@ganga1980 @kaarthis @joby-thomas

anything news here?

vdiec commented 1 month ago

Hi @floriankoch, we are looking to address (1) in August and (2) has currently rolling out (eta end of June).

floriankoch commented 1 month ago

@vdiec thank you for the info

vdiec commented 2 weeks ago

Hi @floriankoch , for (2), a fix to improve the error messaging has been rolled out. For (1), work is in progress and I will update this thread once it's complete.