Describe the bug
CloudWatch agent Container Insights crashes in K8s EC2 (K8E) mode but works as expected in K8s on-premise (K8OP) mode.
I am running the CloudWatch agent as a DaemonSet to collect Container Insights metrics in a MicroK8s setup on an AWS EC2 instance. When I configure cwagent to collect Kubernetes container insights, it starts and then shuts down immediately with this error:
2024-06-17T00:49:16Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
2024-06-17T00:49:16Z I! Starting AmazonCloudWatchAgent CWAgent/1.300039.0b612 (go1.22.2; linux; amd64) with log file with log target lumberjack
2024-06-17T00:49:16Z I! AWS SDK log level not set
2024-06-17T00:49:17Z I! {"caller":"service@v0.98.0/telemetry.go:47","msg":"Skipping telemetry setup.","address":"","level":"None"}
2024-06-17T00:49:17Z D! {"caller":"extension@v0.98.0/extension.go:165","msg":"Alpha component. May change in the future.","kind":"extension","name":"agenthealth/logs"}
2024-06-17T00:49:17Z D! {"caller":"exporter@v0.98.0/exporter.go:273","msg":"Beta component. May change in the future.","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights"}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:98","msg":"Using proxy address: ","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights","proxyAddr":""}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:194","msg":"Fetch region from commandline/config file","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights","region":"ap-northeast-1"}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:366","msg":"Fallback shared config file(s)","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights","files":["/.aws/credentials"]}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:390","msg":"Using credential from session","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights","access-key":"XXXXXXXXXXXX","provider":"EnvConfigCredentials"}
2024-06-17T00:49:17Z W! {"caller":"awsemfexporter@v0.98.0/emf_exporter.go:99","msg":"the default value for DimensionRollupOption will be changing to NoDimensionRollupin a future release. See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/23997 for moreinformation","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights"}
2024-06-17T00:49:17Z D! {"caller":"processor@v0.98.0/processor.go:301","msg":"Beta component. May change in the future.","kind":"processor","name":"batch/containerinsights","pipeline":"metrics/containerinsights"}
2024-06-17T00:49:17Z I! {"caller":"service@v0.98.0/service.go:143","msg":"Starting CWAgent...","Version":"1.300039.0b612","NumCPU":16}
2024-06-17T00:49:17Z I! {"caller":"extensions/extensions.go:34","msg":"Starting extensions..."}
2024-06-17T00:49:17Z I! {"caller":"extensions/extensions.go:37","msg":"Extension is starting...","kind":"extension","name":"agenthealth/logs"}
2024-06-17T00:49:17Z I! {"caller":"extensions/extensions.go:52","msg":"Extension started.","kind":"extension","name":"agenthealth/logs"}
2024-06-17T00:49:17Z D! {"caller":"awsmiddleware@v0.0.0-20240503173519-cc2b921759f4/helper.go:18","msg":"Configured middleware on AWS client","kind":"exporter","data_type":"metrics","name":"awsemf/containerinsights","middleware":"agenthealth/logs"}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:98","msg":"Using proxy address: ","kind":"receiver","name":"awscontainerinsightreceiver","data_type":"metrics","proxyAddr":""}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:194","msg":"Fetch region from commandline/config file","kind":"receiver","name":"awscontainerinsightreceiver","data_type":"metrics","region":"ap-northeast-1"}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:366","msg":"Fallback shared config file(s)","kind":"receiver","name":"awscontainerinsightreceiver","data_type":"metrics","files":["/.aws/credentials"]}
2024-06-17T00:49:17Z D! {"caller":"awsutil@v0.98.0/conn.go:390","msg":"Using credential from session","kind":"receiver","name":"awscontainerinsightreceiver","data_type":"metrics","access-key":"XXXXXXXXXXXX","provider":"EnvConfigCredentials"}
2024-06-17T00:49:17Z I! {"caller":"host/ec2metadata.go:78","msg":"Fetch instance id and type from ec2 metadata","kind":"receiver","name":"awscontainerinsightreceiver","data_type":"metrics"}
2024-06-17T00:49:17Z I! {"caller":"service@v0.98.0/service.go:206","msg":"Starting shutdown..."}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x33b01ae]
goroutine 1 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver/internal/stores.(*K8sDecorator).Shutdown(0x3cb01a0?)
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver@v0.98.0/internal/stores/store.go:105 +0xe
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver.(*awsContainerInsightReceiver).Shutdown(0xc000c0e340, {0xc000d78210?, 0xc000f12bc8?})
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver@v0.98.0/receiver.go:341 +0x5bb
go.opentelemetry.io/collector/service/internal/graph.(*Graph).ShutdownAll(0xc000319ce0, {0x50ffcd0, 0x78b5600})
go.opentelemetry.io/collector/service@v0.98.0/internal/graph/graph.go:435 +0x1a8
go.opentelemetry.io/collector/service.(*Service).Shutdown(0xc000a2d320, {0x50ffcd0, 0x78b5600})
go.opentelemetry.io/collector/service@v0.98.0/service.go:212 +0xcf
go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents(0xc000962690, {0x50ffcd0, 0x78b5600})
go.opentelemetry.io/collector/otelcol@v0.98.0/collector.go:207 +0x76a
go.opentelemetry.io/collector/otelcol.(*Collector).Run(0xc000962690, {0x50ffcd0, 0x78b5600})
go.opentelemetry.io/collector/otelcol@v0.98.0/collector.go:249 +0x52
go.opentelemetry.io/collector/otelcol.NewCommand.func1(0xc000656c08, {0x4746cce?, 0x7?, 0x4741772?})
go.opentelemetry.io/collector/otelcol@v0.98.0/command.go:35 +0xa7
github.com/spf13/cobra.(*Command).execute(0xc000656c08, {0xc000697580, 0x1, 0x1})
github.com/spf13/cobra@v1.8.0/command.go:983 +0xaca
github.com/spf13/cobra.(*Command).ExecuteC(0xc000656c08)
github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff
github.com/spf13/cobra.(*Command).Execute(0xc001086e10?)
github.com/spf13/cobra@v1.8.0/command.go:1039 +0x13
main.runAgent({0x50ffd40, 0xc000e820f0}, {0x78b5600, 0x0, 0x0}, {0x78b5600, 0x0, 0x0})
github.com/aws/amazon-cloudwatch-agent/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go:358 +0x1012
main.reloadLoop(0xc0000f2120, {0x78b5600, 0x0, 0x0}, {0x78b5600, 0x0, 0x0}, {0xc000b0dde0, 0x0, 0x0}, ...)
github.com/aws/amazon-cloudwatch-agent/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go:178 +0x347
main.main()
github.com/aws/amazon-cloudwatch-agent/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go:605 +0xa5c
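For what it's worth, reading the trace: the panic is in (*K8sDecorator).Shutdown, invoked from the receiver's Shutdown path while the service is still starting up, which looks like the shutdown path dereferencing a decorator that startup never finished constructing. A minimal sketch of that pattern, with hypothetical names rather than the actual receiver code:

    package main

    // Minimal sketch of the suspected failure mode behind the trace above:
    // shutdown dereferences a sub-component that startup aborted before
    // creating (here, right after the EC2 metadata fetch). All names are
    // hypothetical stand-ins, not the actual awscontainerinsightreceiver types.

    type k8sDecorator struct {
        stopCh chan struct{}
    }

    func (d *k8sDecorator) Shutdown() {
        close(d.stopCh) // dereferences d; panics when d is nil
    }

    type receiver struct {
        decorator *k8sDecorator // stays nil when startup fails early
    }

    func (r *receiver) Shutdown() {
        // A guard like "if r.decorator == nil { return }" would avoid the crash.
        r.decorator.Shutdown()
    }

    func main() {
        r := &receiver{} // startup aborted before the decorator was created
        r.Shutdown()     // SIGSEGV: invalid memory address or nil pointer dereference
    }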
Meanwhile, the same CloudWatch agent setup (K8E mode) works as expected when configured to collect CPU, disk, and memory metrics. It also works as expected when configured to collect Prometheus metrics; it only shuts down when configured to collect Kubernetes container insights.
Details about my setup:
I am using MicroK8s.
I am passing AWS credentials as environment variables.
I have IMDSv2 enabled and HttpPutResponseHopLimit set to 2 (I was able to access IMDS from inside a sample pod that I created to test it; a sketch of that check follows this list).
I am using the containerd runtime and have configured the necessary volumes for the DaemonSet.
The same setup runs as expected in my local (non-EC2) environment, where cwagent runs in on-premise mode and collects container insights as expected.
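For reference, the IMDS check from the sample pod was the standard IMDSv2 token exchange (a PUT for a session token, then a GET with that token), which is also why HttpPutResponseHopLimit needs to be at least 2: the pod network adds an extra hop. A minimal sketch of that check, using the documented IMDSv2 endpoint and headers:

    package main

    // IMDSv2 reachability check as run from inside a pod: fetch a session
    // token with a PUT, then use it to read the instance ID. With
    // HttpPutResponseHopLimit below 2, the PUT response is dropped at the
    // extra hop introduced by the pod network and this times out.

    import (
        "fmt"
        "io"
        "net/http"
        "time"
    )

    func main() {
        client := &http.Client{Timeout: 2 * time.Second}

        // Step 1: request an IMDSv2 session token.
        req, _ := http.NewRequest(http.MethodPut,
            "http://169.254.169.254/latest/api/token", nil)
        req.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "21600")
        resp, err := client.Do(req)
        if err != nil {
            panic(err) // unreachable IMDS or hop limit too low
        }
        token, _ := io.ReadAll(resp.Body)
        resp.Body.Close()

        // Step 2: use the token to read instance metadata.
        req, _ = http.NewRequest(http.MethodGet,
            "http://169.254.169.254/latest/meta-data/instance-id", nil)
        req.Header.Set("X-aws-ec2-metadata-token", string(token))
        resp, err = client.Do(req)
        if err != nil {
            panic(err)
        }
        id, _ := io.ReadAll(resp.Body)
        resp.Body.Close()
        fmt.Println("instance-id:", string(id))
    }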
Steps to reproduce
Run cwagent as a DaemonSet on AWS EC2 MicroK8s/K8s using the following template.
What did you expect to see?
The CloudWatch agent to run, collect Container Insights metrics, and push them to CloudWatch Logs.
What did you see instead?
It crashed, without proper error logs, before it started collecting metrics.
What version did you use?
Version: 1.300036.0b573 and 1.300039.0b612
What config did you use?
Config:
Environment
OS: Ubuntu 22.04.4
Additional context
I am able to run the same setup in my local (non-EC2) environment, where cwagent runs in on-premise (K8OP) mode and collects container insights as expected.