aws-samples / amazon-cloudwatch-container-insights

CloudWatch Agent Dockerfile and K8s YAML templates for CloudWatch Container Insights.
MIT No Attribution

AWS cloud agent deployment failure on ROSA #84

Open CherryJia opened 2 years ago

CherryJia commented 2 years ago

Hi team, I have a ROSA cluster deployed in region us-east-2, created by following https://console.redhat.com/openshift/create/rosa/welcome

I am trying to set up the CloudWatch agent to collect metrics by following https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-quickstart.html

Before installation, we attached CloudWatchAgentServerPolicy to each EC2 worker node.

But during the DaemonSet deployment we got this error:

`[telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance`

Detailed log:

2021/11/19 10:04:19 I! 2021/11/19 10:04:16 E! ec2metadata is not available
2021/11/19 10:04:16 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2021/11/19 10:04:17 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/19 10:04:18 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/19 10:04:19 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/19 10:04:19 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPrem
2021/11/19 10:04:19 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2021/11/19 10:04:19 Reading json config file path: /etc/cwagentconfig/..2021_11_19_09_40_24.899664046/cwagentconfig.json ...
2021/11/19 10:04:19 Find symbolic link /etc/cwagentconfig/..data
2021/11/19 10:04:19 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
2021/11/19 10:04:19 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
Valid Json input schema.
Got Home directory: /root
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded

2021/11/19 10:04:19 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021-11-19T10:04:19Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-11-19T10:04:19Z I! Loaded inputs: cadvisor k8sapiserver
2021-11-19T10:04:19Z I! Loaded aggregators:
2021-11-19T10:04:19Z I! Loaded processors: ec2tagger k8sdecorator
2021-11-19T10:04:19Z I! Loaded outputs: cloudwatchlogs
2021-11-19T10:04:19Z I! Tags enabled:
2021-11-19T10:04:19Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-10-0-137-118.us-east-2.compute.internal", Flush Interval:1s
2021-11-19T10:04:19Z I! [logagent] starting
2021-11-19T10:04:19Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-11-19T10:04:28Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2021-11-19T10:04:28Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance

Here is my cluster info:

yuri9doo86@loaclhost aws % ..//rosa describe cluster -c cherryrosatest
Name:                       cherryrosatest
ID:                         1ogfk2f7f49ah56deg0neq1qks1cgbs2
External ID:                b136164e-766b-4ab0-a71e-7b4415ed4663
OpenShift Version:          4.9.5
Channel Group:              stable
DNS:                        cherryrosatest.3y19.p1.openshiftapps.com
AWS Account:               XXXXXX
API URL:                    https://api.cherryrosatest.3y19.p1.openshiftapps.com:6443
Console URL:                https://console-openshift-console.apps.cherryrosatest.3y19.p1.openshiftapps.com
Region:                     us-east-2
Multi-AZ:                   false
Nodes:
 - Control plane:           3
 - Infra:                   2
 - Compute (Autoscaled):    4-10
Network:
 - Service CIDR:            172.30.0.0/16
 - Machine CIDR:            10.0.0.0/16
 - Pod CIDR:                10.128.0.0/14
 - Host Prefix:             /23
STS Role ARN:               arn:aws:iam::675801125365:role/ManagedOpenShift-Installer-Role
Support Role ARN:           arn:aws:iam::675801125365:role/ManagedOpenShift-Support-Role
Instance IAM Roles:
 - Control plane:           arn:aws:iam::675801125365:role/ManagedOpenShift-ControlPlane-Role
 - Worker:                  arn:aws:iam::675801125365:role/ManagedOpenShift-Worker-Role
Operator IAM Roles:
 - arn:aws:iam::675801125365:role/cherryrosatest-s9a5-openshift-machine-api-aws-cloud-credentials
 - arn:aws:iam::675801125365:role/cherryrosatest-s9a5-openshift-cloud-credential-operator-cloud-cr
 - arn:aws:iam::675801125365:role/cherryrosatest-s9a5-openshift-image-registry-installer-cloud-cre
 - arn:aws:iam::675801125365:role/cherryrosatest-s9a5-openshift-ingress-operator-cloud-credentials
 - arn:aws:iam::675801125365:role/cherryrosatest-s9a5-openshift-cluster-csi-drivers-ebs-cloud-cred
State:                      ready
Private:                    No
Created:                    Nov 16 2021 05:56:28 UTC
Details Page:               https://console.redhat.com/openshift/details/s/20zKtEuEok2WNe2NVsKQwtEF7Fh
OIDC Endpoint URL:          https://rh-oidc.s3.us-east-1.amazonaws.com/1ogfk2f7f49ah56deg0neq1qks1cgbs2

pingleig commented 2 years ago

The agent is calling the EC2 instance metadata service (IMDS), see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html. It might be disabled or unreachable from the pod because of your cluster setup.
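
One way to check this is to probe IMDS from the same place the agent runs. The snippet below is a minimal sketch, not the agent's own code: it assumes Python 3 is available in the container or on the node, and the function name `fetch_instance_id` and its parameters are made up for illustration. It follows the documented IMDSv2 flow (request a session token with `PUT /latest/api/token`, then read `/latest/meta-data/instance-id`) and returns `None` when IMDS cannot be reached.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS_BASE = "http://169.254.169.254"


def fetch_instance_id(base_url=IMDS_BASE, timeout=2.0):
    """Return the EC2 instance ID via IMDSv2, or None if IMDS is unreachable.

    `base_url` is a parameter only so the function can be pointed at a
    stand-in server for testing; this helper is hypothetical.
    """
    # Never route metadata requests through an HTTP proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # Step 1: obtain a short-lived session token.
        token_req = urllib.request.Request(
            f"{base_url}/latest/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with opener.open(token_req, timeout=timeout) as resp:
            token = resp.read().decode()

        # Step 2: use the token to read the instance ID.
        id_req = urllib.request.Request(
            f"{base_url}/latest/meta-data/instance-id",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with opener.open(id_req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:
        # Covers timeouts, refused connections, and HTTP error statuses
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return None
```

Calling `print(fetch_instance_id())` on the worker node and again inside the agent pod shows where the problem sits. If the node prints an instance ID but the pod prints `None`, IMDS is up but not reachable from the pod network. One possible cause, which I have not verified for ROSA, is an IMDSv2 hop limit of 1 on the instance, which stops token responses from reaching containers that do not use host networking.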