Description of the issue
The current quick-start for Container Insights is outdated and does not support GPU monitoring.
Description of changes
Update the quick-start YAML files to use the CloudWatch Agent Operator.
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Tested on an EKS cluster with one NVIDIA GPU instance and one non-GPU (m5) instance.
# create custom resource definitions
> cat /tmp/other/amazon-cloudwatch-container-insights/k8s-quickstart/cwagent-custom-resource-definitions.yaml | sed 's/{{cluster_name}}/'doc-test'/g;s/{{region_name}}/'us-west-2'/g' | kubectl apply --server-side -f -
customresourcedefinition.apiextensions.k8s.io/amazoncloudwatchagents.cloudwatch.aws.amazon.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/dcgmexporters.cloudwatch.aws.amazon.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/instrumentations.cloudwatch.aws.amazon.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/neuronmonitors.cloudwatch.aws.amazon.com serverside-applied
# create resources including daemonsets
> cat /tmp/amazon-cloudwatch-container-insights/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'doc-test'/g;s/{{region_name}}/'us-west-2'/g' | kubectl apply -f -
namespace/amazon-cloudwatch created
serviceaccount/cloudwatch-agent created
serviceaccount/amazon-cloudwatch-observability-controller-manager created
secret/amazon-cloudwatch-observability-agent-cert created
configmap/fluent-bit-config created
configmap/fluent-bit-windows-config created
clusterrole.rbac.authorization.k8s.io/cloudwatch-agent-role created
clusterrole.rbac.authorization.k8s.io/amazon-cloudwatch-observability-manager-role created
clusterrolebinding.rbac.authorization.k8s.io/cloudwatch-agent-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/amazon-cloudwatch-observability-manager-rolebinding created
role.rbac.authorization.k8s.io/dcgm-exporter-role created
role.rbac.authorization.k8s.io/neuron-monitor-role created
rolebinding.rbac.authorization.k8s.io/dcgm-exporter-role-binding created
rolebinding.rbac.authorization.k8s.io/neuron-monitor-role-binding created
service/amazon-cloudwatch-observability-webhook-service created
daemonset.apps/fluent-bit created
daemonset.apps/fluent-bit-windows created
deployment.apps/amazon-cloudwatch-observability-controller-manager created
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent created
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent-windows created
certificate.cert-manager.io/amazon-cloudwatch-observability-serving-cert created
dcgmexporter.cloudwatch.aws.amazon.com/dcgm-exporter created
issuer.cert-manager.io/amazon-cloudwatch-observability-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/amazon-cloudwatch-observability-mutating-webhook-configuration created
neuronmonitor.cloudwatch.aws.amazon.com/neuron-monitor created
validatingwebhookconfiguration.admissionregistration.k8s.io/amazon-cloudwatch-observability-validating-webhook-configuration created
# verify that the pods are running
> kubectl get pods -n amazon-cloudwatch
NAME                                                              READY   STATUS    RESTARTS   AGE
amazon-cloudwatch-observability-controller-manager-5fcbbdfp85dm   1/1     Running   0          22s
cloudwatch-agent-24lfd                                            1/1     Running   0          15s
cloudwatch-agent-flmmg                                            1/1     Running   0          15s
dcgm-exporter-6fw7m                                               1/1     Running   0          15s
fluent-bit-qr5jj                                                  1/1     Running   0          23s
fluent-bit-vw4zg                                                  1/1     Running   0          23s
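Both apply steps above substitute the same two placeholders, `{{cluster_name}}` and `{{region_name}}`, before piping the manifest to `kubectl`. A minimal sketch of that substitution as a reusable helper follows. The function name `render_quickstart` is hypothetical and is not part of this change, and the sketch assumes the cluster and region values contain no characters that are special to `sed` (such as `/` or `&`).

```shell
# Hypothetical helper: reads a quick-start manifest on stdin and writes it to
# stdout with the {{cluster_name}} and {{region_name}} placeholders replaced.
# Assumes neither value contains sed-special characters such as '/' or '&'.
render_quickstart() {
  cluster="${1:?cluster name required}"
  region="${2:?region name required}"
  sed -e "s/{{cluster_name}}/${cluster}/g" \
      -e "s/{{region_name}}/${region}/g"
}
```

With this helper, the second step could be written as `render_quickstart doc-test us-west-2 < cwagent-operator-rendered.yaml | kubectl apply -f -`.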
Requirements
Before committing the code, please verify the following:
If this commit includes changes to existing sample configurations, you acknowledge that you have confirmed this will not impact existing customer behavior.
If not necessary, consider creating a new sample configuration for this change.