aws-samples / amazon-cloudwatch-container-insights

CloudWatch Agent Dockerfile and K8s YAML templates for CloudWatch Container Insights.
MIT No Attribution

add yaml files for quick-start guide for enhanced container insights in EKS #163

Closed: movence closed 4 months ago

movence commented 4 months ago

Description of the issue

The current version of quick-start for Container Insights is outdated and doesn't support GPU monitoring.

Description of changes

Update quick-start yaml files to use the CloudWatch Agent Operator.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Tested on an EKS cluster with one NVIDIA GPU instance and one non-GPU (m5) instance.

# create custom resource definitions
> cat /tmp/other/amazon-cloudwatch-container-insights/k8s-quickstart/cwagent-custom-resource-definitions.yaml | sed 's/{{cluster_name}}/'doc-test'/g;s/{{region_name}}/'us-west-2'/g' | kubectl apply --server-side -f -
customresourcedefinition.apiextensions.k8s.io/amazoncloudwatchagents.cloudwatch.aws.amazon.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/dcgmexporters.cloudwatch.aws.amazon.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/instrumentations.cloudwatch.aws.amazon.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/neuronmonitors.cloudwatch.aws.amazon.com serverside-applied
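
The `sed` expression in the command above replaces the `{{cluster_name}}` and `{{region_name}}` placeholders before the manifest reaches `kubectl`. As a minimal sketch, the same substitution can be tried on a one-line stand-in for the manifest. The `render` helper and the `template` string are hypothetical and are not part of this PR.

```shell
# Hypothetical one-line stand-in for a manifest that contains the placeholders.
template='{"cluster": "{{cluster_name}}", "region": "{{region_name}}"}'

# render CLUSTER REGION: substitute both placeholders in the text read from stdin.
# Assumes neither value contains "/" or other characters special to sed.
render() {
  sed "s/{{cluster_name}}/$1/g;s/{{region_name}}/$2/g"
}

rendered=$(printf '%s\n' "$template" | render doc-test us-west-2)
printf '%s\n' "$rendered"
```

Piping the rendered text through `grep '{{'` before applying it is one way to spot a placeholder that was left unreplaced.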

# create resources including daemonsets
> cat /tmp/amazon-cloudwatch-container-insights/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'doc-test'/g;s/{{region_name}}/'us-west-2'/g' | kubectl apply -f -
namespace/amazon-cloudwatch created
serviceaccount/cloudwatch-agent created
serviceaccount/amazon-cloudwatch-observability-controller-manager created
secret/amazon-cloudwatch-observability-agent-cert created
configmap/fluent-bit-config created
configmap/fluent-bit-windows-config created
clusterrole.rbac.authorization.k8s.io/cloudwatch-agent-role created
clusterrole.rbac.authorization.k8s.io/amazon-cloudwatch-observability-manager-role created
clusterrolebinding.rbac.authorization.k8s.io/cloudwatch-agent-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/amazon-cloudwatch-observability-manager-rolebinding created
role.rbac.authorization.k8s.io/dcgm-exporter-role created
role.rbac.authorization.k8s.io/neuron-monitor-role created
rolebinding.rbac.authorization.k8s.io/dcgm-exporter-role-binding created
rolebinding.rbac.authorization.k8s.io/neuron-monitor-role-binding created
service/amazon-cloudwatch-observability-webhook-service created
daemonset.apps/fluent-bit created
daemonset.apps/fluent-bit-windows created
deployment.apps/amazon-cloudwatch-observability-controller-manager created
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent created
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent-windows created
certificate.cert-manager.io/amazon-cloudwatch-observability-serving-cert created
dcgmexporter.cloudwatch.aws.amazon.com/dcgm-exporter created
issuer.cert-manager.io/amazon-cloudwatch-observability-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/amazon-cloudwatch-observability-mutating-webhook-configuration created
neuronmonitor.cloudwatch.aws.amazon.com/neuron-monitor created
validatingwebhookconfiguration.admissionregistration.k8s.io/amazon-cloudwatch-observability-validating-webhook-configuration created

# verify that the pods are running
> kubectl get pods -n amazon-cloudwatch
NAME                                                              READY   STATUS    RESTARTS   AGE
amazon-cloudwatch-observability-controller-manager-5fcbbdfp85dm   1/1     Running   0          22s
cloudwatch-agent-24lfd                                            1/1     Running   0          15s
cloudwatch-agent-flmmg                                            1/1     Running   0          15s
dcgm-exporter-6fw7m                                               1/1     Running   0          15s
fluent-bit-qr5jj                                                  1/1     Running   0          23s
fluent-bit-vw4zg                                                  1/1     Running   0          23s
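
The pod listing above was checked by eye. A rough sketch of an automated check is shown here, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...). The `all_pods_ready` helper is hypothetical, and the sample rows are copied from the output above.

```shell
# Reads `kubectl get pods` table output on stdin and exits non-zero if any pod
# is not Running or has fewer ready containers than it declares.
all_pods_ready() {
  awk 'NR > 1 {
         split($2, r, "/")   # READY column, e.g. "1/1"
         if ($3 != "Running" || r[1] != r[2]) {
           bad = 1
           print "not ready: " $1
         }
       }
       END { exit bad + 0 }'
}

# Sample rows taken from the listing above. Against a live cluster, pipe
# `kubectl get pods -n amazon-cloudwatch` into the function instead.
sample='NAME READY STATUS RESTARTS AGE
cloudwatch-agent-24lfd 1/1 Running 0 15s
dcgm-exporter-6fw7m 1/1 Running 0 15s
fluent-bit-qr5jj 1/1 Running 0 23s'

if printf '%s\n' "$sample" | all_pods_ready; then
  result="all pods ready"
else
  result="some pods not ready"
fi
echo "$result"
```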

Requirements

Before committing the code, please verify the following: