Will check and get back to you soon.
If you were using IRSA then the region field should have been auto-injected. I am not sure why you don't see it in your deployment spec for cni-metrics-helper. Will check your cluster setup. Could you share your cluster ARN to k8s-awscni-triage@amazon.com? Meanwhile, you can add the region manually as an environment variable on the container:

```yaml
- name: AWS_REGION
  value: <your region>
```
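One quick way to apply this without editing the manifest (the region value below is just a placeholder):

```sh
# Add the region env var to the running deployment; this triggers a new rollout.
kubectl set env deployment/cni-metrics-helper -n kube-system AWS_REGION=<your region>
```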
Thanks
I followed the installation instructions from: https://docs.aws.amazon.com/eks/latest/userguide/cni-metrics-helper.html
This points to the following YAML for the SA, RBAC, and deployment:

```sh
$ curl -o cni-metrics-helper.yaml https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.10/config/master/cni-metrics-helper.yaml
```
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: cni-metrics-helper
  namespace: kube-system
  labels:
    k8s-app: cni-metrics-helper
spec:
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  template:
    metadata:
      labels:
        k8s-app: cni-metrics-helper
    spec:
      containers:
        - env:
            - name: AWS_CLUSTER_ID
              value: ""
            - name: USE_CLOUDWATCH
              value: "true"
            # Optional: Should be ClusterName/ClusterIdentifier used as the metric dimension
            - name: AWS_CLUSTER_ID
              value: ""
          name: cni-metrics-helper
          image: "602401143452.dkr.ecr.us-west-2.amazonaws.com/cni-metrics-helper:v1.10.2"
      serviceAccountName: cni-metrics-helper
```
I will substitute AWS_REGION for the second AWS_CLUSTER_ID here and check.
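For reference, the container env block after that substitution would look roughly like this (a sketch based on the snippet above, using eu-west-1 for this cluster):

```yaml
            - name: AWS_CLUSTER_ID
              value: ""
            - name: USE_CLOUDWATCH
              value: "true"
            # second AWS_CLUSTER_ID replaced with the region
            - name: AWS_REGION
              value: "eu-west-1"
```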
Could you check this README: https://github.com/aws/amazon-vpc-cni-k8s/tree/master/cmd/cni-metrics-helper and also share your cluster ARN to k8s-awscni-triage@amazon.com? This will help me inspect your deployment spec and service accounts. Thanks
Same HTTP 503 messages but I can confirm that region is used in deployment now. $ k logs -n kube-system cni-metrics-helper-75cb84c9f8-r2wgn {"level":"info","ts":"2022-03-08T07:15:30.793Z","caller":"runtime/proc.go:255","msg":"Starting CNIMetricsHelper. Sending metrics to CloudWatch: true, LogLevel Debug"} I0308 07:15:31.850529 1 request.go:621] Throttling request took 1.037923334s, request: GET:https://172.20.0.1:443/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s {"level":"info","ts":"2022-03-08T07:15:38.815Z","caller":"cni-metrics-helper/main.go:113","msg":"Using REGION=eu-west-1 and CLUSTER_ID=git-eks-demo-ipv4"} {"level":"info","ts":"2022-03-08T07:16:08.816Z","caller":"runtime/proc.go:255","msg":"Collecting metrics ..."} {"level":"info","ts":"2022-03-08T07:16:08.916Z","caller":"metrics/cni_metrics.go:195","msg":"Total aws-node pod count:- %!(EXTRA int=4)"} {"level":"debug","ts":"2022-03-08T07:16:08.922Z","caller":"metrics/metrics.go:382","msg":"cni-metrics text output: # HELP awscni_add_ip_req_count The number of add IP address requests\n# TYPE awscni_add_ip_req_count counter\nawscni_add_ip_req_count 0\n# HELP awscni_assigned_ip_addresses The number of IP addresses assigned to pods\n# TYPE awscni_assigned_ip_addresses gauge\nawscni_assigned_ip_addresses 0\n# HELP awscni_aws_api_latency_ms AWS API call latency in ms\n# TYPE awscni_aws_api_latency_ms summary\nawscni_aws_api_latency_ms_sum{api=\"DescribeNetworkInterfaces\",error=\"false\",status=\"200\"} 278\nawscni_aws_api_latency_ms_count{api=\"DescribeNetworkInterfaces\",error=\"false\",status=\"200\"} 1\nawscni_aws_api_latency_ms_sum{api=\"GetMetadata\",error=\"false\",status=\"200\"} 1789\nawscni_aws_api_latency_ms_count{api=\"GetMetadata\",error=\"false\",status=\"200\"} 10683\nawscni_aws_api_latency_ms_sum{api=\"GetMetadata\",error=\"true\",status=\"404\"} 166\nawscni_aws_api_latency_ms_count{api=\"GetMetadata\",error=\"true\",status=\"404\"} 1068\nawscni_aws_api_latency_ms_sum{api=\"ModifyNetworkInterfaceAttribute\",error=\"false\",status=\"200\"} 380\nawscni_aws_api_latency_ms_count{api=\"ModifyNetworkInterfaceAttribute\",error=\"false\",status=\"200\"} 1\n# HELP awscni_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which amazon-vpc-cni-k8s was built.\n# TYPE awscni_build_info gauge\nawscni_build_info{goversion=\"go1.16.10\",version=\"\"} 1\n# HELP awscni_eni_allocated The number of ENIs allocated\n# TYPE awscni_eni_allocated gauge\nawscni_eni_allocated 1\n# HELP awscni_eni_max The maximum number of ENIs that can be attached to the instance, accounting for unmanaged ENIs\n# TYPE awscni_eni_max gauge\nawscni_eni_max 3\n# HELP awscni_force_removed_enis The number of ENIs force removed while they had assigned pods\n# TYPE awscni_force_removed_enis counter\nawscni_force_removed_enis 0\n# HELP awscni_force_removed_ips The number of IPs force removed while they had assigned pods\n# TYPE awscni_force_removed_ips counter\nawscni_force_removed_ips 0\n# HELP awscni_ip_max The maximum number of IP addresses that can be allocated to the instance\n# TYPE awscni_ip_max gauge\nawscni_ip_max 15\n# HELP awscni_ipamd_action_inprogress The number of ipamd actions in progress\n# TYPE awscni_ipamd_action_inprogress gauge\nawscni_ipamd_action_inprogress{fn=\"nodeIPPoolReconcile\"} 0\nawscni_ipamd_action_inprogress{fn=\"nodeInit\"} 0\n# HELP awscni_reconcile_count The number of times ipamd reconciles on ENIs and IP/Prefix addresses\n# TYPE awscni_reconcile_count 
counter\nawscni_reconcile_count{fn=\"eniDataStorePoolReconcileAdd\"} 5330\n# HELP awscni_total_ip_addresses The total number of IP addresses\n# TYPE awscni_total_ip_addresses gauge\nawscni_total_ip_addresses 5\n# HELP awscni_total_ipv4_prefixes The total number of IPv4 prefixes\n# TYPE awscni_total_ipv4_prefixes gauge\nawscni_total_ipv4_prefixes 0\n# HELP go_gc_duration_seconds A summary of the GC invocation durations.\n# TYPE go_gc_duration_seconds summary\ngo_gc_duration_seconds{quantile=\"0\"} 4.0646e-05\ngo_gc_duration_seconds{quantile=\"0.25\"} 5.2208e-05\ngo_gc_duration_seconds{quantile=\"0.5\"} 7.3721e-05\ngo_gc_duration_seconds{quantile=\"0.75\"} 0.000102977\ngo_gc_duration_seconds{quantile=\"1\"} 0.001945569\ngo_gc_duration_seconds_sum 0.061231686\ngo_gc_duration_seconds_count 544\n# HELP go_goroutines Number of goroutines that currently exist.\n# TYPE go_goroutines gauge\ngo_goroutines 37\n# HELP go_info Information about the Go environment.\n# TYPE go_info gauge\ngo_info{version=\"go1.16.10\"} 1\n# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.\n# TYPE go_memstats_alloc_bytes gauge\ngo_memstats_alloc_bytes 4.73988e+06\n# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.\n# TYPE go_memstats_alloc_bytes_total counter\ngo_memstats_alloc_bytes_total 1.817476792e+09\n# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.\n# TYPE go_memstats_buck_hash_sys_bytes gauge\ngo_memstats_buck_hash_sys_bytes 1.54928e+06\n# HELP go_memstats_frees_total Total number of frees.\n# TYPE go_memstats_frees_total counter\ngo_memstats_frees_total 4.886522e+06\n# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.\n# TYPE go_memstats_gc_cpu_fraction gauge\ngo_memstats_gc_cpu_fraction 5.9174801745196725e-06\n# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.\n# TYPE go_memstats_gc_sys_bytes gauge\ngo_memstats_gc_sys_bytes 5.626544e+06\n# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.\n# TYPE go_memstats_heap_alloc_bytes gauge\ngo_memstats_heap_alloc_bytes 4.73988e+06\n# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.\n# TYPE go_memstats_heap_idle_bytes gauge\ngo_memstats_heap_idle_bytes 5.9006976e+07\n# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.\n# TYPE go_memstats_heap_inuse_bytes gauge\ngo_memstats_heap_inuse_bytes 7.479296e+06\n# HELP go_memstats_heap_objects Number of allocated objects.\n# TYPE go_memstats_heap_objects gauge\ngo_memstats_heap_objects 26010\n# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.\n# TYPE go_memstats_heap_released_bytes gauge\ngo_memstats_heap_released_bytes 5.636096e+07\n# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.\n# TYPE go_memstats_heap_sys_bytes gauge\ngo_memstats_heap_sys_bytes 6.6486272e+07\n# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.\n# TYPE go_memstats_last_gc_time_seconds gauge\ngo_memstats_last_gc_time_seconds 1.6467237672096214e+09\n# HELP go_memstats_lookups_total Total number of pointer lookups.\n# TYPE go_memstats_lookups_total counter\ngo_memstats_lookups_total 0\n# HELP go_memstats_mallocs_total Total number of mallocs.\n# TYPE go_memstats_mallocs_total counter\ngo_memstats_mallocs_total 4.912532e+06\n# HELP go_memstats_mcache_inuse_bytes Number of 
bytes in use by mcache structures.\n# TYPE go_memstats_mcache_inuse_bytes gauge\ngo_memstats_mcache_inuse_bytes 2400\n# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.\n# TYPE go_memstats_mcache_sys_bytes gauge\ngo_memstats_mcache_sys_bytes 16384\n# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.\n# TYPE go_memstats_mspan_inuse_bytes gauge\ngo_memstats_mspan_inuse_bytes 119544\n# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.\n# TYPE go_memstats_mspan_sys_bytes gauge\ngo_memstats_mspan_sys_bytes 147456\n# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.\n# TYPE go_memstats_next_gc_bytes gauge\ngo_memstats_next_gc_bytes 9.173072e+06\n# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.\n# TYPE go_memstats_other_sys_bytes gauge\ngo_memstats_other_sys_bytes 607608\n# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.\n# TYPE go_memstats_stack_inuse_bytes gauge\ngo_memstats_stack_inuse_bytes 622592\n# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.\n# TYPE go_memstats_stack_sys_bytes gauge\ngo_memstats_stack_sys_bytes 622592\n# HELP go_memstats_sys_bytes Number of bytes obtained from system.\n# TYPE go_memstats_sys_bytes gauge\ngo_memstats_sys_bytes 7.5056136e+07\n# HELP go_threads Number of OS threads created.\n# TYPE go_threads gauge\ngo_threads 8\n# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.\n# TYPE process_cpu_seconds_total counter\nprocess_cpu_seconds_total 26.97\n# HELP process_max_fds Maximum number of open file descriptors.\n# TYPE process_max_fds gauge\nprocess_max_fds 1.048576e+06\n# HELP process_open_fds Number of open file descriptors.\n# TYPE process_open_fds gauge\nprocess_open_fds 20\n# HELP process_resident_memory_bytes Resident memory size in bytes.\n# TYPE process_resident_memory_bytes gauge\nprocess_resident_memory_bytes 5.7884672e+07\n# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.\n# TYPE process_start_time_seconds gauge\nprocess_start_time_seconds 1.64665979817e+09\n# HELP process_virtual_memory_bytes Virtual memory size in bytes.\n# TYPE process_virtual_memory_bytes gauge\nprocess_virtual_memory_bytes 7.78473472e+08\n# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.\n# TYPE process_virtual_memory_max_bytes gauge\nprocess_virtual_memory_max_bytes -1\n# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.\n# TYPE promhttp_metric_handler_requests_in_flight gauge\npromhttp_metric_handler_requests_in_flight 1\n# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.\n# TYPE promhttp_metric_handler_requests_total counter\npromhttp_metric_handler_requests_total{code=\"200\"} 116\npromhttp_metric_handler_requests_total{code=\"500\"} 0\npromhttp_metric_handler_requests_total{code=\"503\"} 0\n"} {"level":"debug","ts":"2022-03-08T07:16:08.923Z","caller":"metrics/metrics.go:261","msg":"Reset detected resetDetected: false, noPreviousDataPoint: true, noCurrentDataPoint: false"}
Interesting to note that, as of now, the AWS_DEFAULT_REGION env var is not injected:
spec: containers:
I will check the GitHub docs and send the email in a minute.
Interesting to see why IRSA does not inject both AWS_REGION and AWS_DEFAULT_REGION. I added them manually in the deployment and now NTH (aws-node-termination-handler) works:
```
$ k logs -n kube-system aws-node-termination-handler-6f846dcb79-rm6hl
2022/03/07 15:43:17 INF Starting to serve handler /healthz, port 8080
2022/03/07 15:43:17 INF Startup Metadata Retrieved metadata={"accountId":"
2022/03/07 15:43:17 INF Started watching for interruption events
2022/03/07 15:43:17 INF Kubernetes AWS Node Termination Handler has started successfully!
2022/03/07 15:43:17 INF Started watching for event cancellations
2022/03/07 15:43:17 INF Started monitoring for events event_type=SQS_TERMINATE
```
Now I see metrics in CloudWatch.
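For anyone checking the same thing, one way to see which env vars actually ended up on the running pod (a sketch, using the k8s-app=cni-metrics-helper label from the manifest above):

```sh
# Print the env vars of the first cni-metrics-helper pod's container
kubectl get pod -n kube-system -l k8s-app=cni-metrics-helper \
  -o jsonpath='{.items[0].spec.containers[0].env}'
```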
Hi @youwalther65 I followed the steps mentioned here: https://docs.aws.amazon.com/eks/latest/userguide/cni-metrics-helper.html . I tried in the ap-east-1 region and was able to find the cni-metrics-helper pod injected with both the AWS_REGION and AWS_DEFAULT_REGION fields. I created the service account using eksctl (just FYI). I am not sure what went wrong in your case, but you can give it one more try with a fresh install. I also verified the metrics being published.
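For reference, creating the IRSA service account with eksctl looks roughly like this (a sketch; cluster name and account ID are placeholders, and the policy name follows the EKS user guide linked above):

```sh
eksctl create iamserviceaccount \
  --name cni-metrics-helper \
  --namespace kube-system \
  --cluster <cluster-name> \
  --region ap-east-1 \
  --attach-policy-arn arn:aws:iam::<account-id>:policy/AmazonEKSVPCCNIMetricsHelperPolicy \
  --approve \
  --override-existing-serviceaccounts
```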
Note: there was a duplicate AWS_CLUSTER_ID field in the manifest file. I am not sure if that could have affected anything, but we are fixing it.
```
AWS_CLUSTER_ID: test
USE_CLOUDWATCH: true
AWS_DEFAULT_REGION: ap-east-1
AWS_REGION: ap-east-1
```
Will wait for your response before we can close the issue.
It works now, but I didn't find a real root cause.
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Hi, I am facing this same issue right now. I added AWS_REGION and AWS_DEFAULT_REGION manually but still see this in the cni-metrics-helper pod:
{"level":"info","ts":"2022-09-21T01:45:08.596Z","caller":"metrics/cni_metrics.go:195","msg":"Total aws-node pod count:- %!(EXTRA int=6)"}
{"level":"error","ts":"2022-09-21T01:47:18.033Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-sx47t:61678)"}
{"level":"error","ts":"2022-09-21T01:49:29.105Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-gdtcw:61678)"}
{"level":"error","ts":"2022-09-21T01:51:40.177Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-xbc9b:61678)"}
{"level":"error","ts":"2022-09-21T01:53:51.249Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-ktpbn:61678)"}```
Any help greatly appreciated.
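One way to narrow down the "server is currently unable to handle the request" errors (a sketch; the pod name is taken from the log above) is to check whether the aws-node metrics port responds at all when the API-server proxy is bypassed:

```sh
# Forward the metrics port of one aws-node pod to localhost and query it directly
kubectl port-forward -n kube-system pod/aws-node-sx47t 61678:61678 &
curl -s http://localhost:61678/metrics | head
```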
Hi @hiteshghia , are you still facing this issue?
If this project is still around, I seem to be having the same issue.
{"level":"info","ts":"2024-10-14T20:06:11.547Z","caller":"cni-metrics-helper/main.go:69","msg":"Constructed new logger instance"}
{"level":"info","ts":"2024-10-14T20:06:11.548Z","caller":"runtime/proc.go:271","msg":"Starting CNIMetricsHelper. Sending metrics to CloudWatch: false, Prometheus: true, LogLevel DEBUG, me
tricUpdateInterval 30"}
{"level":"info","ts":"2024-10-14T20:06:41.588Z","caller":"runtime/proc.go:271","msg":"Collecting metrics ..."}
{"level":"info","ts":"2024-10-14T20:06:41.689Z","caller":"metrics/cni_metrics.go:211","msg":"Total aws-node pod count: 5"}
{"level":"debug","ts":"2024-10-14T20:06:41.689Z","caller":"metrics/metrics.go:439","msg":"Total TargetList pod count: 5"}
{"level":"error","ts":"2024-10-14T20:08:51.287Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the
request (get pods aws-node-n929t:61678)"}
{"level":"error","ts":"2024-10-14T20:11:02.359Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the
request (get pods aws-node-xlz6m:61678)"}
Installed cni-metrics-helper via the Helm chart, with the intent to scrape via Prometheus.
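For context, the install was roughly along these lines (a sketch; the chart lives in the eks-charts repo, and the exact values keys should be double-checked against the chart's values.yaml):

```sh
helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install cni-metrics-helper eks/cni-metrics-helper \
  --namespace kube-system \
  --set env.USE_PROMETHEUS=true \
  --set env.USE_CLOUDWATCH=false
```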
What happened: The cni-metrics-helper pod is running but is not able to scrape metrics from the aws-node pods.

```
$ k get clusterrole cni-metrics-helper
NAME                 CREATED AT
cni-metrics-helper   2022-03-07T18:37:14Z

$ k get clusterrolebinding cni-metrics-helper
NAME                 ROLE                             AGE
cni-metrics-helper   ClusterRole/cni-metrics-helper   23m

$ k get deploy -n kube-system cni-metrics-helper
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
cni-metrics-helper   1/1     1            1           21m
```
$ k get deploy -n kube-system cni-metrics-helper -o yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2022-03-07T18:37:14Z"
  generation: 1
  labels:
    k8s-app: cni-metrics-helper
    kustomize.toolkit.fluxcd.io/name: flux-infrastructure
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: cni-metrics-helper
  namespace: kube-system
  resourceVersion: "119599"
  uid: 1589a869-2cbc-439d-9b8d-6f7d9ee693f8
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: cni-metrics-helper
    spec:
      containers:
```
Attach logs $ k logs -n kube-system cni-metrics-helper-5dff487d97-q2n6d ... {"level":"debug","ts":"2022-03-07T18:38:01.245Z","caller":"metrics/metrics.go:261","msg":"Reset detected resetDetected: false, noPreviousDataPoint: true, noCurrentDataPoint: false"} {"level":"error","ts":"2022-03-07T18:40:11.293Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-ns559:61678)"} {"level":"error","ts":"2022-03-07T18:42:22.365Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-w9qnr:61678)"} {"level":"error","ts":"2022-03-07T18:44:33.437Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-jqq92:61678)"} {"level":"info","ts":"2022-03-07T18:44:33.437Z","caller":"runtime/proc.go:255","msg":"Collecting metrics ..."} {"level":"info","ts":"2022-03-07T18:44:33.437Z","caller":"metrics/cni_metrics.go:195","msg":"Total aws-node pod count:- %!(EXTRA int=4)"} {"level":"error","ts":"2022-03-07T18:46:44.508Z","caller":"metrics/metrics.go:382","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-ns559:61678)"} {"level":"debug","ts":"2022-03-07T18:46:44.519Z","caller":"metrics/metrics.go:382","msg":"cni-metrics text output: # HELP awscni_add_ip_req_count The number of add IP address requests\n# TYPE awscni_add_ip_req_count counter\nawscni_add_ip_req_count 0\n# HELP awscni_assigned_ip_addresses The number of IP addresses assigned to pods\n# TYPE awscni_assigned_ip_addresses gauge\nawscni_assigned_ip_addresses 0\n# HELP awscni_aws_api_latency_ms AWS API call latency in ms\n# TYPE awscni_aws_api_latency_ms summary\nawscni_aws_api_latency_ms_sum{api=\"DescribeNetworkInterfaces\",error=\"false\",status=\"200\"} 278\nawscni_aws_api_latency_ms_count{api=\"DescribeNetworkInterfaces\",error=\"false\",status=\"200\"} 1\nawscni_aws_api_latency_ms_sum{api=\"GetMetadata\",error=\"false\",status=\"200\"} 640\nawscni_aws_api_latency_ms_count{api=\"GetMetadata\",error=\"false\",status=\"200\"} 3191\nawscni_aws_api_latency_ms_sum{api=\"GetMetadata\",error=\"true\",status=\"404\"} 53\nawscni_aws_api_latency_ms_count{api=\"GetMetadata\",error=\"true\",status=\"404\"} 319\nawscni_aws_api_latency_ms_sum{api=\"ModifyNetworkInterfaceAttribute\",error=\"false\",status=\"200\"} 380\nawscni_aws_api_latency_ms_count{api=\"ModifyNetworkInterfaceAttribute\",error=\"false\",status=\"200\"} 1\n# HELP awscni_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which amazon-vpc-cni-k8s was built.\n# TYPE awscni_build_info gauge\nawscni_build_info{goversion=\"go1.16.10\",version=\"\"} 1\n# HELP awscni_eni_allocated The number of ENIs allocated\n# TYPE awscni_eni_allocated gauge\nawscni_eni_allocated 1\n# HELP awscni_eni_max The maximum number of ENIs that can be attached to the instance, accounting for unmanaged ENIs\n# TYPE awscni_eni_max gauge\nawscni_eni_max 3\n# HELP awscni_force_removed_enis The number of ENIs force removed while they had assigned pods\n# TYPE awscni_force_removed_enis counter\nawscni_force_removed_enis 0\n# HELP awscni_force_removed_ips The number of IPs force removed while they had assigned pods\n# TYPE awscni_force_removed_ips counter\nawscni_force_removed_ips 0\n# 
HELP awscni_ip_max The maximum number of IP addresses that can be allocated to the instance\n# TYPE awscni_ip_max gauge\nawscni_ip_max 15\n# HELP awscni_ipamd_action_inprogress The number of ipamd actions in progress\n# TYPE awscni_ipamd_action_inprogress gauge\nawscni_ipamd_action_inprogress{fn=\"nodeIPPoolReconcile\"} 0\nawscni_ipamd_action_inprogress{fn=\"nodeInit\"} 0\n# HELP awscni_reconcile_count The number of times ipamd reconciles on ENIs and IP/Prefix addresses\n# TYPE awscni_reconcile_count counter\nawscni_reconcile_count{fn=\"eniDataStorePoolReconcileAdd\"} 1585\n# HELP awscni_total_ip_addresses The total number of IP addresses\n# TYPE awscni_total_ip_addresses gauge\nawscni_total_ip_addresses 5\n# HELP awscni_total_ipv4_prefixes The total number of IPv4 prefixes\n# TYPE awscni_total_ipv4_prefixes gauge\nawscni_total_ipv4_prefixes 0\n# HELP go_gc_duration_seconds A summary of the GC invocation durations.\n# TYPE go_gc_duration_seconds summary\ngo_gc_duration_seconds{quantile=\"0\"} 3.2051e-05\ngo_gc_duration_seconds{quantile=\"0.25\"} 4.746e-05\ngo_gc_duration_seconds{quantile=\"0.5\"} 5.3798e-05\ngo_gc_duration_seconds{quantile=\"0.75\"} 7.3225e-05\ngo_gc_duration_seconds{quantile=\"1\"} 0.001240274\ngo_gc_duration_seconds_sum 0.011943986\ngo_gc_duration_seconds_count 163\n# HELP go_goroutines Number of goroutines that currently exist.\n# TYPE go_goroutines gauge\ngo_goroutines 37\n# HELP go_info Information about the Go environment.\n# TYPE go_info gauge\ngo_info{version=\"go1.16.10\"} 1\n# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.\n# TYPE go_memstats_alloc_bytes gauge\ngo_memstats_alloc_bytes 5.749584e+06\n# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.\n# TYPE go_memstats_alloc_bytes_total counter\ngo_memstats_alloc_bytes_total 5.1863536e+08\n# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.\n# TYPE go_memstats_buck_hash_sys_bytes gauge\ngo_memstats_buck_hash_sys_bytes 1.490576e+06\n# HELP go_memstats_frees_total Total number of frees.\n# TYPE go_memstats_frees_total counter\ngo_memstats_frees_total 1.529438e+06\n# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.\n# TYPE go_memstats_gc_cpu_fraction gauge\ngo_memstats_gc_cpu_fraction 2.9108407286280238e-06\n# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.\n# TYPE go_memstats_gc_sys_bytes gauge\ngo_memstats_gc_sys_bytes 5.616304e+06\n# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.\n# TYPE go_memstats_heap_alloc_bytes gauge\ngo_memstats_heap_alloc_bytes 5.749584e+06\n# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.\n# TYPE go_memstats_heap_idle_bytes gauge\ngo_memstats_heap_idle_bytes 5.8212352e+07\n# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.\n# TYPE go_memstats_heap_inuse_bytes gauge\ngo_memstats_heap_inuse_bytes 8.208384e+06\n# HELP go_memstats_heap_objects Number of allocated objects.\n# TYPE go_memstats_heap_objects gauge\ngo_memstats_heap_objects 29446\n# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.\n# TYPE go_memstats_heap_released_bytes gauge\ngo_memstats_heap_released_bytes 5.5681024e+07\n# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.\n# TYPE go_memstats_heap_sys_bytes gauge\ngo_memstats_heap_sys_bytes 6.6420736e+07\n# HELP 
go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.\n# TYPE go_memstats_last_gc_time_seconds gauge\ngo_memstats_last_gc_time_seconds 1.6466787616502016e+09\n# HELP go_memstats_lookups_total Total number of pointer lookups.\n# TYPE go_memstats_lookups_total counter\ngo_memstats_lookups_total 0\n# HELP go_memstats_mallocs_total Total number of mallocs.\n# TYPE go_memstats_mallocs_total counter\ngo_memstats_mallocs_total 1.558884e+06\n# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.\n# TYPE go_memstats_mcache_inuse_bytes gauge\ngo_memstats_mcache_inuse_bytes 2400\n# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.\n# TYPE go_memstats_mcache_sys_bytes gauge\ngo_memstats_mcache_sys_bytes 16384\n# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.\n# TYPE go_memstats_mspan_inuse_bytes gauge\ngo_memstats_mspan_inuse_bytes 119952\n# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.\n# TYPE go_memstats_mspan_sys_bytes gauge\ngo_memstats_mspan_sys_bytes 147456\n# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.\n# TYPE go_memstats_next_gc_bytes gauge\ngo_memstats_next_gc_bytes 8.87776e+06\n# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.\n# TYPE go_memstats_other_sys_bytes gauge\ngo_memstats_other_sys_bytes 676552\n# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.\n# TYPE go_memstats_stack_inuse_bytes gauge\ngo_memstats_stack_inuse_bytes 688128\n# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.\n# TYPE go_memstats_stack_sys_bytes gauge\ngo_memstats_stack_sys_bytes 688128\n# HELP go_memstats_sys_bytes Number of bytes obtained from system.\n# TYPE go_memstats_sys_bytes gauge\ngo_memstats_sys_bytes 7.5056136e+07\n# HELP go_threads Number of OS threads created.\n# TYPE go_threads gauge\ngo_threads 8\n# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.\n# TYPE process_cpu_seconds_total counter\nprocess_cpu_seconds_total 7.82\n# HELP process_max_fds Maximum number of open file descriptors.\n# TYPE process_max_fds gauge\nprocess_max_fds 1.048576e+06\n# HELP process_open_fds Number of open file descriptors.\n# TYPE process_open_fds gauge\nprocess_open_fds 20\n# HELP process_resident_memory_bytes Resident memory size in bytes.\n# TYPE process_resident_memory_bytes gauge\nprocess_resident_memory_bytes 5.7962496e+07\n# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.\n# TYPE process_start_time_seconds gauge\nprocess_start_time_seconds 1.64665979817e+09\n# HELP process_virtual_memory_bytes Virtual memory size in bytes.\n# TYPE process_virtual_memory_bytes gauge\nprocess_virtual_memory_bytes 7.78473472e+08\n# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.\n# TYPE process_virtual_memory_max_bytes gauge\nprocess_virtual_memory_max_bytes -1\n# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.\n# TYPE promhttp_metric_handler_requests_in_flight gauge\npromhttp_metric_handler_requests_in_flight 1\n# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.\n# TYPE promhttp_metric_handler_requests_total counter\npromhttp_metric_handler_requests_total{code=\"200\"} 
1\npromhttp_metric_handler_requests_total{code=\"500\"} 0\npromhttp_metric_handler_requests_total{code=\"503\"} 0\n"}
ServiceAccount is using IRSA:

```
$ k get sa -n kube-system cni-metrics-helper -o yaml | head -6
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam:::role/AmazonEKSVPCCNIMetricsHelperRole-git-eks-demo-ipv4
  creationTimestamp: "2022-03-07T18:37:14Z"
```
```
$ aws iam get-role --role-name AmazonEKSVPCCNIMetricsHelperRole-git-eks-demo-ipv4
{
    "Role": {
        "Path": "/",
        "RoleName": "AmazonEKSVPCCNIMetricsHelperRole-git-eks-demo-ipv4",
        "RoleId": "AROAZAC4CGT7ZTEGU53VD",
        "Arn": "arn:aws:iam:::role/AmazonEKSVPCCNIMetricsHelperRole-git-eks-demo-ipv4",
        "CreateDate": "2022-03-07T17:27:29Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "",
                    "Effect": "Allow",
                    "Principal": {
                        "Federated": "arn:aws:iam:::oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/"
                    },
                    "Action": "sts:AssumeRoleWithWebIdentity",
                    "Condition": {
                        "StringEquals": {
                            "oidc.eks.eu-west-1.amazonaws.com/id/:sub": "system:serviceaccount:kube-system:cni-metrics-helper"
                        }
                    }
                }
            ]
```
Proper policy is attached:

```
$ aws iam list-attached-role-policies --role-name AmazonEKSVPCCNIMetricsHelperRole-git-eks-demo-ipv4
{
    "AttachedPolicies": [
        {
            "PolicyName": "AmazonEKSVPCCNIMetricsHelperPolicy-git-eks-demo-ipv4",
            "PolicyArn": "arn:aws:iam:::policy/AmazonEKSVPCCNIMetricsHelperPolicy-git-eks-demo-ipv4"
        }
    ]
}
```
Instances have the following IMDS settings:

```
"MetadataOptions": {
    "State": "applied",
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 2,
    "HttpEndpoint": "enabled",
    "HttpProtocolIpv6": "disabled"
},
```
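For completeness, those IMDS settings can be pulled per instance like this (a sketch; the instance ID is a placeholder):

```sh
# Show the metadata options of a specific worker node instance
aws ec2 describe-instances \
  --instance-ids <instance-id> \
  --query 'Reservations[].Instances[].MetadataOptions'
```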
What you expected to happen: Scrape CNI metrics from the aws-node pods and publish them to CloudWatch.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.21, platform version eks.4
- OS (e.g: `cat /etc/os-release`): AL2
- Kernel (e.g. `uname -a`): Linux ip-10-0-3-119.eu-west-1.compute.internal 5.4.176-91.338.amzn2.x86_64 #1 SMP Fri Feb 4 16:59:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux