aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

EKS Fargate [Bug]: Containers running in Fargate cannot get their own metrics from the kubelet #1798

Open pptb-aws opened 2 years ago

pptb-aws commented 2 years ago

Tell us about your request. What do you want us to build?
Containers running in Fargate cannot get their own metrics from the kubelet.

Which service(s) is this request for?
Fargate, EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Currently, calls against https://<node IP>:10250/metrics/resource fail, whether made with curl or by the metrics server; in both cases the error is connection refused. Below is an example from the metrics server.

E0804 18:26:43.486945       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.165.181:10250/metrics/resource\": dial tcp 192.168.165.181:10250: connect: connection refused" node="fargate-ip-192-168-165-181.ec2.internal"

The goal of this feature request/bug report is to allow a Fargate pod to read its own kubelet metrics.
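
For context, here is a minimal sketch (not from the original report; pod name and image are illustrative) of how a pod would typically probe its own node's kubelet. On an EC2 node the TCP connection is established (the kubelet typically answers 401 Unauthorized without credentials); on Fargate the connection itself is refused, matching the log above:

apiVersion: v1
kind: Pod
metadata:
  name: kubelet-probe              # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: probe
    image: curlimages/curl:8.8.0   # any image that ships curl
    env:
    - name: HOST_IP                # the node's (kubelet's) IP via the downward API
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    # -k skips certificate verification to keep the probe minimal
    command: ["sh", "-c", "curl -vk https://$HOST_IP:10250/metrics/resource"]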

Are you currently working around this issue? How are you currently solving this problem?
I do not see a workaround.

Additional context. Anything else we should know?

This mainly impacts the metrics-server application, as far as I can tell. The reasons are detailed here, and this issue was previously raised on the metrics-server GitHub here. I was unable to find it raised in this repo, so I am filing it here to give the issue more visibility and make it easier to search for.

ollypom commented 2 years ago

I appreciate this doesn't answer @pptb-aws's question about a Fargate Pod being able to reach its own kubelet, but a monitoring server (running inside or outside of the cluster) can access all kubelet metrics via the API server:

kubectl get --raw /api/v1/nodes/fargate-ip-10-1-213-127.eu-west-1.compute.internal/proxy/metrics/cadvisor

It's how the EKS Fargate OpenTelemetry Collector blog and the EKS Fargate Prometheus blog work.

A snippet from the OpenTelemetry Collector Config:

scrape_configs:
- job_name: 'kubelets-cadvisor-metrics'
  sample_limit: 10000
  scheme: https

  kubernetes_sd_configs:
  - role: node
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
      # Only for Kubernetes ^1.7.3.
      # See: https://github.com/prometheus/prometheus/issues/2916
    - target_label: __address__
      # Changes the address to Kube API server's default address and port
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      # Changes the default metrics path to the kubelet's proxy cAdvisor metrics endpoint
      replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
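
For this proxy path to work, the collector's service account also needs permission to discover nodes and to use the nodes/proxy subresource. A minimal RBAC sketch, with illustrative names for the ClusterRole, service account, and namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-proxy-reader         # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes"]               # node discovery (kubernetes_sd_configs role: node)
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes/proxy"]         # /api/v1/nodes/<node>/proxy/metrics/...
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-proxy-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet-proxy-reader
subjects:
- kind: ServiceAccount
  name: otel-collector               # illustrative service account name
  namespace: monitoring              # illustrative namespace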

DilipCoder commented 6 months ago

facing the same issue

gkrishn1 commented 6 months ago

I am also facing the same issue with an AWS EKS Fargate cluster. Any idea when this will be fixed? Any ETA?

OverStruck commented 5 months ago

For anyone reading: this is a Fargate limitation that affects metrics-server. Pods running on Fargate cannot query/ping/reach the kubelet of their own host, in this case the kubelet of the Fargate node. The workaround is to run metrics-server on EC2 instances, where it can reach the kubelet of its own node. See: https://github.com/kubernetes-sigs/metrics-server/issues/694#issuecomment-815193036
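
A minimal sketch of that workaround on a cluster that also has EC2 nodes: install metrics-server into a namespace that no Fargate profile selects, so its pod is scheduled onto EC2 (the namespace name is illustrative; the Terraform example in the next comment takes the same approach):

# Deliberately not matched by any Fargate profile, so pods created here
# (e.g. metrics-server installed with --namespace metrics-ec2) land on EC2 nodes,
# from which the kubelets can be reached.
apiVersion: v1
kind: Namespace
metadata:
  name: metrics-ec2                  # illustrative name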

larskinder commented 6 days ago

For those using the Terraform EKS Blueprints addons to deploy metrics-server, the following is a workaround. It applies only if you set Fargate to manage your default kube-system namespace.

module "eks_blueprints_addons" {
  #checkov:skip=CKV_TF_1: "Ensure Terraform version constraint is set"
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.17"
  [...]
  metrics_server = {
    set = [
      {
        name  = "args[0]"
        value = "--kubelet-insecure-tls"
      }
    ]
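    # Deploy into the dedicated namespace created below (presumably equal to
    # local.metrics_namespace) so metrics-server is not scheduled under the
    # Fargate-managed kube-system namespace.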
    namespace = "kube-system-metrics"
  }
  depends_on = [kubernetes_namespace.system_metrics]
}

# This namespace is created, as there is currently an issue with fargate
# https://github.com/aws/containers-roadmap/issues/1798
resource "kubernetes_namespace" "system_metrics" {
  metadata {
    annotations = {
      name = local.metrics_namespace
    }

    labels = {
      "name" = local.metrics_namespace
    }
    name = local.metrics_namespace
  }
}