aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

web requests to nginx pods are coming in as IPv6 when the cluster is set as IPv4 single stack #2551

Closed: innovia closed this issue 2 years ago

innovia commented 2 years ago

Version

Karpenter Version: v0.16.2

Kubernetes Version: v1.22.12-eks-ba74326

node template


apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiFamily: AL2
  instanceProfile: "KarpenterNodeInstanceProfile-{{ .Values.karpenter.clusterName }}"
  metadataOptions:
    httpProtocolIPv6: "disabled"
    httpPutResponseHopLimit: 2
    httpTokens: "optional"
  blockDeviceMappings:
    - deviceName: "/dev/xvda"
      ebs:
        deleteOnTermination: true
        encrypted: true
        iops: {{ .Values.karpenter.nodeTemplates.default.volumeIops }}
        throughput: {{ .Values.karpenter.nodeTemplates.default.volumeThroughput }}
        volumeSize: {{ .Values.karpenter.nodeTemplates.default.volumeSize }}
        volumeType: {{ .Values.karpenter.nodeTemplates.default.volumeType }}
  subnetSelector:
    kubernetes.io/cluster/{{ .Values.karpenter.clusterName }}: shared
  securityGroupSelector:
    karpenter.sh/discovery/{{ .Values.karpenter.clusterName }}: {{ .Values.karpenter.clusterName }}
  tags:
    karpenter.sh/discovery/{{ .Values.karpenter.clusterName }}: {{ .Values.karpenter.clusterName }}
    kubernetes-cluster: {{ .Values.karpenter.clusterName }}
    environment: {{ .Values.environment }}
    aws-node-termination-handler/managed: "true"
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    # Workaround for dd-agent which fails to check disk usage if binfmt_misc isn't mounted before Kubelet comes up
    mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc || true
    echo 1 > /proc/sys/fs/binfmt_misc/status

    # This installs the System Manager agent
    yum install -y amazon-ssm-agent
    systemctl start amazon-ssm-agent
    systemctl enable amazon-ssm-agent
    yum clean all

    TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
    LIFE_CYCLE=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/instance-life-cycle)
    KUBELET_EXTRA_ARGS="--node-labels=node.kubernetes.io/lifecycle=${LIFE_CYCLE}"
    # End of script used as a separator for concatenating other scripts

    # Tweak Kubelet Configuration
    echo "$(jq '.registryPullQPS=500' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
    echo "$(jq '.registryBurst=100' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
    echo "$(jq '.evictionHard."memory.available"="10%"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json

    --BOUNDARY--

provisioner

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: on-demand-provisioner
spec:
  consolidation:
    enabled: {{ .Values.consolidate }}
  weight: 1
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
    # Look for instances with more than 4 GiB of memory
    - key: "karpenter.k8s.aws/instance-memory"
      operator: NotIn
      values:
        - "512"
        - "1024"
        - "2048"
        - "4096"
    # Look for instances with cpus greater than 4
    - key: "karpenter.k8s.aws/instance-cpu"
      operator: NotIn
      values:
        - "1"
        - "2"
        - "4"
    # Exclude burstable and AMD-based instance families
    - key: "karpenter.k8s.aws/instance-family"
      operator: NotIn
      values:
        - "t1"
        - "t2"
        - "t3"
        - "t3a"
        - "t4"
        - "t4g"
        - "c5a"
        - "c6a"
        - "m5a"
        - "m6a"
        - "r5a"
        - "r6a"
    # Exclude Graviton Instances
    - key: "kubernetes.io/arch"
      operator: In
      values:
        - "amd64"
    # Require the on-demand lifecycle label (set via KUBELET_EXTRA_ARGS in the node template's user data)
    - key: "node.kubernetes.io/lifecycle"
      operator: In
      values: ["on-demand"]
  # Limits prevent Karpenter from creating new instances once the limit is exceeded.
  limits:
    resources:
      cpu: 1000
  # name of node template to reference
  providerRef:
    name: default
  {{- if not .Values.consolidate }}
  ttlSecondsAfterEmpty: 5  # scale down nodes after 5 seconds without workloads (excluding daemons)
  {{- end }}
  # Setting a value here enables node expiry. After nodes reach the defined age in seconds, they will be deleted, even if in use.
  # This enables nodes to effectively be periodically “upgraded” by replacing them with newly provisioned instances.
  ttlSecondsUntilExpired: {{ .Values.ttlSecondsUntilExpired }}

We experience lots of "connection refused" errors like this:

2022/09/23 20:57:48 [error] 7#7: *3994 connect() failed (111: Connection refused) while connecting to upstream, client: 54.203.69.179, server: _, request: "GET /api/v2/deployments/?cluster_id=1&multi_tenant=true&page_number=1&page_size=200 HTTP/1.1", upstream: "http://[::1]:8080/api-internal/v2/deployments/?cluster_id=1&multi_tenant=true&page_number=1&page_size=200", host: "xyz.app-domain.com"

This doesn't happen on non-Karpenter nodes; we reverted the Karpenter setup and the errors stopped.

We're using the same AMI for both the Karpenter and the ASG self-managed nodes: amazon-eks-node-1.22-v20220914 (ami-0f2b7c6874eb8414f).

We can't find any difference between the Karpenter and self-managed nodes. The only thing I can think of is that the bootstrap script somehow enables IPv6, even though the cluster is single-stack IPv4.

Expected Behavior

No errors; calls should go to 127.0.0.1:8080, not [::1]:8080.

Actual Behavior

Requests fail because nginx resolves the upstream to an IPv6 address.

Steps to Reproduce the Problem

Run nginx with a proxy_pass to gunicorn and use the same Karpenter setup as above.
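For reference, a rough sketch of the pod shape involved (image tags, names, and the gunicorn command below are illustrative placeholders, not our real chart):

apiVersion: v1
kind: Pod
metadata:
  name: ipv6-loopback-repro
spec:
  containers:
    # app container: gunicorn bound to the IPv4 loopback only
    - name: app
      image: python:3.10-slim
      command: ["sh", "-c"]
      args: ["pip install gunicorn && exec gunicorn -b 127.0.0.1:8080 wsgiref.simple_server:demo_app"]
    # nginx container: its config (mounted from a ConfigMap, omitted here) proxies
    # to http://localhost:8080; on containerd-backed nodes localhost resolves to ::1
    # inside the pod, so proxy_pass gets "connection refused"
    - name: nginx
      image: nginx:1.23
      ports:
        - containerPort: 80

On Docker-backed ASG nodes the same pod works, because the proxy connects over 127.0.0.1.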

Resource Specs and Logs

2022/09/23 20:57:48 [error] 7#7: *3994 connect() failed (111: Connection refused) while connecting to upstream, client: 54.203.69.179, server: _, request: "GET /api/v2/deployments/?cluster_id=1&multi_tenant=true&page_number=1&page_size=200 HTTP/1.1", upstream: "http://[::1]:8080/api-internal/v2/deployments/?cluster_id=1&multi_tenant=true&page_number=1&page_size=200", host: "server.domain.com"

dewjam commented 2 years ago

Hey @innovia, just so I'm sure: nginx and gunicorn are running in different containers in the same pod, correct? And nginx has an upstream configured for gunicorn pointing at the loopback address?

I believe I am able to reproduce this. Karpenter uses containerd as the default container runtime. When using your node template (with containerd), I see the following when trying to curl localhost:

*   Trying ::1:80...
* Connected to localhost (::1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*

When using your node template with docker as the runtime I see:

*   Trying 127.0.0.1:80...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*

Can you confirm your non-Karpenter nodes are using Docker? And also that your Karpenter nodes are using containerd?
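A quick way to compare is the container runtime each node reports in its status, e.g.:

kubectl get nodes -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion

Karpenter-launched nodes should show containerd://..., while the self-managed ASG nodes should show docker://... if they are still on the Docker runtime.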

innovia commented 2 years ago

Hi @dewjam

Yes, I am running Docker! And yes, you are correct: nginx and gunicorn are in the same pod, in different containers.

Is there a way to change the template to use Docker? Or to configure containerd for IPv4 only?

innovia commented 2 years ago

Yes, I think my Karpenter nodes are using containerd. I remember I had a rough time listing images inside the node until I figured out it was containerd.

dewjam commented 2 years ago

I'm trying to track down why name resolution of localhost behaves differently with containerd vs. Docker.

Thinking through this further, I'm guessing gunicorn is only listening on the ipv4 loopback address. Would you be willing to share your nginx and gunicorn configs? Just want to validate my assumption.
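For context on that guess: whether this matters depends on the bind address (app:application below is just a placeholder module:callable). A gunicorn started roughly like

gunicorn --bind 127.0.0.1:8080 app:application

listens only on the IPv4 loopback, so a client that resolves localhost to ::1 gets connection refused. Gunicorn accepts repeated --bind flags, so binding both loopbacks is also an option:

gunicorn --bind 127.0.0.1:8080 --bind '[::1]:8080' app:application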

innovia commented 2 years ago

I looked at my config.

They are Helm templates; it looks like we use localhost as the hostname, so maybe containerd is resolving it to IPv6.

Do you know how to set it to use Docker?

Here is the nginx config; the gunicorn setup is pretty standard.

{{- define "base-service.nginx.allow-localnets" }} 
 {{- range $net := .Values.nginx.config.localnets }} 
 allow {{ $net }}; 
 {{- end }} 
 deny  all; 
 {{- end }} 

 {{- define "base-service.nginx.conf" }} 
 worker_processes {{ .Values.nginx.config.worker_processes }}; 
 worker_rlimit_nofile {{ .Values.nginx.config.worker_rlimit_nofile }}; 
 pid      /var/run/nginx/nginx.pid; 
 error_log  /var/log/nginx/error.log {{ .Values.nginx.config.log.level }}; 

 {{ tpl .Values.nginx.config.snippet.head . | nindent 2 }} 

 events { 
   worker_connections {{ .Values.nginx.config.worker_connections }}; 
   multi_accept {{ .Values.nginx.config.multi_accept }}; 

   {{ tpl .Values.nginx.config.snippet.events . | nindent 4 }} 
 } 

 http { 

   ## 
   # Basic Settings 
   ## 

   sendfile on; 
   tcp_nopush on; 
   tcp_nodelay on; 
   keepalive_timeout {{ .Values.nginx.config.keepalive_timeout }}; 
   types_hash_max_size 2048; 
   server_tokens off; 

   ## 
   # Gzip Settings 
   ## 
   {{- if .Values.nginx.config.gzip.enabled }} 
   gzip on; 
   gzip_types {{ .Values.nginx.config.gzip.types }}; 
   gzip_comp_level {{ .Values.nginx.config.gzip.comp_level }}; 
   gzip_min_length {{ .Values.nginx.config.gzip.min_length }}; 
   {{- else }} 
   gzip off; 
   {{- end }} 

   include /etc/nginx/mime.types; 
   default_type application/octet-stream; 

   real_ip_header X-Forwarded-For; 
   real_ip_recursive on; 

   log_format json escape=json '{' 
       '"bytes_sent": $bytes_sent,' 
       '"duration": $request_time,' 

       '"http": {' 
       '"referer": "$http_referer",' 
       '"useragent":"$http_user_agent",' 
       '"method": "$request_method",' 
       '"status_code":$status,' 
       '"api_key_name": "$upstream_http_x_api_key_name",' 
       '"connection":$connection,' 
       '"connection_requests":$connection_requests,' 
       '"connections_active":$connections_active,' 
       '"connections_reading":$connections_reading,' 
       '"connections_writing":$connections_writing,' 
       '"connections_waiting":$connections_waiting,' 
       '"url": "$uri",' 
       '"url_details": {' 
         '"host": "$host",' 
         '"queryString": "$query_string",' 
         '"path": "$uri"' 
         '}' 
       '},' 
       '"request_id": "$http_x_request_id",' 
       '"session_id": "$upstream_http_x_session_id",' 
       '"user_id": "$upstream_http_x_user_id",' 
       '"remote_addr": "$remote_addr",' 
       '"remote_user":"$remote_user",' 
       '"request_length": $request_length,' 
       '"request_host": "$http_x_forwarded_host",' 
       '"request_proto": "$http_x_forwarded_proto",' 
       '"request_query": "$args",' 
       '"request_time": $request_time,' 
       '"server_name":"$server_name",' 
       '"time": "$time_iso8601",' 
       '"type": "access",' 
       '"upstream_addr": "$upstream_addr",' 
       '"upstream_response_length": "$upstream_response_length",' 
       '"upstream_response_time": "$upstream_response_time",' 
       '"upstream_status": "$upstream_status",' 
       '"uri": "$uri",' 
       '"vhost": "$host",' 
       '"x_forwarded_for": "$http_x_forwarded_for",' 
       '"x_forwarded_proto": "$http_x_forwarded_proto",' 
       '"x_forwarded_port": "$http_x_forwarded_port",' 
       '"x_client_version": "$http_x_client_version",' 
       '"x_requesting_service": "$http_x_requesting_service",' 
       '"x_requesting_instance": "$http_x_requesting_instance"' 
     '}'; 

   access_log /var/log/nginx/access.log json; 

   upstream app { 
     server {{ .Values.nginx.app.hostname }}:{{ .Values.nginx.app.containerPort }}; 
     keepalive {{ .Values.nginx.config.proxy_keepalive }}; 

     {{ tpl .Values.nginx.config.snippet.upstream . | nindent 6 }} 
   } 

   proxy_cache_path {{ .Values.nginx.config.cache.path }} levels=1:2 keys_zone=static_zone:10m inactive=60m max_size={{ .Values.nginx.config.cache.max_size }}; 
   proxy_cache_key "$scheme$request_method$host$request_uri"; 

   {{ if .Values.nginx.config.ssl.enabled }} 
     ssl_certificate      {{ .Values.nginx.config.ssl.certLocation }}; 
     ssl_certificate_key  {{ .Values.nginx.config.ssl.certKeyLocation }}; 
     ssl_session_tickets {{ .Values.nginx.config.ssl.sessionTickets }}; 

     ssl_protocols              {{ .Values.nginx.config.ssl.protocols }}; 
     ssl_ciphers                {{ .Values.nginx.config.ssl.ciphers }}; 
     ssl_prefer_server_ciphers  {{ .Values.nginx.config.ssl.preferServerCiphers }}; 

     ssl_buffer_size {{ .Values.nginx.config.ssl.bufferSize }}; 

     {{ if .Values.nginx.config.ssl.dhParamsFileLoc }} 
     ssl_dhparam {{ .Values.nginx.config.ssl.dhParamsFileLoc }}; 
     {{end}} 

   {{ end }} 

   {{ tpl .Values.nginx.config.snippet.http . | nindent 4 }} 

   server { 
     # This server only deals with the status endpoint 
     listen {{ .Values.nginx.config.port_status }} default_server; 
     server_name  _; 

     # This needs to be on for this server in order to support the DD agent 
     server_tokens on; 

     location /nginx_status { 
       stub_status; 
       access_log off; 

       {{ include "base-service.nginx.allow-localnets" . | nindent 6 }} 
     } 
   } 

   server { 
     listen {{ .Values.nginx.config.port }}{{- if .Values.nginx.config.reuseport }} reuseport{{- end }} backlog={{ .Values.nginx.config.backlog }} default_server; 
     server_name  _; 

     {{ if .Values.nginx.config.ssl.enabled }} 
     listen {{ .Values.nginx.config.ssl.port }} ssl {{- if .Values.nginx.config.reuseport }} reuseport{{- end }} backlog={{ .Values.nginx.config.backlog }} default_server; 
     {{ end }} 

     client_max_body_size {{ .Values.nginx.config.client_max_body_size }}; 

     proxy_connect_timeout {{ .Values.nginx.config.proxy_connect_timeout }}s; 
     proxy_read_timeout {{ .Values.nginx.config.proxy_read_timeout }}s; 
     proxy_send_timeout {{ .Values.nginx.config.proxy_send_timeout }}s; 
     proxy_buffer_size {{ .Values.nginx.config.proxy_buffer_size }}; 
     proxy_buffers {{ .Values.nginx.config.proxy_buffers }}; 
     proxy_busy_buffers_size {{ .Values.nginx.config.proxy_busy_buffers_size }}; 

     {{ tpl .Values.nginx.config.snippet.server . | nindent 8 }} 

     {{ if .Values.nginx.config.ssl.enableRedirect }} 
     # Make sure we always redir back to TLS 
     # Note that the ALB will always send the header 
     # X-Forwarded-Proto with the value "https" for TLS-enabled requests 
     if ($http_x_forwarded_proto != "https") { 
         return 301 https://$host$request_uri; 
     } 
     {{ end }} 

     location /static { 
       proxy_cache static_zone; 
       proxy_cache_bypass  $http_cache_control; 
       add_header X-Proxy-Cache $upstream_cache_status; 
       proxy_cache_valid 10m; 

       expires 10m; 
       add_header Cache-Control "public"; 

       proxy_pass http://app; 
       proxy_set_header    Host            $host; 
       proxy_set_header    X-Real-IP       $remote_addr; 
       port_in_redirect off; 
     } 

     # This location tests the full request stack (meant as a local k8s check) 
     location {{ .Values.nginx.config.health_check_path }} { 
       access_log off; 
       add_header Content-Type text/html; 
       return 200 'OK'; 
     } 

     # Extra location that just returns a quick 200 (meant as an LB or external check) 
     location {{ .Values.nginx.config.ping_path }} { 
       access_log off; 
       add_header Content-Type text/html; 
       return 200 'OK'; 
     } 

     location / { 
       proxy_set_header    Host            $host; 
       proxy_set_header    X-Real-IP       $remote_addr; 
       port_in_redirect off; 

       {{- if .Values.nginx.config.cors.enabled }} 
       # Cors Preflight methods needs additional options and different Return Code 
       if ($request_method = 'OPTIONS') { 
           add_header Access-Control-Allow-Origin '{{ tpl .Values.nginx.config.cors.allow_origin . }}'; 
           add_header Access-Control-Allow-Credentials '{{ tpl .Values.nginx.config.cors.allow_creds . }}'; 
           add_header Access-Control-Allow-Methods '{{ tpl .Values.nginx.config.cors.allow_methods . }}'; 
           add_header Access-Control-Allow-Headers '{{ tpl .Values.nginx.config.cors.allow_headers . }}'; 

           add_header Access-Control-Max-Age {{ tpl .Values.nginx.config.cors.max_age . }}; 
           add_header Content-Type 'text/plain charset=UTF-8'; 
           add_header Content-Length 0; 
           return 204; 
       } 

       add_header Access-Control-Allow-Origin '{{ tpl .Values.nginx.config.cors.allow_origin . }}'; 
       add_header Access-Control-Allow-Credentials '{{ tpl .Values.nginx.config.cors.allow_creds . }}'; 
       {{- end }} 

       {{- if .Values.nginx.config.generateCspNonce }} 
       # Pass the nonce along to the backend service 
       proxy_set_header X-CSP-Nonce $request_id; 
       {{- end }} 

       {{- if .Values.nginx.config.csp }} 
       add_header Content-Security-Policy "{{ tpl .Values.nginx.config.csp . }}" always; 
       {{- end }} 

       {{ tpl .Values.nginx.config.snippet.location . | nindent 6 }} 

       proxy_pass http://app; 
     } 

   } 
 } 
 {{- end }}
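The upstream block is probably the relevant piece: it points at {{ .Values.nginx.app.hostname }}, which we set to localhost. I guess a minimal workaround, assuming these chart values, would be to pin that to the IPv4 loopback in the values file:

nginx:
  app:
    hostname: "127.0.0.1"   # was "localhost"; avoids ::1 resolution on containerd nodes
    containerPort: 8080

The alternative would be to make gunicorn listen on [::1] as well.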
innovia commented 2 years ago

How did you set Docker for the template? It seems hardcoded in the Karpenter code.

FernandoMiguel commented 2 years ago

Do you know how to set it to use Docker?

https://karpenter.sh/v0.16.2/provisioner/#:~:text=cases.%0A%20%20kubeletConfiguration%3A%0A%20%20%20%20clusterDNS%3A%20%5B%2210.0.1.100%22%5D-,containerRuntime%3A%20containerd,-systemReserved%3A%0A%20%20%20%20%20%20cpu%3A%20100m%0A%20%20%20%20%20%20memory%3A%20100Mi%0A%20%20%20%20%20%20ephemeral

But keep in mind that Docker runtime support is going away in EKS 1.24, so you'd better look for a proper fix.
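If I'm reading those docs right, the switch lives under kubeletConfiguration in the Provisioner, something like this (only the kubeletConfiguration block is new relative to the provisioner posted above, and it only affects newly launched nodes):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: on-demand-provisioner
spec:
  kubeletConfiguration:
    # default is containerd; dockerd launches new nodes with the Docker runtime
    containerRuntime: dockerd
  providerRef:
    name: default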

innovia commented 2 years ago

Thank you, I'll start working on fixes for containerd

dewjam commented 2 years ago

Just to follow up on this a bit further: this is expected behavior. With containerd, IPv6 addresses are allocated to containers by default, whereas with Docker they are not.
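The difference is easy to see from inside a pod on each node type (pod and container names below are placeholders; any container with curl will do):

kubectl exec <pod-on-a-karpenter-node> -c <container-with-curl> -- curl -sv http://localhost:8080/
kubectl exec <pod-on-an-asg-node> -c <container-with-curl> -- curl -sv http://localhost:8080/

The "Trying ..." line shows ::1 on the containerd-backed node and 127.0.0.1 on the Docker-backed node, matching the output above.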

sheetaljoshi commented 1 year ago

Was the nginx problem solved so that it works with containerd binding to IPv6 by default? If so, can you share the nginx config here?

Nandeeshbb commented 1 year ago

We are facing the same issue. Is there any fix for this? We are using the AL2 AMI family, and the Karpenter docs say that for the AL2 AMI family Karpenter will enable IPv6 by default; because of this, services are not working on our Kubernetes cluster. Is there any way to disable it?

Does Karpenter support IPv6? Yes! Karpenter dynamically discovers if you are running in an IPv6 cluster by checking the kube-dns service’s cluster-ip. When using an AMI Family such as AL2, Karpenter will automatically configure the EKS Bootstrap script for IPv6.
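For what it's worth, the IP family Karpenter detects can be checked directly (assuming a standard EKS cluster with the kube-dns service in kube-system):

kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'

If that prints an IPv4 address, Karpenter bootstraps nodes as IPv4; the ::1 behaviour discussed in this thread came from the containerd runtime's handling inside the pod, not from the bootstrap script.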

jonathan-innis commented 10 months ago

Hey @Nandeeshbb, it's tough for us to keep track of issues that have been closed and resolved by the original creator. If you are still seeing this issue in Karpenter and you believe it's a bug, you can open a new issue against the repo. You can also feel free to reach out in the #karpenter channel in the Kubernetes Slack if you want quicker responses, or if it's more of a question and less of a suspected bug.

Nandeeshbb commented 10 months ago

@jonathan-innis It's resolved. Karpenter enables containerd as the default runtime engine, so I disabled containerd as the default runtime.