aws / aws-app-mesh-roadmap

AWS App Mesh is a service mesh that you can use with your microservices to manage service to service communication
Apache License 2.0

Feature Request: Support for EKS Pod Identities #493

Open ricardo8990 opened 3 months ago

ricardo8990 commented 3 months ago

If you want to see App Mesh implement this idea, please upvote with a :+1:.

Tell us about your request
I think EKS Pod Identities are not currently supported for the Envoy containers that App Mesh injects into pods on EKS.

Which integration(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I created an app in my EKS cluster and granted it permissions using EKS Pod Identities. I'm deploying a Node app with an AppConfig container. It works fine and the permissions work as expected. However, when I added the App Mesh integration with the automatically injected Envoy container, I received the following error:

[2024-06-26 16:00:34.205][21][error][aws] [source/extensions/common/aws/credentials_provider_impl.cc:302] Could not load AWS credentials document from the task role
[2024-06-26 16:00:34.208][15][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:152] StreamAggregatedResources gRPC config stream to appmesh-envoy-management.us-west-2.amazonaws.com:443 closed: 16, Missing Authentication Token

This causes the AppConfig container to fail when it tries to fetch the parameters:

[appconfig agent] 2024/06/26 15:37:10 INFO AppConfig Agent 2.0.3896
[appconfig agent] 2024/06/26 15:37:10 INFO serving on localhost:2772
[appconfig agent] 2024/06/26 15:37:32 ERROR retrieve failure for 'APP:ENV:DEP': bad gateway: network error connecting to service (retry in 60s)

However, I can see that the environment variables that EKS Pod Identity injects into containers are set correctly in the Envoy container:

        - name: AWS_STS_REGIONAL_ENDPOINTS
          value: regional
        - name: AWS_CONTAINER_CREDENTIALS_FULL_URI
          value: 'http://169.254.170.23/v1/credentials'
        - name: AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
          value: >-
            /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token

This is the whole manifest for this particular container:

    - env:
        - name: APPMESH_PLATFORM_K8S_VERSION
          value: v1.29.4-eks-036c24b
        - name: APPNET_AGENT_ADMIN_UDS_PATH
          value: /tmp/agent.sock
        - name: APPMESH_PLATFORM_APP_MESH_CONTROLLER_VERSION
          value: v1.12.7-dirty
        - name: APPMESH_RESOURCE_ARN
          value: mesh/MESH/virtualNode/NODE_MESH
        - name: ENVOY_ADMIN_ACCESS_ENABLE_IPV6
          value: 'false'
        - name: APPMESH_FIPS_ENDPOINT
          value: '0'
        - name: ENVOY_LOG_LEVEL
          value: info
        - name: APPMESH_DUALSTACK_ENDPOINT
          value: '0'
        - name: APPMESH_PREVIEW
          value: '0'
        - name: ENVOY_ADMIN_ACCESS_LOG_FILE
          value: /tmp/envoy_admin_access.log
        - name: APPNET_AGENT_ADMIN_MODE
          value: uds
        - name: APPMESH_VIRTUAL_NODE_NAME
          value: mesh/MESH/virtualNode/NODE_MESH
        - name: AWS_REGION
          value: us-west-2
        - name: ENVOY_ADMIN_ACCESS_PORT
          value: '9901'
        - name: APPMESH_PLATFORM_K8S_POD_UID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: AWS_STS_REGIONAL_ENDPOINTS
          value: regional
        - name: AWS_CONTAINER_CREDENTIALS_FULL_URI
          value: 'http://169.254.170.23/v1/credentials'
        - name: AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
          value: >-
            /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
      image: >-
        840364872350.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.27.2.0-prod
      imagePullPolicy: IfNotPresent
      lifecycle:
        preStop:
          exec:
            command:
              - sh
              - '-c'
              - sleep 20
      name: envoy
      ports:
        - containerPort: 9901
          name: stats
          protocol: TCP
      readinessProbe:
        exec:
          command:
            - sh
            - '-c'
            - >-
              curl -s http://localhost:9901/server_info | grep state | grep -q
              LIVE
        failureThreshold: 3
        initialDelaySeconds: 1
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        requests:
          cpu: 10m
          memory: 32Mi
      securityContext:
        runAsUser: 1337
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount
          name: eks-pod-identity-token
          readOnly: true
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-8nkjm
          readOnly: true

I wonder whether EKS Pod Identities are not supported at this time, or whether there is something I am missing.

By the way, the app role already has permission for appmesh:StreamAggregatedResources, with the resource set to the Virtual Node ARN.

ricardo8990 commented 3 months ago

I set ENVOY_LOG_LEVEL to debug and found these logs:

[2024-06-27 03:40:49.620][22][debug][aws] [source/extensions/common/aws/credentials_provider_impl.cc:67] Getting AWS credentials from the environment
[2024-06-27 03:40:49.620][22][debug][aws] [source/extensions/common/aws/credentials_provider_impl.cc:288] Getting AWS credentials from the task role at URI: http://169.254.170.23/v1/credentials
[2024-06-27 03:40:49.621][22][debug][misc] [source/extensions/common/aws/utility.cc:300] Could not fetch AWS metadata: HTTP response code said error
[2024-06-27 03:40:50.281][17][debug][main] [source/server/server.cc:263] flushing stats
[2024-06-27 03:40:50.281][17][debug][main] [source/server/server.cc:273] Envoy is not fully initialized, skipping histogram merge and flushing stats
[2024-06-27 03:40:50.622][22][debug][misc] [source/extensions/common/aws/utility.cc:300] Could not fetch AWS metadata: HTTP response code said error
[2024-06-27 03:40:51.623][22][debug][misc] [source/extensions/common/aws/utility.cc:300] Could not fetch AWS metadata: HTTP response code said error
[2024-06-27 03:40:52.624][22][debug][misc] [source/extensions/common/aws/utility.cc:300] Could not fetch AWS metadata: HTTP response code said error
[2024-06-27 03:40:53.624][22][error][aws] [source/extensions/common/aws/credentials_provider_impl.cc:302] Could not load AWS credentials document from the task role
[2024-06-27 03:40:53.625][22][debug][aws] [source/extensions/common/aws/credentials_provider_impl.cc:442] No AWS credentials found, using anonymous credentials
[2024-06-27 03:40:53.627][17][debug][grpc] [source/common/grpc/google_async_client_impl.cc:379] Finish with grpc-status code 16
[2024-06-27 03:40:53.627][17][debug][grpc] [source/common/grpc/google_async_client_impl.cc:224] notifyRemoteClose 16 Missing Authentication Token
[2024-06-27 03:40:53.627][17][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:152] StreamAggregatedResources gRPC config stream to appmesh-envoy-management.us-west-2.amazonaws.com:443 closed: 16, Missing Authentication Token
[2024-06-27 03:40:53.627][17][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed

Looking at the Pod Identity logs, I can see this repeated many times:

{"client-addr":"10.0.3.220:59826","cluster-name":"CLUSTER_NAME","level":"info","msg":"handling new request request from 10.0.3.220:59826","time":"2024-06-27T03:43:22Z"}
{"client-addr":"10.0.3.220:59826","cluster-name":"CLUSTER_NAME","level":"error","msg":"Error fetching credentials: Service account token cannot be empty","time":"2024-06-27T03:43:22Z"}
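
The "Service account token cannot be empty" error suggests that the request reached the credentials endpoint without the token from AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE. A minimal Python sketch of the request a client is expected to send, assuming (as the AWS SDKs do) that the file contents go into the Authorization header; the function names are hypothetical and this is only a diagnostic aid, not Envoy's implementation:

```python
import json
import os
import urllib.request


def build_credentials_request(env=os.environ):
    """Build the HTTP request for the container credentials endpoint.

    Assumption: the token stored in the file named by
    AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE is sent as the value of the
    Authorization header. Surrounding whitespace is stripped here for
    convenience.
    """
    uri = env["AWS_CONTAINER_CREDENTIALS_FULL_URI"]
    token_file = env["AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"]
    with open(token_file, encoding="utf-8") as fh:
        token = fh.read().strip()
    if not token:
        raise ValueError(f"token file {token_file} is empty")
    return urllib.request.Request(uri, headers={"Authorization": token})


def fetch_credentials(env=os.environ, timeout=2.0):
    """Send the request and return the parsed JSON credentials document.

    Only meaningful from inside a pod that has a Pod Identity association.
    """
    request = build_credentials_request(env)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_credentials() from a shell in the affected pod should show whether the endpoint answers once the token is supplied. If it does, the missing header on the Envoy side is the likely cause.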

AhmadMS1988 commented 2 months ago

Adding to this point: upstream Envoy has supported it since 1.30.0 (https://github.com/envoyproxy/envoy/blob/f79b881883e862bc0f7dc7f09d3bc811fb0944f6/changelogs/1.30.0.yaml#L483). Can we have an aws-appmesh-envoy image based on 1.30? Thanks.
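
For anyone checking their own sidecars, here is a small Python sketch that compares the injected image tag against that minimum version. It assumes that the first three numeric components of the aws-appmesh-envoy tag (for example v1.27.2.0-prod) track the upstream Envoy release; the function names are hypothetical:

```python
import re

# Minimum upstream Envoy version with Pod Identity support, per the
# 1.30.0 changelog linked above.
POD_IDENTITY_MIN = (1, 30, 0)


def envoy_version_from_image(image: str) -> tuple:
    """Return (major, minor, patch) parsed from an image reference.

    Assumption: the tag starts with the upstream Envoy version,
    optionally prefixed with 'v', e.g. 'v1.27.2.0-prod'.
    """
    tag = image.rsplit(":", 1)[-1]
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"cannot parse an Envoy version from tag {tag!r}")
    return tuple(int(part) for part in match.groups())


def supports_pod_identity(image: str) -> bool:
    """True if the image appears to be based on Envoy 1.30.0 or later."""
    return envoy_version_from_image(image) >= POD_IDENTITY_MIN
```

With the image from the manifest above, `supports_pod_identity("840364872350.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.27.2.0-prod")` returns False, which is consistent with the errors reported.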