aws-samples / eks-workshop

AWS Workshop for Learning EKS
https://eksworkshop.com
MIT No Attribution
804 stars 1.24k forks

Issue: AppMesh controller for EKS Workshop not working as expected #1056

Closed shaileshgupta2k closed 2 years ago

shaileshgupta2k commented 3 years ago

Hi,

An AppMesh customer and I have been trying to implement the djapp K8s examples with AppMesh and have been running into issues. The Pod's Envoy sidecar fails its readiness probe because no credentials are available for Envoy to talk to AppMesh's Envoy Management Service. Envoy is supposed to use IRSA, but it falls back to instance metadata credentials instead.

Below are the errors seen in the Envoy logs:
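
For anyone checking whether IRSA is wired up for a pod: on EKS, the pod identity webhook normally injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the containers of pods whose service account carries the `eks.amazonaws.com/role-arn` annotation. The following is a minimal sketch, not part of the workshop, that takes the text printed by `env` inside a container (for example captured with `kubectl exec`) and reports which of those two variables are missing. The function name `missing_irsa_env` is hypothetical.

```python
# The two variables the EKS pod identity webhook injects for IRSA.
IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_env(env_output):
    """Return the IRSA variables that are absent or empty in `env` output.

    `env_output` is the plain text printed by `env` inside a container,
    one NAME=value pair per line. (Hypothetical helper for illustration.)
    """
    present = {}
    for line in env_output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that are not NAME=value pairs
            present[name.strip()] = value.strip()
    return [name for name in IRSA_ENV_VARS if not present.get(name)]
```

An empty result only shows that the variables are set. It does not prove that the role's trust policy or the attached permissions are correct.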

[2021-02-11 03:08:28.884][24][error][aws] [source/extensions/common/aws/credentials_provider_impl.cc:94] Could not retrieve credentials listing from the instance metadata

[2021-02-11 03:08:28.884][24][debug][aws] [source/extensions/common/aws/credentials_provider_impl.cc:298] No AWS credentials found, using anonymous credentials

[2021-02-11 03:08:28.885][1][debug][grpc] [source/common/grpc/google_async_client_impl.cc:207] notifyRemoteClose 16 Missing Authentication Token

[2021-02-11 03:08:28.885][1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] StreamAggregatedResources gRPC config stream closed: 16, Missing Authentication Token

Error code 16 is the gRPC status UNAUTHENTICATED, which means that the Envoy proxy does not have valid authentication credentials for AWS.
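
To pick this status out of a longer Envoy log, something like the sketch below could help. It assumes the warning keeps the `gRPC config stream closed: <code>, <message>` shape seen in the lines above. The helper name `parse_stream_closed` is hypothetical, and the table lists only a few of the standard gRPC status codes.

```python
import re

# A few standard gRPC status codes; the full list has 17 entries (0-16).
GRPC_STATUS_NAMES = {
    0: "OK",
    7: "PERMISSION_DENIED",
    14: "UNAVAILABLE",
    16: "UNAUTHENTICATED",
}

# Matches the tail of Envoy's "gRPC config stream closed" warning.
STREAM_CLOSED = re.compile(r"gRPC config stream closed: (\d+), (.*)$")


def parse_stream_closed(line):
    """Return (code, status_name, message) for a stream-closed line, else None."""
    match = STREAM_CLOSED.search(line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    name = GRPC_STATUS_NAMES.get(code, "NOT_IN_THIS_TABLE")
    return code, name, match.group(2)
```

Applied to the last log line above, this yields code 16, `UNAUTHENTICATED`, and the message `Missing Authentication Token`.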

Also, when we restart the deployments to trigger AppMesh sidecar injection into the pods (`kubectl -n prod rollout restart deployment dj jazz-v1 metal-v1`), the old pods stay live alongside the new pods, for example:

NAME                        READY   STATUS    RESTARTS   AGE
dj-5fcf8b7c97-h8frt         1/2     Running   0          57s
dj-6bf5fb7f45-6dkl8         1/1     Running   0          6m28s
jazz-v1-6f688dcbf9-v7gwq    1/1     Running   0          6m28s
jazz-v1-7bfbf56c59-j97d5    1/2     Running   0          57s
metal-v1-566756fbd6-lssvg   1/1     Running   0          6m28s
metal-v1-6bcbcbb6-54ctv     1/2     Running   0          57s
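
The listing is consistent with a rolling update that is waiting for the new pods to become Ready: each new pod shows `1/2` (one container not ready), so the old `1/1` pods are kept around. As a rough sketch, the not-yet-ready pods can be pulled out of saved `kubectl get pods` output like this. The function name `not_ready_pods` is hypothetical, and the parsing assumes the default column layout shown above.

```python
SAMPLE = """\
NAME                        READY   STATUS    RESTARTS   AGE
dj-5fcf8b7c97-h8frt         1/2     Running   0          57s
dj-6bf5fb7f45-6dkl8         1/1     Running   0          6m28s
jazz-v1-6f688dcbf9-v7gwq    1/1     Running   0          6m28s
jazz-v1-7bfbf56c59-j97d5    1/2     Running   0          57s
metal-v1-566756fbd6-lssvg   1/1     Running   0          6m28s
metal-v1-6bcbcbb6-54ctv     1/2     Running   0          57s
"""


def not_ready_pods(kubectl_output):
    """Return names of pods whose READY column shows fewer ready containers than total."""
    pending = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a pod row.
        if len(fields) < 2 or fields[0] == "NAME" or "/" not in fields[1]:
            continue
        ready, total = (int(part) for part in fields[1].split("/"))
        if ready < total:
            pending.append(fields[0])
    return pending


if __name__ == "__main__":
    for pod in not_ready_pods(SAMPLE):
        print(pod)
```

The pods it reports are the ones whose container statuses and readiness probe events are worth inspecting first.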

Please take a look at the issue and let me know if further data is required. I suspect something is going wrong in the way the AppMesh sidecar is injected into the Pods.

jlbutler commented 3 years ago

The walkthrough does use IRSA. It needs an update to use a more refined policy rather than full access, but the step is clearly documented and should work as you indicate.

I just ran through the steps and indeed it works as expected. Can you reproduce this? Was any failure observed in creating the service account? If not, logs from the controller deployment may be helpful.

soumi-ml commented 3 years ago

I am having the exact same issue: StreamAggregatedResources gRPC config stream closed: 16, Missing Authentication Token. However, this issue cropped up after I had done the cleanup (following the steps) and then tried to recreate the dj-app mesh.

jlbutler commented 3 years ago

I am still unable to reproduce this taking only the steps in the workshop module. I am struggling to understand the problem described here, as we should not require proxy authorization for this demo. That said, I have filed a PR to add proxy auth, since we'll need it eventually. For now, it should only be required for Virtual Gateways and when using TLS.

If someone can reproduce this following the explicit steps in the module, please see the updated version which will also apply proxy auth for the proxies running in the app namespace. I'd like to know if it resolves the problem.
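
For readers unfamiliar with proxy auth: in App Mesh it means granting the identity used by the Envoy proxies the `appmesh:StreamAggregatedResources` action on the virtual nodes they represent. The sketch below builds such an IAM policy document. It is an illustration under stated assumptions and not necessarily what the PR applies: the helper name, region, account ID, and virtual node names are placeholders.

```python
import json


def envoy_proxy_auth_policy(region, account_id, mesh_name, virtual_node_names):
    """Build an IAM policy allowing Envoy to stream config for the given virtual nodes.

    All arguments are placeholders supplied by the caller; nothing here is
    taken from the workshop's actual policy.
    """
    resources = [
        f"arn:aws:appmesh:{region}:{account_id}:mesh/{mesh_name}/virtualNode/{node}"
        for node in virtual_node_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "appmesh:StreamAggregatedResources",
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = envoy_proxy_auth_policy("us-west-2", "111122223333", "dj-app", ["dj", "jazz-v1"])
    print(json.dumps(policy, indent=2))
```

Scoping `Resource` to specific virtual nodes is the narrower option. A wildcard on the mesh would also satisfy the proxies but grants more than each one needs.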