aws / eks-charts

Amazon EKS Helm chart repository
Apache License 2.0

Appmesh-controller getting Timeout error in k8s deployment logs in corporate proxy #867

Open gauravsuryawanshi0806 opened 1 year ago

gauravsuryawanshi0806 commented 1 year ago

Hi Team,

I am trying to set up App Mesh in an AWS EKS cluster. While deploying the App Mesh controller with the Helm chart, I get the error below.

I am running the command below to create the controller:

helm upgrade -i appmesh-controller eks/appmesh-controller \
    --namespace appmesh-system \
    --set image.repository=840364872350.dkr.ecr.eu-central-1.amazonaws.com/amazon/appmesh-controller \
    --set sidecar.image.repository=840364872350.dkr.ecr.eu-central-1.amazonaws.com/aws-appmesh-envoy \
    --set init.image.repository=840364872350.dkr.ecr.eu-central-1.amazonaws.com/aws-appmesh-proxy-route-manager

The command runs successfully, but on checking the logs I found the following:

Warning ReconcileError 40m Mesh RequestError: send request failed caused by: Get "https://appmesh.eu-central-1.amazonaws.com/v20190125/meshes/my-mesh": dial tcp 18.197.48.127:443: i/o timeout
Warning ReconcileError 8m35s (x3 over 24m) Mesh RequestError: send request failed caused by: Get "https://appmesh.eu-central-1.amazonaws.com/v20190125/meshes/my-mesh": dial tcp 52.58.163.121:443: i/o timeout
Warning ReconcileError 33s (x2 over 32m) Mesh RequestError: send request failed caused by: Get "https://appmesh.eu-central-1.amazonaws.com/v20190125/meshes/my-mesh": dial tcp 18.185.122.110:443: i/o timeout

I am doing a PoC in my company's AWS account, which sits behind a corporate proxy. When I check the endpoint URL with curl from the same server, through the corporate proxy, I can reach it. From the K8s pod/deployment, however, the request does not succeed, as the controller logs show.

curl response:

[NQ10037598@ip-10-170-57-92 appmesh-controller]$ curl -vvv https://appmesh.eu-central-1.amazonaws.com/v20190125/meshes/my-mesh

I have also checked the Helm chart's values.yaml file. An env variable is mentioned there, but somehow it is not working. Can you please help me with how I should pass the corporate proxy environment variables when deploying appmesh-controller?

joesbigidea commented 1 year ago

Hi,

Have you tried setting the proxy configuration in your helm upgrade command? Here's an example:

helm upgrade -i appmesh-controller eks/appmesh-controller \
    --namespace appmesh-system \
    --set region=$AWS_REGION \
    --set serviceAccount.create=false \
    --set serviceAccount.name=appmesh-controller \
    --set env.http_proxy=http://theproxyyouwant:port \
    --set env.https_proxy=http://theproxyyouwant:port \
    --set env.no_proxy='localhost\,127.0.0.1\,.cluster.local\,10.0.0.0/8' 

That should set up the correct ENV vars in the App Mesh controller pod. You can verify the env is configured properly by doing a describe on the controller deployment: kubectl -n appmesh-system describe deployment appmesh-controller
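
As a rough sketch of that check, the helper below lists only the proxy-related variables on the deployment. It assumes your kubectl supports `kubectl set env --list` and that the chart renders the variables under the names used above; `show_proxy_env` is just an illustrative name.

```shell
# List the proxy-related environment variables set on the controller deployment.
# Assumes the release lives in the appmesh-system namespace, as in the command above.
show_proxy_env() {
    kubectl -n appmesh-system set env deployment/appmesh-controller --list \
        | grep -i -E '^(http_proxy|https_proxy|no_proxy)='
}
```

Calling `show_proxy_env` should print one NAME=value line per proxy variable. If it prints nothing, the --set env.* values probably did not reach the pod spec.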

gauravsuryawanshi0806 commented 1 year ago

Dear Joe,

Thanks for the reply.

I followed the steps you gave. The timeout issue is resolved, but now I am getting the error below.

Command run:

helm upgrade -i appmesh-controller eks/appmesh-controller \
    --namespace appmesh-system \
    --set region=$AWS_REGION \
    --set image.repository=840364872350.dkr.ecr.eu-central-1.amazonaws.com/amazon/appmesh-controller \
    --set sidecar.image.repository=840364872350.dkr.ecr.eu-central-1.amazonaws.com/aws-appmesh-envoy \
    --set init.image.repository=840364872350.dkr.ecr.eu-central-1.amazonaws.com/aws-appmesh-proxy-route-manager \
    --set env.http_proxy=http://XXXX:8080 \
    --set env.https_proxy=http://XXXX:8080 \
    --set env.no_proxy='localhost\,127.0.0.1\,.cluster.local\,10.0.0.0/8'

The deployment was successful for about 5 minutes and the pod was running, then it went into CrashLoopBackOff status.

Output in the controller pod log:

kubectl logs appmesh-controller-6955b6977d-nxv9b -n appmesh-system
{"level":"info","ts":1672890255.6905174,"logger":"setup","msg":"version","GitVersion":"v1.9.0","GitCommit":"94d067084b49a60b3bfbb84135ee6d8fcf73e9f9","BuildDate":"2022-11-14T20:33:15+0000"}
{"level":"info","ts":1672890255.6905837,"logger":"setup","msg":"Health endpoint","HealthProbeBindAddress":":61779"}
{"level":"error","ts":1672890292.8787196,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"an error on the server (\"\") has prevented the request from @./pkg/manager/manager.go:312\nmain.main\n\t/workspace/main.go:150\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
{"level":"error","ts":1672890292.8788857,"logger":"setup","msg":"unable to start app mesh controller","error":"an error on the server (\"\") has prevented the request from succeeding","stacktrace":"main.main\n\t/workspace/main.go:172\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}

I am also able to get the service details, including the endpoint:

kubectl describe svc appmesh-controller-webhook-service -n appmesh-system
Name:              appmesh-controller-webhook-service
Namespace:         appmesh-system
Labels:            app.kubernetes.io/instance=appmesh-controller
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=appmesh-controller
                   app.kubernetes.io/version=1.9.0
                   helm.sh/chart=appmesh-controller-1.9.0
Annotations:       meta.helm.sh/release-name: appmesh-controller
                   meta.helm.sh/release-namespace: appmesh-system
Selector:          control-plane=appmesh-controller
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.41.16
IPs:               172.20.41.16
Port:              443/TCP
TargetPort:        9443/TCP
Endpoints:         10.191.197.188:9443
Session Affinity:  None
Events:

But when I create the mesh, I get the error below.

mesh.yaml:

apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: my-mesh
spec:
  namespaceSelector:
    matchLabels:
      mesh: my-mesh
  env:
    http_proxy: http://XXXXXX
    https_proxy: http://XXXXX

kubectl apply -f mesh.yaml
Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"appmesh.k8s.aws/v1beta2\",\"kind\":\"Mesh\",\"metadata\":{\"annotations\":{},\"name\":\"my-mesh\"},\"spec\":{\"env\":{\"http_proxy\":\"http://XXXXXX:8080\",\"https_proxy\":\"http:/XXXXXX8080\"},\"namespaceSelector\":{\"matchLabels\":{\"mesh\":\"my-mesh\"}}}}\n"}},"spec":{"env":{"http_proxy":"http://XXXXX","https_proxy":"http://XXXXXX080"}}}
to:
Resource: "appmesh.k8s.aws/v1beta2, Resource=meshes", GroupVersionKind: "appmesh.k8s.aws/v1beta2, Kind=Mesh"
Name: "my-mesh", Namespace: ""
for: "mesh.yaml": Internal error occurred: failed calling webhook "mmesh.appmesh.k8s.aws": failed to call webhook: Post "https://appmesh-controller-webhook-service.appmesh-system.svc:443/mutate-appmesh-k8s-aws-v1beta2-mesh?timeout=10s": dial tcp 10.191.197.188:9443: connect: connection refused

Regards, GauravS

joesbigidea commented 1 year ago

It looks like the controller cannot start up because it can't reach the K8s API server. Since this occurs when you apply your proxy settings, the most likely cause is that requests to your API server are being directed to your proxy. The env.no_proxy parameter you specified in your helm upgrade needs to cover any internal resources that shouldn't be accessed through your proxy.

The example I gave, --set env.no_proxy='localhost\,127.0.0.1\,.cluster.local\,10.0.0.0/8', likely needs to be updated to include your API server.

I'd recommend adjusting that no_proxy setting until the controller is starting up correctly (check the logs to verify). Deploying things like meshes definitely won't work until the controller is running correctly.
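
A minimal sketch of building that value follows. It assumes Helm's --set parsing, where commas inside a single value must be escaped as `\,` (as in the example above). `join_no_proxy` is an illustrative helper name, and the 172.20.0.0/16 entry is only a guess based on the 172.20.41.16 service IP shown earlier in this thread; substitute your cluster's actual service CIDR or API server address.

```shell
# Join no_proxy entries with Helm-escaped commas ("\,") so that --set
# treats the whole list as one string value instead of splitting it.
join_no_proxy() (
    out=""
    for entry in "$@"; do
        if [ -z "$out" ]; then
            out="$entry"
        else
            out="${out}\\,${entry}"
        fi
    done
    printf '%s\n' "$out"
)

# Possible use in the existing helm upgrade command (172.20.0.0/16 is an assumption):
#   --set "env.no_proxy=$(join_no_proxy localhost 127.0.0.1 .cluster.local 10.0.0.0/8 172.20.0.0/16)"
```

Keeping the entries as separate arguments makes it easier to add or remove one while testing, without hand-editing the escaped string.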

Also I noticed that you're including the proxy setting in the mesh.yaml file you're deploying. The mesh doesn't control those settings and doesn't manage the environment in any pods so you'll want to remove that env section.

gauravsuryawanshi0806 commented 1 year ago

Hi,

Thanks for the update. We are able to move past the timeout; now we are getting "msg":"Failed to get API Group-Resources","error":"Unauthorized","stacktrace":"sigs.k8s.io/controlle

So can you tell us which role should be granted to the appmesh-controller service account so that it can access the API resources?

Logs:

{"level":"info","ts":1673252751.4426079,"logger":"setup","msg":"version","GitVersion":"v1.10.0-dirty","GitCommit":"9ac1931d92f002249391c6ffc4da3a776244d090","BuildDate":"2023-01-04T18:58:56+0000"}
{"level":"info","ts":1673252751.4427762,"logger":"setup","msg":"Health endpoint","HealthProbeBindAddress":":61779"}
{"level":"error","ts":1673252751.4767046,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"Unauthorized","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/manager/manager.go:312\nmain.main\n\t/workspace/main.go:150\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
{"level":"error","ts":1673252751.476787,"logger":"setup","msg":"unable to start app mesh controller","error":"Unauthorized","stacktrace":"main.main\n\t/workspace/main.go:172\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}

joesbigidea commented 1 year ago

Taking a step back: when we started here you were getting "Warning ReconcileError 40m Mesh RequestError: send request failed", which occurs after a successful startup, while the error you're receiving now happens on startup, while "trying to get API group resources".

Unless you've made some other fundamental changes, I suspect your proxy settings are causing this. A good first test would be to remove your proxy settings and see whether the controller starts up correctly and then starts getting those ReconcileErrors again.

My guess is that the calls to the API server are still getting sent to your proxy. If you stop getting the Unauthorized error after removing your proxy settings then that's pretty good evidence of that. If that's the case then you probably need to adjust your no_proxy settings again. You can use kubectl -n default get svc to see what IP your API server is on and make sure calls to it won't be sent to the proxy.
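
One way to run that check is a small shell sketch that tests whether an IPv4 address falls inside a CIDR entry from no_proxy. It assumes a shell with 64-bit arithmetic (bash or dash), plain dotted-quad input without leading zeros, and that the controller's HTTP client honours CIDR entries in no_proxy. `ip_to_int` and `ip_in_cidr` are illustrative helper names, not part of any tool.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() (
    set -f      # no globbing while the address is split
    IFS=.
    set -- $1   # split "a.b.c.d" into four positional parameters
    echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
)

# Usage: ip_in_cidr IP CIDR
# Exit status is 0 when IP lies inside CIDR, non-zero otherwise.
ip_in_cidr() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Possible use against a live cluster:
#   api_ip=$(kubectl -n default get svc kubernetes -o jsonpath='{.spec.clusterIP}')
#   ip_in_cidr "$api_ip" 10.0.0.0/8 && echo "covered" || echo "not covered"
```

With the addresses posted in this thread, `ip_in_cidr 10.191.197.188 10.0.0.0/8` succeeds while `ip_in_cidr 172.20.41.16 10.0.0.0/8` fails, so a service address in the 172.20.x.x range would still be sent to the proxy under the example no_proxy value.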