aws / aws-app-mesh-controller-for-k8s

A controller to help manage App Mesh resources for a Kubernetes cluster.
Apache License 2.0

Does the cloudmap namespace have to match the k8s namespace? #752

Closed jaxxstorm closed 9 months ago

jaxxstorm commented 9 months ago

Describe the bug

I have AWS App Mesh configured and have registered all the required nodes, services and routes. However, when I try to run the color example for gRPC, the client fails with a "no such host" error.

Steps to reproduce

Here's the example manifest:

---
apiVersion: v1
kind: Namespace
metadata:
  name: grpc
  labels:
    mesh: grpc
    appmesh.k8s.aws/sidecarInjectorWebhook: enabled
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: grpc
spec:
  namespaceSelector:
    matchLabels:
      mesh: grpc
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: client
  namespace: grpc
spec:
  podSelector:
    matchLabels:
      app: client
  listeners:
    - portMapping:
        port: 8080
        protocol: http
  backends:
    - virtualService:
        virtualServiceRef:
          name: color
  serviceDiscovery:
    awsCloudMap:
      namespaceName: howto-k8s-grpc.svc.cluster.local
      serviceName: client
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: server
  namespace: grpc
spec:
  podSelector:
    matchLabels:
      app: color
      version: server
  listeners:
    - portMapping:
        port: 8080
        protocol: grpc
      healthCheck:
        port: 8080
        protocol: grpc
        healthyThreshold: 2
        unhealthyThreshold: 3
        timeoutMillis: 2000
        intervalMillis: 5000
  serviceDiscovery:
    awsCloudMap:
      namespaceName: howto-k8s-grpc.svc.cluster.local
      serviceName: color
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: color
  namespace: grpc
spec:
  awsName: color.howto-k8s-grpc.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: color
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualRouter
metadata:
  name: color
  namespace: grpc
spec:
  listeners:
    - portMapping:
        port: 8080
        protocol: grpc
  routes:
    - name: route
      grpcRoute:
        match:
          serviceName: color.ColorService
          methodName: GetColor
        action:
          weightedTargets:
            - virtualNodeRef:
                name: server
              weight: 1
---
# Service per VirtualNode is a no-op when using CloudMap
apiVersion: v1
kind: Service
metadata:
  name: client
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
        - name: app
          image: 186241287477.dkr.ecr.us-east-1.amazonaws.com/howto-k8s-grpc/color_client
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR_HOST"
              value: "color.howto-k8s-grpc.svc.cluster.local:8080"
            - name: "PORT"
              value: "8080"
---
# Service per VirtualNode is a no-op when using CloudMap
apiVersion: v1
kind: Service
metadata:
  name: server
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: color
    version: server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: color
      version: server
  template:
    metadata:
      labels:
        app: color
        version: server
    spec:
      containers:
        - name: app
          image: 186241287477.dkr.ecr.us-east-1.amazonaws.com/howto-k8s-grpc/color_server
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR"
              value: "no color!"
            - name: "PORT"
              value: "8080"
---
apiVersion: v1
kind: Service
metadata:
  name: color
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: color

Note: I intentionally changed the name of the k8s namespace so that it doesn't match the Cloud Map namespace. Everything works as expected if I make the k8s namespace match the Cloud Map namespace.

{
    "Namespaces": [
        {
            "Id": "ns-qn2dvmv2kdquflmy",
            "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:namespace/ns-qn2dvmv2kdquflmy",
            "Name": "howto-k8s-grpc.svc.cluster.local",
            "Type": "DNS_PRIVATE",
            "Properties": {
                "DnsProperties": {
                    "HostedZoneId": "Z0437060K1QTTZHD7TQY",
                    "SOA": {
                        "TTL": 15
                    }
                },
                "HttpProperties": {
                    "HttpName": "howto-k8s-grpc.svc.cluster.local"
                }
            },
            "CreateDate": "2023-12-07T11:02:47.981000-08:00"
        }
    ]
}

Expected outcome

I want a global mesh and service discovery namespace that works across all EKS namespaces

bendu commented 9 months ago

Hi @jaxxstorm

Thanks for your report. Can you share more information about no such host? What commands, if any, did you run? Are there any relevant log files you can also share?

I see you mentioned that you changed the k8s namespace so that it does not match the Cloud Map namespace. Did you also change the VirtualService name, color.howto-k8s-grpc.svc.cluster.local? One limitation of App Mesh is that it does not provide a DNS resolver, so DNS lookups for VirtualService names need a valid DNS response, even if it's just a dummy response.

More details are in the App Mesh docs.
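
One way to get such a "dummy" DNS response from the cluster's own DNS, sketched here under assumptions rather than taken from the walkthrough: create a Kubernetes Service whose name and namespace spell out the VirtualService name. The `howto-k8s-grpc` namespace below is hypothetical for this setup (the workloads themselves live in `grpc`); it exists only so that color.howto-k8s-grpc.svc.cluster.local resolves to a ClusterIP. The Service has no selector and therefore no endpoints of its own; the expectation is that Envoy, not this Service, routes the call once the lookup succeeds.

```yaml
# Sketch only: a placeholder Service that exists purely to give cluster DNS
# something to answer for color.howto-k8s-grpc.svc.cluster.local.
---
apiVersion: v1
kind: Namespace
metadata:
  # Assumption: named after the middle label of the VirtualService awsName.
  name: howto-k8s-grpc
---
apiVersion: v1
kind: Service
metadata:
  name: color
  namespace: howto-k8s-grpc
spec:
  # No selector: the Service still gets a ClusterIP and a DNS A record,
  # but nothing is selected as a backend.
  ports:
    - port: 8080
      name: http
```

Running nslookup for the name from a pod in the cluster should then show whether the lookup succeeds.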

jaxxstorm commented 9 months ago

> Thanks for your report. Can you share more information about no such host? What commands, if any, did you run? Are there any relevant log files you can also share?

This is coming from the color client from this example: https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-k8s-grpc

> I see you mentioned that you changed the k8s namespace so that it does not match the Cloud Map namespace. Did you also change the VirtualService name, color.howto-k8s-grpc.svc.cluster.local? One limitation of App Mesh is that it does not provide a DNS resolver, so DNS lookups for VirtualService names need a valid DNS response, even if it's just a dummy response.

Yes, the service name has been changed

{
    "virtualServices": [
        {
            "arn": "<redacted>",
            "createdAt": "2023-12-07T11:43:31.260000-08:00",
            "lastUpdatedAt": "2023-12-07T11:43:31.260000-08:00",
            "meshName": "<redacted>",
            "meshOwner": "<redacted>",
            "resourceOwner": "<redacted>",
            "version": 1,
            "virtualServiceName": "color.howto-k8s-grpc.svc.cluster.local"
        }
    ]
}

From within the cluster, the color client deployment is unable to resolve anything in the mesh.

bendu commented 9 months ago

@jaxxstorm What were the before and after names for the virtual service?

Can you try changing the virtual service name to color.grpc.svc.cluster.local instead of color.howto-k8s-grpc.svc.cluster.local? (and update the client accordingly)

ysdongAmazon commented 9 months ago

I verified that resolving the color service's IP address through the k8s DNS service works within the cluster, as @bendu described. Here is the manifest:

---
apiVersion: v1
kind: Namespace
metadata:
  name: grpc
  labels:
    mesh: grpc
    appmesh.k8s.aws/sidecarInjectorWebhook: enabled
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: grpc
spec:
  namespaceSelector:
    matchLabels:
      mesh: grpc
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: client
  namespace: grpc
spec:
  podSelector:
    matchLabels:
      app: client
  listeners:
    - portMapping:
        port: 8080
        protocol: http
  backends:
    - virtualService:
        virtualServiceRef:
          name: color
  serviceDiscovery:
    awsCloudMap:
      namespaceName: howto-k8s-grpc.svc.cluster.local
      serviceName: client
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: server
  namespace: grpc
spec:
  podSelector:
    matchLabels:
      app: color
      version: server
  listeners:
    - portMapping:
        port: 8080
        protocol: grpc
      healthCheck:
        port: 8080
        protocol: grpc
        healthyThreshold: 2
        unhealthyThreshold: 3
        timeoutMillis: 2000
        intervalMillis: 5000
  serviceDiscovery:
    awsCloudMap:
      namespaceName: howto-k8s-grpc.svc.cluster.local
      serviceName: color
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: color
  namespace: grpc
spec:
  awsName: color.grpc.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: color
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualRouter
metadata:
  name: color
  namespace: grpc
spec:
  listeners:
    - portMapping:
        port: 8080
        protocol: grpc
  routes:
    - name: route
      grpcRoute:
        match:
          serviceName: color.ColorService
          methodName: GetColor
        action:
          weightedTargets:
            - virtualNodeRef:
                name: server
              weight: 1
---
# Service per VirtualNode is a no-op when using CloudMap
apiVersion: v1
kind: Service
metadata:
  name: client
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
        - name: app
          image: 653561076409.dkr.ecr.us-west-2.amazonaws.com/howto-k8s-grpc/color_client
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR_HOST"
              value: "color.grpc.svc.cluster.local:8080"
            - name: "PORT"
              value: "8080"
---
# Service per VirtualNode is a no-op when using CloudMap
apiVersion: v1
kind: Service
metadata:
  name: server
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: color
    version: server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: color
      version: server
  template:
    metadata:
      labels:
        app: color
        version: server
    spec:
      containers:
        - name: app
          image: 653561076409.dkr.ecr.us-west-2.amazonaws.com/howto-k8s-grpc/color_server
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR"
              value: "no color!"
            - name: "PORT"
              value: "8080"
---
apiVersion: v1
kind: Service
metadata:
  name: color
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: color

jaxxstorm commented 9 months ago

Instead of obfuscating the issue, I've gone ahead and put together the actual deployment. It's very possible this is a misconfiguration.

This is the configuration I have:

https://gist.github.com/jaxxstorm/e91ea2f1ad0aa311687ec23ade8ec2e3

This is the Cloud Map namespace:

{
    "Namespaces": [
        {
            "Id": "ns-qcugxy37n65wzlmn",
            "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:namespace/ns-qcugxy37n65wzlmn",
            "Name": "development.svc.cluster.local",
            "Type": "DNS_PRIVATE",
            "Description": "CloudMap Namespace for IgnisTech EKS Cluster",
            "Properties": {
                "DnsProperties": {
                    "HostedZoneId": "Z06030562RRA6FBCEXXFJ",
                    "SOA": {
                        "TTL": 15
                    }
                },
                "HttpProperties": {
                    "HttpName": "development.svc.cluster.local"
                }
            },
            "CreateDate": "2023-12-06T15:11:24.723000-08:00"
        }
    ]
}

The services are successfully registered:

{
    "Services": [
        {
            "Id": "srv-dl6stda67hnppewu",
            "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:service/srv-dl6stda67hnppewu",
            "Name": "server",
            "Type": "DNS_HTTP",
            "DnsConfig": {
                "RoutingPolicy": "MULTIVALUE",
                "DnsRecords": [
                    {
                        "Type": "A",
                        "TTL": 300
                    }
                ]
            },
            "CreateDate": "2023-12-09T09:19:35.937000-08:00"
        },
        {
            "Id": "srv-eqlmb5kwb74tkhzi",
            "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:service/srv-eqlmb5kwb74tkhzi",
            "Name": "client",
            "Type": "DNS_HTTP",
            "DnsConfig": {
                "RoutingPolicy": "MULTIVALUE",
                "DnsRecords": [
                    {
                        "Type": "A",
                        "TTL": 300
                    }
                ]
            },
            "CreateDate": "2023-12-09T09:19:36.236000-08:00"
        }
    ]
}

jaxxstorm commented 9 months ago

@ysdongAmazon I have a similar setup working when not using Cloud Map at all. I guess the issue here is that the Cloud Map services aren't being resolved. Is there something I'm missing to get that part working, so I don't need to use k8s service discovery?

ysdongAmazon commented 9 months ago

Within the same k8s cluster, the internal DNS plugin can resolve the IP address directly, so for your manifest it may be worth checking whether your k8s Service name is the same as the App Mesh virtual service name. Here is how I check the connection from a temporary container:

dev-dsk-ysdong-2b-61057cc3 % kubectl run -i --tty --rm debug --image=busybox
If you don't see a command prompt, try pressing enter.
/ #
/ #
/ # nslookup color.howto-k8s-grpc.svc.cluster.local
Server:     10.100.0.10
Address:    10.100.0.10:53

** server can't find color.howto-k8s-grpc.svc.cluster.local: NXDOMAIN

** server can't find color.howto-k8s-grpc.svc.cluster.local: NXDOMAIN

/ # nslookup color.grpc.svc.cluster.local
Server:     10.100.0.10
Address:    10.100.0.10:53

Name:   color.grpc.svc.cluster.local
Address: 10.100.41.246

jaxxstorm commented 9 months ago

I can definitely get this working if I look up the service name via the Kubernetes lookup mechanism. In fact, that works without App Mesh deployed at all. What I'm confused about is this: I'm registering the service in Cloud Map and using App Mesh to try to intercept calls. If I just do a DNS lookup directly for the service name, why do I need App Mesh or Cloud Map at all?

I was under the impression I could register the service in App Mesh/Cloud Map, that the name would resolve to the correct IP (the one registered in Cloud Map), and that Envoy would intercept that request to handle it for me. Is that not how this works?

bendu commented 9 months ago

Unfortunately, that's not how App Mesh works.

App Mesh does not provide a DNS responder, so it is only able to intercept calls after DNS lookup is complete, at which point the original DNS lookup is ignored and the destination from App Mesh is used instead.

A rough view of traffic flow from a client with the mesh looks like this: Client makes DNS request -> existing DNS responder responds with IP (the IP doesn't need to be valid, but it needs to respond successfully) -> Client makes call to IP -> OS (via iptables) captures request and sends it to Envoy instead of the IP -> Envoy uses App Mesh provided config manifest to route request to destination.

So, there are a couple of solutions to your problem:

  1. Name your VirtualServices to match existing DNS names that would have a successful IP lookup
  2. Implement your own DNS responder. This can be anything from adding an entry to the hosts file to using the Route 53 DNS service to provide an IP (it doesn't need to be a valid IP)

docs
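
As a rough illustration of the hosts-file variant of option 2, the client Deployment from this thread could carry a `hostAliases` entry, which Kubernetes writes into the pod's /etc/hosts. This is a sketch under assumptions, not a verified fix: the address 10.10.10.10 is an arbitrary non-loopback placeholder, and whether traffic to it is captured depends on the sidecar's egress settings.

```yaml
# Sketch only: the client Deployment from above, with a hosts-file entry so
# that the VirtualService name resolves inside the pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      # hostAliases entries are added to /etc/hosts for every container in the pod.
      hostAliases:
        - ip: "10.10.10.10"   # assumption: placeholder IP, never actually contacted
          hostnames:
            - "color.howto-k8s-grpc.svc.cluster.local"
      containers:
        - name: app
          image: 186241287477.dkr.ecr.us-east-1.amazonaws.com/howto-k8s-grpc/color_client
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR_HOST"
              value: "color.howto-k8s-grpc.svc.cluster.local:8080"
            - name: "PORT"
              value: "8080"
```

The trade-off is that every client pod needs the entry, whereas a DNS record (option 1, or a Route 53 record) covers all clients at once.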

jaxxstorm commented 9 months ago

Okay, that makes perfect sense now. Thanks for all your help!