jaxxstorm closed this issue 11 months ago.
Hi @jaxxstorm
Thanks for your report. Can you share more information about the no such host error? What commands, if any, did you run? Are there any relevant log files you can also share?
I see you mentioned that you changed the k8s namespace so that it no longer matches the Cloud Map namespace. Did you also change the color.howto-k8s-grpc.svc.cluster.local VirtualService name? One limitation of App Mesh is that it does not provide a DNS resolver, so DNS lookups for VirtualService names need a valid DNS response, even if it's just a dummy response.
More details are in the App Mesh docs.
This is coming from the color client from this example: https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-k8s-grpc
Yes, the service name has been changed:
{
  "virtualServices": [
    {
      "arn": "<redacted>",
      "createdAt": "2023-12-07T11:43:31.260000-08:00",
      "lastUpdatedAt": "2023-12-07T11:43:31.260000-08:00",
      "meshName": "<redacted>",
      "meshOwner": "<redacted>",
      "resourceOwner": "<redacted>",
      "version": 1,
      "virtualServiceName": "color.howto-k8s-grpc.svc.cluster.local"
    }
  ]
}
The color client deployment inside the cluster is unable to resolve anything in the mesh.
@jaxxstorm What were the before and after names for the virtual service?
Can you try changing the virtual service name to color.grpc.svc.cluster.local instead of color.howto-k8s-grpc.svc.cluster.local (and updating the client accordingly)?
I verified that using the k8s DNS service to resolve the IP address of the color service within the cluster works, as @bendu described. Here is the manifest:
---
apiVersion: v1
kind: Namespace
metadata:
  name: grpc
  labels:
    mesh: grpc
    appmesh.k8s.aws/sidecarInjectorWebhook: enabled
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: grpc
spec:
  namespaceSelector:
    matchLabels:
      mesh: grpc
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: client
  namespace: grpc
spec:
  podSelector:
    matchLabels:
      app: client
  listeners:
    - portMapping:
        port: 8080
        protocol: http
  backends:
    - virtualService:
        virtualServiceRef:
          name: color
  serviceDiscovery:
    awsCloudMap:
      namespaceName: howto-k8s-grpc.svc.cluster.local
      serviceName: client
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: server
  namespace: grpc
spec:
  podSelector:
    matchLabels:
      app: color
      version: server
  listeners:
    - portMapping:
        port: 8080
        protocol: grpc
      healthCheck:
        port: 8080
        protocol: grpc
        healthyThreshold: 2
        unhealthyThreshold: 3
        timeoutMillis: 2000
        intervalMillis: 5000
  serviceDiscovery:
    awsCloudMap:
      namespaceName: howto-k8s-grpc.svc.cluster.local
      serviceName: color
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: color
  namespace: grpc
spec:
  awsName: color.grpc.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: color
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualRouter
metadata:
  name: color
  namespace: grpc
spec:
  listeners:
    - portMapping:
        port: 8080
        protocol: grpc
  routes:
    - name: route
      grpcRoute:
        match:
          serviceName: color.ColorService
          methodName: GetColor
        action:
          weightedTargets:
            - virtualNodeRef:
                name: server
              weight: 1
---
# Service per VirtualNode is a no-op when using CloudMap
apiVersion: v1
kind: Service
metadata:
  name: client
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
        - name: app
          image: 653561076409.dkr.ecr.us-west-2.amazonaws.com/howto-k8s-grpc/color_client
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR_HOST"
              value: "color.grpc.svc.cluster.local:8080"
            - name: "PORT"
              value: "8080"
---
# Service per VirtualNode is a no-op when using CloudMap
apiVersion: v1
kind: Service
metadata:
  name: server
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: color
    version: server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
  namespace: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: color
      version: server
  template:
    metadata:
      labels:
        app: color
        version: server
    spec:
      containers:
        - name: app
          image: 653561076409.dkr.ecr.us-west-2.amazonaws.com/howto-k8s-grpc/color_server
          ports:
            - containerPort: 8080
          env:
            - name: "COLOR"
              value: "no color!"
            - name: "PORT"
              value: "8080"
---
apiVersion: v1
kind: Service
metadata:
  name: color
  namespace: grpc
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: color
Instead of obfuscating the issue, I've gone ahead and put together the actual deployment. It's very possible this is a misconfiguration.
This is the configuration I have:
https://gist.github.com/jaxxstorm/e91ea2f1ad0aa311687ec23ade8ec2e3
This is the Cloud Map namespace configuration:
{
  "Namespaces": [
    {
      "Id": "ns-qcugxy37n65wzlmn",
      "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:namespace/ns-qcugxy37n65wzlmn",
      "Name": "development.svc.cluster.local",
      "Type": "DNS_PRIVATE",
      "Description": "CloudMap Namespace for IgnisTech EKS Cluster",
      "Properties": {
        "DnsProperties": {
          "HostedZoneId": "Z06030562RRA6FBCEXXFJ",
          "SOA": {
            "TTL": 15
          }
        },
        "HttpProperties": {
          "HttpName": "development.svc.cluster.local"
        }
      },
      "CreateDate": "2023-12-06T15:11:24.723000-08:00"
    }
  ]
}
The services are successfully registered:
{
  "Services": [
    {
      "Id": "srv-dl6stda67hnppewu",
      "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:service/srv-dl6stda67hnppewu",
      "Name": "server",
      "Type": "DNS_HTTP",
      "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
          {
            "Type": "A",
            "TTL": 300
          }
        ]
      },
      "CreateDate": "2023-12-09T09:19:35.937000-08:00"
    },
    {
      "Id": "srv-eqlmb5kwb74tkhzi",
      "Arn": "arn:aws:servicediscovery:us-east-1:186241287477:service/srv-eqlmb5kwb74tkhzi",
      "Name": "client",
      "Type": "DNS_HTTP",
      "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
          {
            "Type": "A",
            "TTL": 300
          }
        ]
      },
      "CreateDate": "2023-12-09T09:19:36.236000-08:00"
    }
  ]
}
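To see which DNS names the two Cloud Map outputs above correspond to, here is a small Python sketch that combines them. It assumes Cloud Map publishes each DNS-enabled service as <service>.<namespace> in the namespace's private hosted zone, and, because the output shown does not say which namespace each service belongs to, that both services sit in the single namespace listed. The function name cloud_map_dns_names is hypothetical.

```python
import json

# Trimmed copies of the namespace and service output shown above;
# only the fields this sketch reads are kept.
NAMESPACES_JSON = """
{"Namespaces": [{"Id": "ns-qcugxy37n65wzlmn",
                 "Name": "development.svc.cluster.local",
                 "Type": "DNS_PRIVATE"}]}
"""

SERVICES_JSON = """
{"Services": [{"Name": "server", "Type": "DNS_HTTP"},
              {"Name": "client", "Type": "DNS_HTTP"}]}
"""

# Cloud Map service types that get DNS records; HTTP-only services are
# discoverable through the API but not through DNS.
DNS_SERVICE_TYPES = {"DNS", "DNS_HTTP"}


def cloud_map_dns_names(namespaces_doc: dict, services_doc: dict) -> list[str]:
    """Return the <service>.<namespace> names Cloud Map would publish.

    Assumption: exactly one namespace, and every listed service belongs to it.
    """
    namespaces = namespaces_doc["Namespaces"]
    if len(namespaces) != 1:
        raise ValueError("this sketch assumes exactly one Cloud Map namespace")
    ns_name = namespaces[0]["Name"]
    return sorted(
        f"{svc['Name']}.{ns_name}"
        for svc in services_doc["Services"]
        if svc["Type"] in DNS_SERVICE_TYPES
    )


if __name__ == "__main__":
    docs = (json.loads(NAMESPACES_JSON), json.loads(SERVICES_JSON))
    for name in cloud_map_dns_names(*docs):
        print(name)
```

Run against the trimmed data, it prints client.development.svc.cluster.local and server.development.svc.cluster.local, the names Cloud Map itself would answer for.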
@ysdongAmazon I have a similar setup working when I don't use Cloud Map at all. I suspect the issue is that the Cloud Map services aren't being resolved. Is there something I'm missing to get that part working, so that I don't need to use k8s service discovery?
Within the same k8s cluster, the k8s internal DNS plugin can resolve the IP address directly, so for your manifest it may be worth checking whether your k8s Service name matches the App Mesh virtual service name. Here is how I check the connection with a temporary container:
dev-dsk-ysdong-2b-61057cc3 % kubectl run -i --tty --rm debug --image=busybox
If you don't see a command prompt, try pressing enter.
/ #
/ #
/ # nslookup color.howto-k8s-grpc.svc.cluster.local
Server: 10.100.0.10
Address: 10.100.0.10:53
** server can't find color.howto-k8s-grpc.svc.cluster.local: NXDOMAIN
** server can't find color.howto-k8s-grpc.svc.cluster.local: NXDOMAIN
/ # nslookup color.grpc.svc.cluster.local
Server: 10.100.0.10
Address: 10.100.0.10:53
Name: color.grpc.svc.cluster.local
Address: 10.100.41.246
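The nslookup check above can also be done offline against a manifest. This Python sketch assumes the standard <service>.<namespace>.svc.<cluster-domain> names that cluster DNS creates for k8s Services, with cluster.local as the cluster domain; it ignores any other DNS source. The helper names k8s_service_fqdn and unresolvable_virtual_services are hypothetical. It lists the virtual service names that no k8s Service would make resolvable.

```python
from typing import Iterable


def k8s_service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Name that cluster DNS serves for a regular k8s Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def unresolvable_virtual_services(
    k8s_services: Iterable[tuple[str, str]],
    virtual_service_names: Iterable[str],
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Return the virtual service names that no k8s Service backs.

    Only <service>.<namespace>.svc.<cluster_domain> records are modeled;
    names answered by any other DNS source are out of scope.
    """
    resolvable = {k8s_service_fqdn(svc, ns, cluster_domain) for svc, ns in k8s_services}
    # Normalize case and a trailing dot, since DNS names compare case-insensitively.
    return [
        name
        for name in virtual_service_names
        if name.lower().rstrip(".") not in resolvable
    ]


if __name__ == "__main__":
    # (Service name, namespace) pairs from the manifest above.
    services = [("client", "grpc"), ("server", "grpc"), ("color", "grpc")]
    vs_names = ["color.grpc.svc.cluster.local", "color.howto-k8s-grpc.svc.cluster.local"]
    print(unresolvable_virtual_services(services, vs_names))
```

With the Services from the manifest, it flags only color.howto-k8s-grpc.svc.cluster.local, the same name nslookup returned NXDOMAIN for.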
I can definitely get this working if I look up the service name through the Kubernetes DNS mechanism. In fact, that works without App Mesh deployed at all. What I'm confused about is that I'm registering the service in Cloud Map and using App Mesh to intercept calls. If I just do a DNS lookup directly for the service name, why do I need App Mesh or Cloud Map at all?
I was under the impression that I could register the service in App Mesh/Cloud Map, that the name would resolve to the correct IP (the one registered in Cloud Map), and that Envoy would intercept the request and handle it for me. Is that not how this works?
Unfortunately, that's not how App Mesh works.
App Mesh does not provide a DNS responder, so it is only able to intercept calls after DNS lookup is complete, at which point the original DNS lookup is ignored and the destination from App Mesh is used instead.
A rough view of the traffic flow from a client in the mesh looks like this:
1. The client makes a DNS request.
2. The existing DNS responder responds with an IP. The IP doesn't need to be valid, but the lookup needs to succeed.
3. The client makes a call to that IP.
4. The OS (via iptables) captures the request and sends it to Envoy instead of the IP.
5. Envoy uses the configuration provided by App Mesh to route the request to its destination.
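To illustrate why a DNS answer is required even though its IP is then discarded, here is a toy Python model of the flow just described. It is not App Mesh or Envoy code: ToyMeshPod, NoSuchHost, and the dictionaries standing in for DNS answers and Envoy routes are all hypothetical, and real Envoy routing also depends on listener ports and request headers rather than a plain name lookup.

```python
from dataclasses import dataclass, field


class NoSuchHost(Exception):
    """Stands in for the client's DNS lookup failure ("no such host")."""


@dataclass
class ToyMeshPod:
    """Toy model of one client pod with an Envoy sidecar; not App Mesh code."""

    # Answers the pod's existing DNS resolver would give: host name -> IP.
    dns_answers: dict[str, str]
    # Stand-in for the Envoy config App Mesh delivers: host name -> destination.
    envoy_routes: dict[str, str]
    # Records each hop so the flow can be inspected.
    hops: list[str] = field(default_factory=list)

    def call(self, host: str) -> str:
        # The client resolves the name first. App Mesh does not answer DNS,
        # so a missing record fails here, before Envoy is involved at all.
        ip = self.dns_answers.get(host)
        if ip is None:
            raise NoSuchHost(f"lookup {host}: no such host")
        # The client connects to whatever IP it got back.
        self.hops.append(f"client -> {ip}")
        # iptables redirects the connection to the local Envoy, so the
        # resolved IP plays no further part in routing.
        self.hops.append("iptables -> envoy")
        # Envoy picks the destination from its mesh config by host name.
        destination = self.envoy_routes[host]
        self.hops.append(f"envoy -> {destination}")
        return destination


if __name__ == "__main__":
    host = "color.grpc.svc.cluster.local"
    routes = {host: "virtual node: server"}
    real = ToyMeshPod({host: "10.100.41.246"}, routes)
    dummy = ToyMeshPod({host: "192.0.2.1"}, routes)  # documentation-only dummy IP
    print(real.call(host), dummy.call(host))
    try:
        ToyMeshPod({}, routes).call(host)
    except NoSuchHost as err:
        print(err)
```

With either the real or the dummy IP the call reaches the same destination; with no DNS answer it fails before Envoy is reached, which matches the no such host error in this thread.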
So, there are a couple of solutions to your problem:
Okay, that makes perfect sense now. Thanks for all your help!
Describe the bug
I have AWS App Mesh configured and have registered all the required nodes, services, and routes. However, when I try to run the gRPC color example, I get a no such host error.
Steps to reproduce
Here's the example manifest:
Note: I intentionally changed the name of the k8s namespace so that it doesn't match the Cloud Map namespace. Everything works as expected if I make the k8s namespace match the Cloud Map namespace.
Expected outcome
I want a global mesh and service discovery namespace that works across all EKS namespaces.
Environment