aws / aws-app-mesh-controller-for-k8s

A controller to help manage App Mesh resources for a Kubernetes cluster.
Apache License 2.0

Virtual Node DNS service discovery is not working as expected #691

Open PeterK96 opened 1 year ago

PeterK96 commented 1 year ago

Describe the bug

I have a backend and a frontend Deployment, with the backend Virtual Node's service discovery set to DNS (the DNS name of the backend Service). My backend Deployment has more than two Pods, but when I send GET requests from the frontend to the backend, I only receive responses from two of the backend Pods.
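The relevant part of the VirtualNode spec looks roughly like this (a sketch trimmed to the relevant fields; names like `blue` and the hostname are from the example setup and may differ):

```yaml
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: blue
  namespace: howto-k8s-mtls-file-based
spec:
  podSelector:
    matchLabels:
      app: color
      version: blue
  listeners:
    - portMapping:
        port: 8080
        protocol: http
  serviceDiscovery:
    dns:
      # DNS name of the backend Kubernetes Service
      hostname: color-blue.howto-k8s-mtls-file-based.svc.cluster.local
```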

Steps to reproduce

I started from the following example: howto-k8s-mtls-file-based. I changed the colorapp so that it returns the unique pod name instead of its color (to see where each response comes from). I also modified the blue Deployment to have more than two replicas. After deploying the example, I curled the blue backend from within the frontend pod:

```shell
curl -H "color_header: blue" color.howto-k8s-mtls-file-based.svc.cluster.local:8080/; echo
```

I only get responses from at most two blue backend pods, even though the blue Deployment has more than two replicas.
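A sketch of how I count the distinct responders (the in-cluster loop is shown commented out since it needs the mesh running; the `printf` line merely simulates sample responses to show what the tally looks like):

```shell
# In-cluster loop, run from the frontend pod (assumes the modified colorapp
# returns the pod name):
#
#   for i in $(seq 1 100); do
#     curl -s -H "color_header: blue" \
#       color.howto-k8s-mtls-file-based.svc.cluster.local:8080/; echo
#   done | sort | uniq -c
#
# Simulated sample tally: with three replicas I would expect three distinct
# pod names, but I only ever see two, e.g.:
printf '%s\n' blue-a blue-b blue-a blue-b | sort | uniq -c
```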

Expected outcome

I would expect the backend Kubernetes Service to handle the load balancing entirely and route traffic to all of my backend pods, so that GET requests from the frontend can reach every backend Pod.

Environment

BennettJames commented 1 year ago

Hey Peter,

What call volume are you sending here? Is it just the single curl request loop? If so, this might just be a result of connection re-use: with default configuration, App Mesh re-uses recent connections between services to speed up requests.

PeterK96 commented 1 year ago

Hello James,

I am sending curl loops containing 100 GET requests.

It could be connection re-use, but what I cannot explain is why I can reach exactly two upstream pods. You also mention that App Mesh re-uses recent connections between services by default.

Can you point me to that part of the documentation and tell me how to override it?
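For what it's worth, the closest knob I could find in the VirtualNode CRD is the listener `connectionPool`. I am not sure whether this is the right setting for connection re-use, but a sketch would be:

```yaml
spec:
  listeners:
    - portMapping:
        port: 8080
        protocol: http
      # Caps the connections the Envoy sidecar keeps to this virtual node;
      # whether this influences connection re-use across backend pods is
      # exactly my question.
      connectionPool:
        http:
          maxConnections: 50
          maxPendingRequests: 100
```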

I also tried out something different:

With this setup I can reach all the backend pods (not only two); the problem is that I lose the ability to use the K8s Service for load balancing, especially the session affinity feature, which I will need in the future.
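For context, the session affinity I mean is the plain Kubernetes Service feature; a sketch (service name and selector labels hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: color-blue
  namespace: howto-k8s-mtls-file-based
spec:
  selector:
    app: color
    version: blue
  ports:
    - port: 8080
      targetPort: 8080
  # Pin each client IP to a single backend pod for up to 3 hours
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
```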