DataDog / chaos-controller

:monkey: :fire: Datadog Failure Injection System for Kubernetes
Apache License 2.0

User Request: The ability to control if the DNS lookup is performed at the pod or node. #882

Open expFlower opened 1 month ago

expFlower commented 1 month ago

Is your feature request related to a problem? Please describe. When executing a host-based network disruption, the resolved address doesn't give us the result we need for the traffic to be successfully disrupted.

The reason is that some of our services use Istio DNS proxying, and more specifically the features described at https://istio.io/latest/docs/ops/configuration/traffic-management/dns-proxy/#external-tcp-services-without-vips and https://istio.io/latest/blog/2020/dns-proxy/#automatic-vip-allocation-where-possible

What we see is that when the pod nameserver is used to resolve the IPs, we end up with an address in the Class E subnet (240.xxx.xxx.xxx). This address is then used for the traffic disruption but doesn't impact the traffic as expected (no disruption at all).

However, if we don't use the pod nameserver and instead fall back to the node nameserver to get the IPs for the host, then build a spec using those IPs (rather than the host), the traffic is disrupted as expected.

As you can see below, the results differ depending on how the lookup is performed (address and IPs altered).

[root@ip-10-1-1-1/]# nsenter -t 2903880 -n dig @172.20.0.10 exteranlservice.service.us-west-2.mycompany.test.net
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.13.6 <<>> @172.20.0.10 exteranlservice.service.us-west-2.mycompany.test.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; >>HEADER<< opcode: QUERY, status: NOERROR, id: 42257
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;exteranlservice.service.us-west-2.mycompany.test.net. IN A
;; ANSWER SECTION:
exteranlservice.service.us-west-2.mycompany.test.net. 30 IN A 240.240.196.125
;; Query time: 0 msec
;; SERVER: 172.20.0.10#53(172.20.0.10)
;; WHEN: Mon Jul 15 10:07:45 UTC 2024
;; MSG SIZE  rcvd: 182

[root@ip-10-1-1-1 /]# nsenter -t 2903880 -n dig exteranlservice.service.us-west-2.mycompany.test.net
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.13.6 <<>> exteranlservice.service.us-west-2.mycompany.test.net
;; global options: +cmd
;; Got answer:
;; >>HEADER<< opcode: QUERY, status: NOERROR, id: 19732
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;exteranlservice.service.us-west-2.mycompany.test.net. IN A
;; ANSWER SECTION:
exteranlservice.service.us-west-2.mycompany.test.net. 2 IN A 10.218.218.211
exteranlservice.service.us-west-2.mycompany.test.net. 2 IN A 10.218.218.212
exteranlservice.service.us-west-2.mycompany.test.net. 2 IN A 10.218.218.213
;; Query time: 0 msec
;; SERVER: 10.123.123.1#53(10.123.123.1)
;; WHEN: Mon Jul 15 10:06:28 UTC 2024
;; MSG SIZE  rcvd: 151
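
The difference between the two answers above can be checked mechanically. The following is a minimal Python sketch (not part of chaos-controller, which is written in Go) that classifies the resolved addresses by whether they fall in 240.0.0.0/4, the reserved former Class E block. The function names `is_class_e` and `split_answers` are hypothetical; the addresses are copied from the two dig outputs.

```python
import ipaddress

# 240.0.0.0/4 is the reserved block formerly known as Class E.
CLASS_E = ipaddress.ip_network("240.0.0.0/4")


def is_class_e(address: str) -> bool:
    """Return True if the address falls inside 240.0.0.0/4."""
    return ipaddress.ip_address(address) in CLASS_E


def split_answers(addresses):
    """Split resolved addresses into (class_e, other) lists, preserving order."""
    class_e = [a for a in addresses if is_class_e(a)]
    other = [a for a in addresses if not is_class_e(a)]
    return class_e, other


# Answers taken from the dig outputs above.
pod_answers = ["240.240.196.125"]
node_answers = ["10.218.218.211", "10.218.218.212", "10.218.218.213"]

if __name__ == "__main__":
    print("pod nameserver:", split_answers(pod_answers))
    print("node nameserver:", split_answers(node_answers))
```

A check like this only flags that the pod lookup returned a reserved-range address; it does not by itself explain why a disruption built on that address has no effect.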

Describe the solution you'd like The ability to control whether the host is resolved by the pod or the node on a per-disruption basis, and the ability to set the default behaviour through the controller configuration, i.e. default to pod and override to node on a per-disruption basis, and vice versa.
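
To make the requested precedence concrete, here is a rough Python sketch of a controller-wide default combined with an optional per-disruption override. Everything named here (`ResolverMode`, `effective_mode`, and the idea that the override is simply absent when unset) is an assumption for illustration only; no such option exists in chaos-controller at the time of this issue.

```python
from enum import Enum
from typing import Optional


class ResolverMode(Enum):
    """Hypothetical setting: which nameserver resolves disruption hosts."""
    POD = "pod"
    NODE = "node"


def effective_mode(
    controller_default: ResolverMode,
    disruption_override: Optional[ResolverMode] = None,
) -> ResolverMode:
    """Per-disruption value wins when set; otherwise use the controller default."""
    if disruption_override is not None:
        return disruption_override
    return controller_default


if __name__ == "__main__":
    # Controller defaults to pod, one disruption opts into node resolution.
    print(effective_mode(ResolverMode.POD))
    print(effective_mode(ResolverMode.POD, ResolverMode.NODE))
```

Treating an unset override as "inherit the default" keeps existing disruptions unchanged while letting individual ones opt in either direction.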

Additional context In our environment the DNS proxying described above is only applied on a per-namespace and per-host basis, so configuration on a per-disruption basis would be the most desirable solution.

Devatoria commented 1 month ago

The current behavior of the injector pod is to read both /etc/resolv.conf files, from within the pod and from the node (here), with the pod one taking precedence over the node configuration. Ideally, that is where we would add a condition to pick one over the other.
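
As an illustration of where such a condition might sit, the Python sketch below parses `nameserver` entries from two resolv.conf texts and orders them according to a mode. It is not the injector's actual Go code. The function names, the `mode` parameter, and the reading of "precedence" as "pod entries first, node entries as fallback" are all assumptions. The sample nameserver addresses come from the dig outputs earlier in this issue.

```python
def parse_nameservers(resolv_conf_text: str) -> list:
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only "nameserver <address>" lines matter; comments and other
        # directives (search, options) are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def pick_nameservers(pod_conf: str, node_conf: str, mode: str = "pod") -> list:
    """Choose nameservers for the given (hypothetical) mode.

    "pod":  pod entries first, then node entries (assumed current behavior).
    "node": node entries only, bypassing the pod's DNS configuration.
    """
    pod = parse_nameservers(pod_conf)
    node = parse_nameservers(node_conf)
    if mode == "node":
        return node
    if mode == "pod":
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(pod + node))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    pod_conf = "search svc.cluster.local\nnameserver 172.20.0.10\n"
    node_conf = "# node resolver\nnameserver 10.123.123.1\n"
    print(pick_nameservers(pod_conf, node_conf, mode="pod"))
    print(pick_nameservers(pod_conf, node_conf, mode="node"))
```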

wdyt @ptnapoleon?

ptnapoleon commented 1 month ago

Missed the GitHub notification, sorry! This is a very reasonable request, but with August holidays, I don't think we'll be adding it until mid-September at the earliest. We'd accept PRs, of course.