aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

fluentbit causes too many nxdomain to coredns in EKS #704

Open nessa829 opened 1 year ago

nessa829 commented 1 year ago
### Describe the question/issue

Hello,

I recently migrated EKS from version 1.22 to version 1.27 in preparation for the recent version deprecation. After the migration, the load on CoreDNS appeared to increase, so we analyzed it. We observed the following (see the attached screenshot):

- CoreDNS requests rose (from 300M to a maximum of 530M)
- CoreDNS NXDOMAIN errors rose (from almost 0 to 400M)

Further analysis turned up a change-log entry introduced with EKS 1.25: https://docs.aws.amazon.com/ko_kr/eks/latest/userguide/kubernetes-versions.html#kubernetes-1.25

```
The support for wildcard queries was deprecated in CoreDNS 1.8.7 and removed in CoreDNS 1.9. This was done as a security measure. Wildcard queries no longer work and return NXDOMAIN instead of an IP address.
```

Even if NXDOMAIN is a normal part of finding the actual domain, this seems to put too much pressure on CoreDNS.

Also, in Fluent Bit's DNS query log in the attached capture, Fluent Bit's queries continued to use wildcards, but only kubernetes.default.svc.cluster.local succeeded. It seems that too many unnecessary queries are being made.

![Screenshot 2023-07-11 5 52 07 PM](https://github.com/aws/aws-for-fluent-bit/assets/105766217/ea56ef6b-80cf-4d00-b6be-9702d505821e)

How can I query only the necessary domains without using wildcards?

Below is the fluent-bit pod's resolv.conf:

```
bash-4.2# cat /etc/resolv.conf
search amazon-cloudwatch.svc.cluster.local svc.cluster.local cluster.local ap-northeast-2.compute.internal
nameserver 172.20.0.10
options ndots:5
```

### Configuration

Fluent Bit Log Output

Fluent Bit Version Info

Cluster Details

Application Details

Steps to reproduce issue

Related Issues

PettitWesley commented 1 year ago

Do you see these messages in your output? https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L1386

AFAICT, the Fluent Bit kubernetes filter only makes requests to a single hostname, which should be kubernetes.default.svc or whatever you configure. I do not see any code that would make a request with `*`.

https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L1419
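One possible explanation, not confirmed in this thread, is that the extra queries come from search-list expansion rather than from wildcards. The sketch below simulates how a glibc-style stub resolver would expand a short name using the `search` and `ndots:5` values from the resolv.conf quoted in the issue. The helper `candidate_queries` is hypothetical and written only for illustration; it is not Fluent Bit code, and Fluent Bit's own resolver may differ in detail.

```python
# Values copied from the resolv.conf quoted in the issue.
SEARCH = [
    "amazon-cloudwatch.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ap-northeast-2.compute.internal",
]
NDOTS = 5


def candidate_queries(name, search=SEARCH, ndots=NDOTS):
    """List the names a glibc-style stub resolver tries, in order.

    Hypothetical helper for illustration; not part of Fluent Bit.
    A real resolver stops at the first name that resolves.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as given.
    return expanded + [absolute]


if __name__ == "__main__":
    for query in candidate_queries("kubernetes.default.svc"):
        print(query)
```

Under these assumptions, `kubernetes.default.svc` has only two dots, so the resolver would first try it under `amazon-cloudwatch.svc.cluster.local` and `svc.cluster.local` before reaching `kubernetes.default.svc.cluster.local`, which is the only name the issue reports as succeeding. A fully qualified name with a trailing dot skips the search list, and a lower `ndots` value (for example through the pod's `dnsConfig`) changes which names are tried first.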

> Also, If you look at fluentbit's dns query log in the attached capture, fluentbit's query continued to use wildcards, but only kubernetes.default.svc.cluster.local succeeded. It seems that too many unnecessary queries are being made.

Apologies, I need more help understanding here. How did you obtain the logs for DNS requests made by Fluent Bit?

PettitWesley commented 1 year ago

I suspect I'm missing something here, and I apologize. If you can help me understand the problem better, that would help.