V3ckt0r / fluentd_exporter

Prometheus exporter for Fluentd
Apache License 2.0

Feature request: Add support for Kubernetes endpoints #4

Open · audip opened this issue 6 years ago

audip commented 6 years ago

In a Kubernetes cluster, fluentd is typically run as a DaemonSet: a pod on every node of the cluster that ships logs to outputs (such as Elasticsearch). Since each fluentd pod can expose a metrics port, e.g. 24220, fluentd-exporter should request the endpoints from the Kubernetes API, scrape all of them, and make the results available for Prometheus to scrape. fluentd-exporter would then run as a single Deployment that polls every fluentd instance in the cluster.

As fluentd is part of the CNCF (Cloud Native Computing Foundation), fluentd-exporter would benefit from Kubernetes support: the ability to scrape all fluentd pods and expose their metrics for Prometheus.

The elasticsearch exporter supports kubernetes clusters: https://github.com/justwatchcom/elasticsearch_exporter

Let me know if this feature request is in scope for this project. I'm no expert in Go and this would require work on my end, so I want to know whether you can make the changes needed to support Kubernetes.

V3ckt0r commented 6 years ago

Hey @audip,

Yea, we can look at something like this. However, we'll need to have a think about how we implement it. Are you thinking of using annotations attached to the fluentd daemonsets and doing the discovery through the K8s API?

audip commented 6 years ago

Hello @V3ckt0r, I'm not sure we need annotations. The way I'm thinking of doing this is to request the endpoints for the fluentd service from the Kubernetes API server, which returns a list of <IP:port> combinations that fluentd-exporter can scrape directly. For example:

$ kubectl get endpoints fluentd --namespace kube-system
NAME      ENDPOINTS                                                                     AGE
fluentd   100.103.189.14:24220,100.103.36.14:24220,100.107.123.136:24220 + 12 more...   4d
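For illustration, a minimal client-go sketch of that endpoints lookup (this is hypothetical code, not from the repo; it assumes a recent client-go and in-cluster service-account credentials):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config: authenticates with the pod's service account.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Equivalent of `kubectl get endpoints fluentd --namespace kube-system`.
	eps, err := client.CoreV1().Endpoints("kube-system").Get(context.TODO(), "fluentd", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				fmt.Printf("%s:%d\n", addr.IP, port.Port) // scrape targets
			}
		}
	}
}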

Let me know what your thoughts on this are and we can talk through the design.

V3ckt0r commented 6 years ago

Hey @audip,

Sorry for the delay. Although this method would work, I don't think it's the best approach. Service definitions are not a constant: people can create services with any name they want. In the example above we have the nice, easy name fluentd, but people out there can run the same thing under whatever service name they like. Doing discovery by name would make things problematic going forward.

I am thinking of simply using labels or annotations to do this sort of discovery, as these are the mechanisms Kubernetes gives us for exactly this reason. The idea being: if you want the exporter to auto-detect services to scrape, place a label like "app: fluentd" on the service, and all the code needs to do is scan for services with that label and read out the IP:port.

We also need to think about how we tell fluentd_exporter it is running in Kubernetes, so it knows it needs to do the discovery. I'm thinking of simply adding a flag, something like --kubernetes=<Boolean>, defaulting to false.
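A minimal sketch of how those two pieces could fit together (hypothetical code, not from the branch; it assumes a recent client-go, the flag name comes from this thread, and app=fluentd is the label convention suggested above):

package main

import (
	"context"
	"flag"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

var kubernetesMode = flag.Bool("kubernetes", false, "discover fluentd services via the Kubernetes API")

func main() {
	flag.Parse()
	if !*kubernetesMode {
		return // fall back to the exporter's static target configuration
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Scan every namespace for services carrying the agreed label and
	// read out their ClusterIP and port.
	svcs, err := client.CoreV1().Services(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=fluentd"})
	if err != nil {
		panic(err)
	}
	for _, svc := range svcs.Items {
		for _, p := range svc.Spec.Ports {
			fmt.Printf("target: %s:%d\n", svc.Spec.ClusterIP, p.Port)
		}
	}
}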

What do you think?

audip commented 6 years ago

Hello @V3ckt0r, I agree with your idea of using labels or annotations; annotations are perfectly suited for this use case. Having a --kubernetes flag that one can flip for a Kubernetes deployment is nice (there might be alternatives too). I'd ask @rocktavious if he has any thoughts on this approach.

V3ckt0r commented 6 years ago

Hey @audip,

I've started work on this in the kubernetes branch here https://github.com/V3ckt0r/fluentd_exporter/tree/feature/kubernetes.

As we discussed, the code I added looks for fluentd services across all namespaces in a Kubernetes deployment. It identifies these services by an app: fluentd label. Once identified, it takes the ClusterIP and port of each service and hits http://<ClusterIP>:<Port>/api/plugins.json
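For reference, the scrape step against fluentd's monitor_agent endpoint could look roughly like this (a sketch, not the branch code; the struct below models only a couple of fields from the /api/plugins.json response, and the IP in main is taken from the example output later in this thread):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal subset of fluentd's /api/plugins.json response.
type pluginsResponse struct {
	Plugins []struct {
		PluginID string `json:"plugin_id"`
		Type     string `json:"type"`
	} `json:"plugins"`
}

func scrape(clusterIP string, port int32) error {
	// clusterIP and port come from the service discovery step.
	resp, err := http.Get(fmt.Sprintf("http://%s:%d/api/plugins.json", clusterIP, port))
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	var pr pluginsResponse
	if err := json.NewDecoder(resp.Body).Decode(&pr); err != nil {
		return err
	}
	for _, p := range pr.Plugins {
		fmt.Println(p.PluginID, p.Type) // feed these into the Prometheus metrics
	}
	return nil
}

func main() {
	// Example invocation against one discovered target (values assumed).
	if err := scrape("100.96.10.35", 24220); err != nil {
		fmt.Println("scrape failed:", err)
	}
}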

It's not finished, but there is a fair bit of code that can be reviewed. What do you think?

audip commented 6 years ago

Will spend some time reviewing it later this week

audip commented 6 years ago

Looking at the branch, the logic seems correct. Can you add an example of how to use this on a Kubernetes cluster? Then I can test it on a k8s cluster with fluentd and fluentd-exporter and report back results. Also, the missing piece is the RBAC permissions fluentd-exporter needs to list services across namespaces (a ClusterRole with list/watch on services, bound to the exporter's service account), but I can take care of that bit and contribute it back.

V3ckt0r commented 6 years ago

hey @audip,

Sorry for going silent. To update you on the latest with this, I've been testing with the fluentd image from https://github.com/fluent/fluentd-kubernetes-daemonset (the alpine-s3 build). There is a PR I've filed against that project at https://github.com/fluent/fluentd-kubernetes-daemonset/pull/110, as I don't like the way the image relies on EC2 IAM credentials for S3, for the reasons given in the PR. That aside, I'm starting to think there is another approach we should take that doesn't rely on setting up a service or endpoint. See below:

$ kubectl get po -owide
NAME                                           READY     STATUS    RESTARTS   AGE       IP             NODE
fluentd-dt7zm                                  1/1       Running   0          10d       100.96.10.35   ip-xxx.compute.internal
fluentd-pf2j6                                  1/1       Running   0          10d       100.96.3.20    ip-xxx.compute.internal
fluentd-zz7ck                                  1/1       Running   0          10d       100.96.11.34   ip-xxx.compute.internal

Each pod automatically gets assigned a cluster-internal IP by default. I'm thinking of changing the code to look for and pick up this pod IP instead of the service IP. I'm going to test this out and report back. I'll send over the YAML you can use in your own cluster to test once I've done that.
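Sketched out, the change would amount to listing the pods themselves rather than a service (hypothetical code, not from the branch; it assumes recent client-go and that the daemonset's pods carry an app=fluentd label, which may differ in your manifests):

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List the fluentd pods directly, across all namespaces.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=fluentd"})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase != corev1.PodRunning {
			continue
		}
		// pod.Status.PodIP matches the IP column in the kubectl output above.
		fmt.Printf("target: http://%s:24220/api/plugins.json\n", pod.Status.PodIP)
	}
}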

Cheers.

glend commented 6 years ago

Can't you just add a fluentd-exporter sidecar to the daemonset? I did this, then added the Prometheus scrape annotations, and it's scraping fine.

Elasticsearch is a different case, because there you need to hit the API of the Elasticsearch cluster itself.

audip commented 6 years ago

@glend Yes, that is another way of doing it. Can you share the manifest YAML you set up for Prometheus scraping?