elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat] Add cluster identifier to Kubernetes metadata #17467

Closed sorantis closed 3 years ago

sorantis commented 4 years ago

Today there's no way in beats to distinguish between multiple Kubernetes clusters. We can monitor and visualize pods and containers, but there's no way to see the full picture:

Kubernetes cluster 1 -> Nodes -> Pods -> Containers
Kubernetes cluster 2 -> Nodes -> Pods -> Containers

As a DevOps engineer I want to be able to easily identify which cluster my nodes and pods belong to, so that I can have a holistic view of all my Kubernetes deployments irrespective of where they're deployed.

cc: @exekias, @jsoriano, @ChrsMark

elasticmachine commented 4 years ago

Pinging @elastic/integrations-platforms (Team:Platforms)

exekias commented 4 years ago

I totally agree we need a field like this. This has been lightly discussed in the past; the biggest issue is that Kubernetes clusters don't have a name you can access through the API (AFAIK; this may have changed). We need to figure out what would be a good alternative that provides a unique value per cluster that is recognizable by users.

One possible option would be to use the API server URL. Do you folks know a better alternative? This may require some research.

blakerouse commented 4 years ago

Yeah, I do not believe Kubernetes itself really knows. Maybe the cloud metadata processor could add it, as it is specific per cloud:

e.g. on GKE: curl http://metadata/computeMetadata/v1/instance/attributes/cluster-name -H "Metadata-Flavor: Google"
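
For reference, enabling that processor is a one-liner in the beats config (a minimal sketch; whether each provider exposes a usable cluster name through it is exactly the open question):

```yaml
# Enable the existing add_cloud_metadata processor
processors:
  - add_cloud_metadata: ~
```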

jsoriano commented 4 years ago

Sorry for the long comment; I wanted to provide some context on previous discussions and current options, and it got a bit out of hand :grimacing:

I remember discussing this in the past, also for other services. A similar problem appears with almost any other service (mysql, kafka...): you may have several clusters of the same service reporting to the same ES cluster. In most cases I would say that services don't have something like a cluster name.

Historically the only solution for that has been to use custom fields, which can be set at a global level or at a module level. After adding these fields you can take them into account by adding filters when querying the data or when visualizing it in Kibana. The good thing about this is that it is quite straightforward and allows you to classify your clusters however you want (maybe you want to classify your clusters by datacenter or by staging/production apart from by name, or you may want to have the same classification for a webapp and its databases, or for a Kubernetes cluster and its etcd cluster). The bad thing is that there is no prior agreement on the fields used, so we cannot build features on top of them in pre-packaged UIs or dashboards.
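
For illustration, a minimal sketch of how such a custom field can be set today (the field name and value here are made up):

```yaml
# metricbeat.yml: a made-up custom field applied to all events
fields:
  kubernetes_cluster: dc1-production
# Put the field at the event root instead of under `fields.`
fields_under_root: true
```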

When migrating to ECS we thought about this problem and added the service.name field, along with a setting at the module level (also service.name) for it. The idea behind this field is that if you have, for example, multiple mysql clusters, you can set service.type: mysql and service.name: whatever. Of course the previously mentioned fields setting can also be used to set this name, and modules of services that have something like a cluster name can set it too. I don't think we are leveraging this service.name field yet, but we could for services.
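
As a sketch, the module-level setting looks roughly like this (the module, host and name are just examples):

```yaml
metricbeat.modules:
  - module: mysql
    metricsets: ["status"]
    hosts: ["tcp(127.0.0.1:3306)/"]
    # Overrides service.name on events from this module
    service.name: whatever
```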

For Kubernetes we would need something else in any case, because we may have a mysql cluster deployed in a Kubernetes cluster, so we cannot use the service fields both for mysql and for Kubernetes. The same problem is going to appear with docker (swarm), PCF, nomad and so on, so I think that the solution we provide should be applicable to these cases too. We can go for specific fields (like kubernetes.cluster_name, nomad.cluster_name...), or maybe we need to think of a common schema for these that we can add to ECS, e.g. orchestrator.type and orchestrator.cluster_name. Then for the events of a mysql cluster deployed in Kubernetes we could set the fields service.type: mysql, service.name: whatever, orchestrator.type: kubernetes and orchestrator.cluster_name: dc1-production.

> I totally agree we need a field like this. This has been lightly discussed in the past; the biggest issue is that Kubernetes clusters don't have a name you can access through the API (AFAIK; this may have changed). We need to figure out what would be a good alternative that provides a unique value per cluster that is recognizable by users.

I don't think clusters have a name by themselves, but I have just discovered that there is a cluster-info config map (kubectl get cm -n kube-public cluster-info -o yaml) that contains a list of clusters with a name field for each one, which is empty in my case. I'm not sure how this is populated, but in any case this name seems to be optional and quite hidden.
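
For reference, on a kubeadm cluster that config map looks roughly like this (trimmed, illustrative output; note the empty name):

```yaml
# Trimmed output of: kubectl get cm -n kube-public cluster-info -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: ""   # the per-cluster name field, empty here
        cluster:
          certificate-authority-data: <redacted>
          server: https://10.96.0.1:6443
```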

> One possible option would be to use the API server URL. Do you folks know a better alternative? This may require some research.

From the point of view of beats deployed in Kubernetes, the API server URL is always going to be the same, isn't it? Probably in managed services there is a way to query for the external URL, or the name given by the provider, but the way to query this is going to depend on the provider (unless they use this cluster-info config map :slightly_smiling_face:).

I think that whatever we do, we are going to need to expose this through the config so users can set any value they want (as we do with custom fields or with service.name).

jsoriano commented 4 years ago

> there is a cluster-info config map (kubectl get cm -n kube-public cluster-info -o yaml)

This config map is created by kubeadm, and it doesn't exist in GKE clusters, so I guess we cannot rely on it for now: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cluster-lifecycle/bootstrap-discovery.md#new-kube-public-namespace

exekias commented 4 years ago

Thank you for all the background explanation @jsoriano! It's good to see that we can find a human-readable name in some cases. WDYT about something like this?

- kubernetes.cluster.url: a unique identifier for the cluster, for instance the API server URL
- kubernetes.cluster.name: a human-friendly name, populated when one can be discovered

The UI must always use kubernetes.cluster.url to differentiate between clusters. This means using it in aggregations and when filtering, while it can show kubernetes.cluster.name as a friendly name for usability.

jsoriano commented 4 years ago

Field names LGTM, but we would still have to see how to obtain a unique URL in the different deployment methods of Kubernetes. If this field is going to be the expected identifier in the UI, we should still provide a way to set it manually for cases that are not automatically supported. We can always use fields or processors to set these fields in the global config, so users don't need to set them in all kubernetes configs.
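
For example, a manual override could look like this (a sketch using the existing add_fields processor; the kubernetes.cluster.* names follow the proposal above and the values are made up):

```yaml
processors:
  - add_fields:
      target: kubernetes.cluster
      fields:
        name: dc1-production
        url: https://kube-api.example.com:6443
```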

Regarding other orchestrators, do you think we should find a common solution, or should we decide on them when/if we have UIs or dashboards where we need a convention?

blakerouse commented 4 years ago

What about defining those fields in an ECS format, so they are the same no matter the orchestrator, as @jsoriano mentioned above?

- Kubernetes
- Cloud Foundry
- Nomad

exekias commented 4 years ago

That sounds like a good idea to me! I think that should be part of a broader conversation about how we want to expose the different orchestrators, and whether we can map their constructs into common fields.

exekias commented 3 years ago

This came up again today. I think that adding kubernetes.cluster.url and kubernetes.cluster.name for now would be a quick win, and the names are easy enough for users to discover.

Once we reach the orchestrator layer in the inventory schema discussions we can revisit how this should look for all of them.

zvazquez commented 3 years ago

Is there any update on this feature? We have the same scenario, where we have multiple Kubernetes clusters that we want to monitor centrally, and having this metadata available would be very beneficial.

Thanks!

cyrille-leclerc commented 3 years ago

FYI, standardisation of orchestrator fields is in progress in ECS; see https://github.com/elastic/ecs/pull/1230
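
Once that lands, events could carry something like this (an illustrative sketch; field names as proposed in that PR, values made up):

```yaml
orchestrator:
  type: kubernetes
  cluster:
    name: dc1-production
    url: https://kube-api.example.com:6443
```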

exekias commented 3 years ago

Now that we have this in ECS, we plan to move ahead and add these fields. @ChrsMark, I'm assigning you for now 🙏