Investigate if it is possible to set orchestrator fields from Cloud provider kubernetes metadata

tetianakravchenko commented 1 year ago

Azure

Describe the enhancement:

orchestrator.cluster.name and orchestrator.cluster.url will not be set when metricbeat is running on AKS.

as mentioned in https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-kubernetes.html#_dashboard_32:

This field gets its value from sources like kube_config, kubeadm-config configMap, and Google Cloud’s meta API for GKE.

this feature was introduced by https://github.com/elastic/beats/pull/26056

Similar to how we support GKE Metadata now, we should investigate if it is possible to get k8s cluster name and k8s cluster url to set orchestrator.cluster fields using the Azure kubernetes metadata.

AWS

EKS: initial investigation - https://github.com/elastic/beats/issues/30229#issuecomment-1047016271

Google cloud

We already set orchestrator.cluster.* from GKE metadata.

tetianakravchenko commented 1 year ago

Work on this issue might require alignment on how to better organize the code: currently we have GetKubernetesClusterIdentifier, which only checks the kubeconfig and kubeadm-config sources to set the cluster name and url. At the same time, cluster metadata for GKE is added in the add_cloud_metadata processor. Cluster metadata should be set in one place, which requires code reorganisation, as we are planning to add support for more providers.

Based on that, we might reconsider the work done in https://github.com/elastic/cloudbeat/pull/455, see the relevant discussion https://github.com/elastic/cloudbeat/pull/455/files#r999273250

ofiriro3 commented 1 year ago

Hi,

My team @elastic/cloud-security-posture has just implemented a processor that uses the GetKubernetesClusterIdentifier function.

We would be happy if you could create another action item to revisit how we can organize the various implementations better.

cc @ChrsMark

ChrsMark commented 1 year ago

@tetianakravchenko fyi that in GKE autopilot the fields are not set.

gsantoro commented 1 year ago

@tetianakravchenko AKS is missing kubeconfig and it doesn't contain the cluster name or the cluster url.

gsantoro commented 1 year ago

hey @ChrsMark and @tetianakravchenko, couple of ideas:

since 3/4 of our cloud environments (all except for GKE standard) don't provide a cluster.name/url in the metadata api...
...and we can either provide a custom solution in code for each provider (and that probably will take some time to implement)...
...or we have to write docs for each cloud environment for the user to fix this on their own in Kibana (at least for the meantime)....
...and the fix in Kibana is quite complicated since the user has to paste a few lines of processor yaml code into the Processor box in the right location (aka under Kube-state-metrics / Node Metrics UI)...
...and that is probably very error-prone and I can already see the infinite list of SDH coming our way...
...and I am not a huge fan of docs if there is a better alternative...

... I was thinking

...if instead of providing docs for the customer to fix this issue on their own...
...we could embed the "Processor" code already in the integration (in the right place) and expose an optional field to the User in the Kibana UI.

A couple of use cases showing how I see this playing out:

if we can get this info from the metadata API (like for GKE standard), we use that value and ignore any user input
otherwise (for the other 3/4 options), we ask the user to provide a cluster.name/url.
- if the user doesn't provide a cluster name, we might use a default value like "cluster-name" (or anything else). This way we still fix the dashboards and it works either way.
- if the user provides a cluster name, we use that value instead. We might even consider using this value for GKE standard too, if we want to let the user give the cluster a more user-friendly name.
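
The precedence described in this list can be sketched in a few lines of Python. This is only an illustration of the proposed fallback order, not code from Beats or the Kubernetes integration; the function name `resolve_cluster_name` and its parameters are hypothetical.

```python
from typing import Optional

# Placeholder default taken from the suggestion above; any fixed string would do.
DEFAULT_CLUSTER_NAME = "cluster-name"


def resolve_cluster_name(
    metadata_name: Optional[str],
    user_name: Optional[str],
    default: str = DEFAULT_CLUSTER_NAME,
) -> str:
    """Pick a value for orchestrator.cluster.name (hypothetical helper)."""
    # 1. A value found via the cloud metadata API (e.g. GKE standard) wins.
    if metadata_name:
        return metadata_name
    # 2. Otherwise use the optional value supplied by the user in the UI.
    if user_name:
        return user_name
    # 3. Last resort: a fixed default so the dashboards still get a value.
    return default
```

Letting the user value override the metadata value (the GKE standard variation mentioned in the last bullet) would only require swapping the first two checks.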

I don't mean this to be the final solution, but it could easily be a temporary fix until we find a better way to fix this in code.

What do you guys think?

ChrsMark commented 1 year ago

...if instead of providing docs for the customer to fix this issue on their...

...we could embed the "Processor" code already in the integration (in the right place) and expose an optional field to the User in the Kibana UI.

That would work, yes, especially if we define this at the integration level and not just at the data_stream level.

However, in that case we expose something that is a "patch" to the users, and at some point we will remove it, so I have mixed feelings about this. I would prefer prioritizing the backend implementation and investing time directly in it.

In any case, that would be doable if @mlunadia agrees with that from a product perspective. cc: @gizas

gsantoro commented 1 year ago

hello @ChrsMark and @tetianakravchenko, @gizas , I managed to get the cluster name from the AKS metadata endpoint with the following bash script

kubectl debug node/aks-nodepool1-36348082-vmss000000 -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 -- /bin/bash -c 'apt-get update; \
apt-get install -y curl jq; \
RESOURCE_NAME=$(curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq -r .compute.resourceGroupName); \
arrIN=(${RESOURCE_NAME//_/ }); \
echo ${arrIN[2]};'

Here I'm using a debug command to get a node shell and query the AKS metadata from inside the K8s cluster. The AKS metadata exposes the cluster name inside a field under the jsonpath .compute.resourceGroupName. You need to split that string by _ and then take the third element of the array (index 2).
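
For readers who prefer to see the same lookup outside of a shell one-liner, here is a minimal Python sketch of it. The endpoint URL, the Metadata header and the .compute.resourceGroupName path are taken from the command above; the function names are hypothetical, and the fetch only works from inside an Azure VM or AKS node.

```python
import json
import urllib.request

# Endpoint and API version as used in the curl command above.
IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"


def fetch_instance_metadata(timeout: float = 2.0) -> dict:
    """Query the instance metadata endpoint (reachable only from an Azure node)."""
    # An empty ProxyHandler bypasses any configured proxy, like curl --noproxy "*".
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    with opener.open(request, timeout=timeout) as response:
        return json.load(response)


def resource_group_name(metadata: dict) -> str:
    """Equivalent of jq's .compute.resourceGroupName."""
    return metadata["compute"]["resourceGroupName"]


def naive_cluster_name(resource_group: str) -> str:
    """Mirror the bash snippet: split on "_" and take the third element."""
    return resource_group.split("_")[2]
```

On a node, `naive_cluster_name(resource_group_name(fetch_instance_metadata()))` would reproduce what the script prints, under the same assumption that neither name contains an underscore.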

ChrsMark commented 1 year ago

Nice! I guess this can be added at https://github.com/elastic/beats/blob/25786cdda70b31cb1738373265bf3a0f3dec76f6/libbeat/processors/add_cloud_metadata/provider_azure_vm.go, similarly to what we do for the GKE case at https://github.com/elastic/beats/blob/25786cdda70b31cb1738373265bf3a0f3dec76f6/libbeat/processors/add_cloud_metadata/provider_google_gce.go. Btw, where can we find this 169.254.169.254 IP? Is this the Node's IP?

In general we try to add the cloud provider specific implementation under the add_cloud_metadata processor. The basic implementation for this metadata, which uses the kubeconfig approach, lives in https://github.com/elastic/elastic-agent-autodiscover/blob/6ee69244193e9ba3159304650f39152f8fad32a7/kubernetes/metadata/metadata.go#L110, but when this is not able to cover the case we leverage the add_cloud_metadata processor (which is enabled by default) to get these fields.

gizas commented 1 year ago

169.254.169.254

This is a predefined IP for Azure. https://learn.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service?tabs=linux

I guess the result will be the same for [168.63.129.16] ?

tetianakravchenko commented 1 year ago

The approach of splitting .compute.resourceGroupName by _ and taking the third element will not work when the resource group or cluster name contains a _:

if the resource group is not defined, its default name will be <clusterName>_group
_ is also acceptable in the cluster name

The name follows the pattern MC_<resourceGroupName>_<clusterName>_<region>; for this cluster it will be MC_k8s_cluster_name_group_k8s_cluster_name_eastus
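
To make the ambiguity concrete, the following Python sketch enumerates every way such a name could be divided into a resource group and a cluster name. The helper `candidate_splits` is hypothetical, and it assumes the region segment itself contains no underscore (as in "eastus").

```python
from typing import List, Tuple


def candidate_splits(node_resource_group: str) -> List[Tuple[str, str, str]]:
    """Return every (resource_group, cluster_name, region) reading of an MC_ name."""
    prefix = "MC_"
    if not node_resource_group.startswith(prefix):
        raise ValueError("expected a name starting with 'MC_'")
    # Assumption: the region is the last segment and has no underscore.
    body, _, region = node_resource_group[len(prefix):].rpartition("_")
    parts = body.split("_")
    # Any underscore in the body could be the boundary between the two names.
    return [
        ("_".join(parts[:i]), "_".join(parts[i:]), region)
        for i in range(1, len(parts))
    ]
```

For MC_k8s_cluster_name_group_k8s_cluster_name_eastus this yields six candidates, only one of which is the real pair (k8s_cluster_name_group, k8s_cluster_name), while the plain split-and-take-index-2 approach returns "cluster". A name such as MC_rg_cluster_eastus yields a single candidate, which is why the simple split appears to work on clusters without underscores in their names.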

Vbubblery commented 8 months ago

Hi,

Any updates or solution here?

Grinch321 commented 6 months ago

Hi, any updates?

gizas commented 6 months ago

Support for AWS is already included in agent version 8.9.0 and later (see release-notes-8.9.0.html, issue 35182).

AKS support is still on the roadmap.

ptonini commented 5 months ago

I've implemented a simple workaround, setting the cluster name as an environment variable on the pod:

processors: |
  - add_fields:
      target: orchestrator.cluster
      fields:
        name: $${env.CLUSTER_NAME}
        url: $${env.CLUSTER_URL}

MichaelKatsoulis commented 5 months ago

The best way to retrieve the AKS cluster name is to use the Azure SDK to list the Managed Clusters for the given subscription ID. Then we can filter by the resourceGroupName, which we get from the metadata endpoint. This solution can give us the cluster id and cluster name. However, Azure authentication requires the TENANT_ID, CLIENT_ID and CLIENT_SECRET, which can be provided as env vars in metricbeat/agent.

https://github.com/elastic/beats/pull/37685
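
The filtering step described above can be illustrated with plain data, leaving out the authenticated SDK call. This is a sketch under assumptions, not the code from the linked PR: the `ManagedCluster` record and its field names are hypothetical stand-ins for whatever the SDK returns, and the case-insensitive comparison assumes resource group names may differ in letter case between the two sources.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ManagedCluster:
    # Hypothetical record; field names are illustrative, not the SDK's types.
    id: str
    name: str
    node_resource_group: str


def find_cluster(
    clusters: Iterable[ManagedCluster], resource_group_name: str
) -> Optional[ManagedCluster]:
    """Return the cluster whose node resource group matches the metadata value."""
    # Assumption: compare case-insensitively in case the two sources differ in case.
    wanted = resource_group_name.casefold()
    for cluster in clusters:
        if cluster.node_resource_group.casefold() == wanted:
            return cluster
    return None
```

Because the match is on the whole resource group string, this avoids parsing the MC_ name and is not affected by underscores in the cluster or resource group name.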

randywatson1979 commented 4 months ago

Well, you could at least get the resource group/cluster name from kubernetes.node.labels.kubernetes_azure_com/cluster, which looks like "MC_some_name_some_name_westeurope" and is unique to an AKS cluster.

I would like to output that value in an advanced watcher template, but I have no idea how to escape/format that forward slash so that the message parser recognises and reads the value of that key.

elastic / beats

Investigate if it is possible to set orchestrator fields from Cloud provider kubernetes metadata #33081
