Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[BUG] Running az aks update --disable-azure-monitor-metrics Does not stop/remove node-exporter service from the node. #3954

Closed mllab-nl closed 3 weeks ago

mllab-nl commented 11 months ago

Describe the bug: Running az aks update --disable-azure-monitor-metrics does not stop or remove the node-exporter service from the node. According to https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-disable, it should: "... Removes the agent from the cluster nodes. ..."

To Reproduce: Steps to reproduce the behavior:

  1. Create a cluster.
  2. Run az aks update --disable-azure-monitor-metrics.
  3. On a cluster node, check for the node-exporter service, e.g. journalctl -u node-exporter. It still runs and listens on port 19100, and Azure still talks to it (via konnectivity-agent).
  4. Rebooting the node or creating a new node does not remove the service. The VM has the latest model according to the scale set.

Expected behavior: The service is stopped and removed from all nodes.
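
The node-side check in step 3 can be sketched as a small script. This is only an illustration, not an official AKS diagnostic: the unit name node-exporter and port 19100 are taken from the report above, and the helper names port_state and unit_state are made up for this example. It also assumes the service accepts connections on 127.0.0.1, which may not hold if it is bound to a specific interface.

```shell
#!/usr/bin/env bash
# Sketch only: run on an AKS worker node.
# Assumed from the report above: unit name "node-exporter", TCP port 19100.
UNIT="node-exporter"
PORT=19100

# Print "open" if something accepts TCP connections on host:port, else "closed".
# Uses bash's /dev/tcp pseudo-device inside a subshell so the descriptor is
# released as soon as the check finishes.
port_state() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Print the systemd state of the unit, or "unknown" when systemctl is missing.
unit_state() {
  if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active "$1" 2>/dev/null || true
  else
    echo "unknown"
  fi
}

echo "unit ${UNIT}: $(unit_state "${UNIT}")"
echo "port ${PORT}: $(port_state 127.0.0.1 "${PORT}")"
```

If the unit reports active and the port reports open after the metrics addon has been disabled, that matches the behavior described in this report.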

shanalily commented 11 months ago

node-exporter is not part of the metrics addon node agent, so disabling the metrics addon will not remove node-exporter. The node-exporter service is installed by the AKSNode VM extension, and the services it manages are also used for internal monitoring.

mllab-nl commented 11 months ago

Hey @shanalily! Thanks for pointing out the FAQ!

I am confused. So apparently "Disable Prometheus metrics collection from an AKS cluster" does not disable Prometheus metrics collection from AKS nodes - 😕

I might not be the only one who is confused.

IMO it would be nice to add this information to the help page https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-disable and clarify which agent is removed, and that the show will go on: metrics will still be collected for internal monitoring.

So I would consider this not a bug, but a communication/documentation improvement.

aritraghosh commented 11 months ago

@mllab-nl Could you expand on your question? When you disable Prometheus collection, are the ama-metrics-* pods not removed from the cluster?
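
One way to answer that is to filter the pod list for names starting with ama-metrics. The following is a minimal sketch, assuming the addon pods live in the kube-system namespace; the helper name ama_metrics_pods is hypothetical and only filters text, so it can be tried without a cluster.

```shell
# Print the names of pods whose name starts with "ama-metrics", reading
# `kubectl get pods --no-headers` style output (name in column 1) from stdin.
ama_metrics_pods() {
  awk '$1 ~ /^ama-metrics/ { print $1 }'
}

# Usage against a live cluster (needs kubectl and cluster credentials;
# the kube-system namespace is an assumption):
#   kubectl get pods -n kube-system --no-headers | ama_metrics_pods
```

Empty output would mean the addon pods are gone. It says nothing about node-exporter, which runs as a service on the node rather than as a pod.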

mllab-nl commented 11 months ago

@aritraghosh The question is about disabling node-exporter, which does not run as a pod but is installed on the worker nodes. (What would be a good reason not to run it as a pod?)

And as @shanalily confirmed, it cannot be disabled because it is used for internal AKS purposes.

microsoft-github-policy-service[bot] commented 7 months ago

Action required from @Azure/aks-pm

microsoft-github-policy-service[bot] commented 7 months ago

Issue needing attention of @Azure/aks-leads

AllenWen-at-Azure commented 3 weeks ago

Closing this issue, as it was answered in https://github.com/Azure/AKS/issues/3954#issuecomment-1771288066 and the documentation was updated.