Could this be caused by this issue with slow responses from the API? https://github.com/kubernetes/kubernetes/issues/75752
I've seen metrics become unavailable through kubectl top nodes, with e.g. 10 containers on a Windows node.
Hmmm, that issue is definitely suspicious looking, but it is hard for me to say if that would actually cause this. I do see "kubectl top nodes" return "unknown" on my Windows nodepool currently.
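One way to check whether the metrics API itself is responding (independent of kubectl top) is to query it directly; the node name below is a placeholder:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/<node-name>"
If those calls hang or return errors, the problem is in the metrics pipeline rather than in the HPA configuration.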
Hi, have you set resource limits in the deployment YAML definition?
@cpunella Yes, I do have both CPU and RAM limits in all of my deployments. Does that somehow affect the HPA or access to metrics data?
It should be the metrics API issue; you can check the metrics-server pod logs:
kubectl logs -n kube-system --tail=100 metrics-server-66dbbb67db-vtzq9
you will find something like:
Failed to get kubelet_summary:10.10.0.252:10255 response in time
Indeed @zhiweiv I am seeing events like:
Failed to get kubelet_summary...
mixed in with a ton of:
No metrics for pod default/dev
Thanks for the steps to confirm.
@brobichaud yes, I found that if you don't set limits in the deployment, metrics are not collected ...
But I do have limits on all of my deployments, yet metrics seem hit or miss...
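For reference, a minimal sketch of where requests and limits go in a Deployment spec; the name "dev" is borrowed from the log line above, and the image and values are illustrative, not taken from this issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dev
  template:
    metadata:
      labels:
        app: dev
    spec:
      nodeSelector:
        kubernetes.io/os: windows   # schedule onto the Windows nodepool
      containers:
      - name: dev
        image: mcr.microsoft.com/dotnet/samples:aspnetapp   # illustrative image only
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi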
Hi @brobichaud, can you please try this manifest and let me know whether it works?
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: dev-burns-portal-hpa
spec:
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 50   # example value; required for CPU-based scaling with autoscaling/v1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name:   # fill in the name of your Deployment
And make sure that in your deployment file you have allocated resources (requests and limits) for your pods.
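Once applied, one way to confirm the HPA is actually reading metrics (the file name and HPA name here are assumptions based on the manifest above):
kubectl apply -f hpa.yaml
kubectl get hpa dev-burns-portal-hpa
kubectl describe hpa dev-burns-portal-hpa
If the TARGETS column shows <unknown>, the HPA is not receiving metrics from the metrics API.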
Does anyone know if it is accurate to say that this should be addressed by this change: https://github.com/kubernetes/kubernetes/issues/74991
That's my understanding - @marosset to confirm.
I think so. Also, the fix is still only available in 1.18+ (my PRs to backport the fix to 1.15-1.17 are still in limbo)
If you do end up getting this into earlier k8s releases could you update this issue?
This got fixed in https://github.com/kubernetes/kubernetes/pull/87730, but introduced this issue: https://github.com/kubernetes/kubernetes/pull/90554, which is now fixed and deployed in AKS from 1.16.9+, 1.17.5+ and 1.18.1+
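For anyone on an affected version, a rough sketch of checking for and applying a patched release with the Azure CLI (resource group and cluster name are placeholders):
az aks get-upgrades --resource-group <resource-group> --name <cluster-name> --output table
az aks upgrade --resource-group <resource-group> --name <cluster-name> --kubernetes-version 1.16.9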
What happened: I am frequently seeing warning events regarding metrics and the horizontal pod autoscaler as seen below:
It's unclear to me whether this is truly an AKS setup issue or is really a k8s or hpa issue, so I'm starting here. I didn't see these when I had a raw AKS-Engine based cluster not long ago.
What you expected to happen: For the HPA to successfully acquire metrics on every call.
How to reproduce it (as minimally and precisely as possible): Set up a simple AKS cluster with a Windows nodepool, deploy some pods, and set up simple CPU-based autoscale rules, such as:
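For example (a sketch only; the deployment name and CPU target are assumptions, not the reporter's actual values):
kubectl autoscale deployment dev --cpu-percent=50 --min=2 --max=4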
Monitor system events and you should see events similar to those referenced above.
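One way to watch for these, assuming the default namespace:
kubectl get events --field-selector type=Warning --watch
kubectl describe hpa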
Anything else we need to know?: I was not seeing these with a raw AKS-Engine based cluster of similar configuration.
Environment:
Kubernetes version (use kubectl version): v1.14.3