Azure-Samples / virtual-node-autoscale

A sample application to demonstrate Autoscale with AKS Virtual Nodes
MIT License

Cannot get prometheus to work AND cannot get autoscale to work if using pre-installed metrics-server instead of prometheus #40

Open anujgeek opened 5 years ago

anujgeek commented 5 years ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I created an AKS cluster with RBAC and virtual nodes enabled, then added a deployment of a .NET Core Web API project using a Docker image hosted in ACR. The deployment's container has resource requests and limits configured for CPU and memory, and I exposed the deployment as a service. I then configured Helm with RBAC and TLS, installed an ingress controller to route to the service, and created a public DNS name for it. Finally, I configured an HPA with CPU and memory metrics (a sketch of such an HPA manifest follows the list below). Two issues here:

  1. If I install Prometheus using the stable/prometheus-operator Helm chart, one of the node-exporter pods always fails with the error: Pod ... requires volume proc which is of an unsupported type
  2. If I skip the Prometheus installation and rely only on the pre-installed metrics-server, I cannot get the autoscaler to work. Whenever I hit the app with load, the metrics API for the pod also returns a not-found error.
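
For reference, a minimal sketch of the kind of HPA described above, assuming the autoscaling/v2beta2 API (available on Kubernetes 1.13); the names, namespace, and target utilizations are placeholders, not the values from the actual setup:

```yaml
# Sketch only: names, namespace, and thresholds are assumptions.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: containersample
  namespace: containersample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: containersample
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # Utilization targets require CPU/memory requests on the deployment's container.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```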

Any log messages given by the failure

Error Case 1:

Name:               gangly-pug-prometheus-node-exporter-9kfx9
Namespace:          monitoring
Priority:           0
PriorityClassName:
Node:               virtual-node-aci-linux/
Labels:             app=prometheus-node-exporter
                    chart=prometheus-node-exporter-1.4.2
                    controller-revision-hash=cbc476c95
                    heritage=Tiller
                    jobLabel=node-exporter
                    pod-template-generation=1
                    release=gangly-pug
Annotations:
Status:             Pending
Reason:             ProviderFailed
Message:            Pod gangly-pug-prometheus-node-exporter-9kfx9 requires volume proc which is of an unsupported type
IP:
Controlled By:      DaemonSet/gangly-pug-prometheus-node-exporter
Containers:
  node-exporter:
    Image:      quay.io/prometheus/node-exporter:v0.17.0
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
      --web.listen-address=0.0.0.0:9100
      --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
      --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
    Liveness:   http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      KUBERNETES_PORT_443_TCP_ADDR:  containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io
      KUBERNETES_PORT:               tcp://containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io:443
      KUBERNETES_PORT_443_TCP:       tcp://containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io:443
      KUBERNETES_SERVICE_HOST:       containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from gangly-pug-prometheus-node-exporter-token-7jpcm (ro)
Conditions:
  Type           Status
  PodScheduled   True
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  gangly-pug-prometheus-node-exporter-token-7jpcm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gangly-pug-prometheus-node-exporter-token-7jpcm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
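
The failure above is the ACI-backed virtual node rejecting the node-exporter pod's hostPath volumes (/proc and /sys), which virtual nodes cannot mount. Below is a sketch of a Helm values override that keeps node-exporter off the virtual node; the subchart key, affinity support, and node label are assumptions to verify against the chart version and node labels in use:

```yaml
# Sketch of a values override for stable/prometheus-operator (assumed key names).
prometheus-node-exporter:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # AKS virtual nodes are typically labelled type=virtual-kubelet;
              # check the labels on virtual-node-aci-linux before relying on this.
              - key: type
                operator: NotIn
                values:
                  - virtual-kubelet
```

Applied with `-f values.yaml` at install or upgrade time, this keeps the DaemonSet's pods on the VM-backed nodes, where the /proc and /sys hostPath mounts are supported.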

Error Case 2: When not hit with load (working fine):

kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/containersample/pods/containersample-b77d7d868-ktdht
{"kind":"PodMetrics","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"name":"containersample-b77d7d868-ktdht","namespace":"containersample","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/containersample/pods/containersample-b77d7d868-ktdht","creationTimestamp":"2019-06-06T23:32:03Z"},"timestamp":"2019-06-06T23:31:00Z","window":"1m0s","containers":[{"name":"containersample","usage":{"cpu":"0","memory":"73000Ki"}}]}

When I hit the app with load:

Error from server (NotFound): podmetrics.metrics.k8s.io "containersample/containersample-b77d7d868-ktdht" not found

Expected/desired behavior

  1. The prometheus-operator installation should complete without errors.
  2. The pre-installed metrics-server should work for the autoscaler.

OS and Version?

Windows 10

Versions

Kubernetes: v1.13.5

Mention any other details that might be useful


Thanks! We'll be in touch soon.

anujgeek commented 5 years ago

@rbitia Can you please help me with this case or include anyone who can?

mimckitt commented 5 years ago

@lachie83 @jluk @seanmck can one of you please assist with this? The issue has been open for over a month with no response. Otherwise, if there is a better route @anujgeek can take to get help on this, please let me know. I am happy to direct and facilitate as needed.

dkkapur commented 5 years ago

Tagging @srrengar as well