Azure-Samples / virtual-node-autoscale

A sample application to demonstrate Autoscale with AKS Virtual Nodes
MIT License

Cannot get prometheus to work AND cannot get autoscale to work if using pre-installed metrics-server instead of prometheus #40

Open anujgeek opened 5 years ago

anujgeek commented 5 years ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I created an AKS cluster with RBAC and virtual nodes enabled, then added a deployment of a .NET Core Web API project using a Docker image hosted in ACR. The deployment's container has resource requests and limits configured for CPU and memory, and I exposed the deployment as a service. I then configured Helm with RBAC and TLS, installed an ingress controller to route to the service, and created a public DNS name for it. Finally, I configured an HPA with CPU and memory metrics (a sketch of such an HPA manifest follows the list below). Two issues here:

  1. If I install Prometheus using the stable/prometheus-operator Helm chart, one of the node-exporter pods always fails with the error: Pod ... requires volume proc which is of an unsupported type
  2. If I skip the Prometheus installation and rely only on the pre-installed metrics-server, I cannot get the autoscaler to work. Whenever I hit the app with load, the metrics API for the pod also returns a not-found error.
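
For reference, a minimal sketch of the kind of HPA described above, assuming the autoscaling/v2beta2 API (available on Kubernetes 1.13); the names, namespace, and target utilizations are placeholders, not the values from the actual setup:

```yaml
# Sketch only: names, namespace, and thresholds are assumptions.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: containersample
  namespace: containersample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: containersample
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # Utilization targets require CPU/memory requests on the deployment's container.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```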

Any log messages given by the failure

Error Case 1:

Name:               gangly-pug-prometheus-node-exporter-9kfx9
Namespace:          monitoring
Priority:           0
PriorityClassName:
Node:               virtual-node-aci-linux/
Labels:             app=prometheus-node-exporter
                    chart=prometheus-node-exporter-1.4.2
                    controller-revision-hash=cbc476c95
                    heritage=Tiller
                    jobLabel=node-exporter
                    pod-template-generation=1
                    release=gangly-pug
Annotations:
Status:             Pending
Reason:             ProviderFailed
Message:            Pod gangly-pug-prometheus-node-exporter-9kfx9 requires volume proc which is of an unsupported type
IP:
Controlled By:      DaemonSet/gangly-pug-prometheus-node-exporter
Containers:
  node-exporter:
    Image:      quay.io/prometheus/node-exporter:v0.17.0
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
      --web.listen-address=0.0.0.0:9100
      --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
      --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
    Liveness:   http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      KUBERNETES_PORT_443_TCP_ADDR:  containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io
      KUBERNETES_PORT:               tcp://containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io:443
      KUBERNETES_PORT_443_TCP:       tcp://containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io:443
      KUBERNETES_SERVICE_HOST:       containersampleaks3-dns-cefb8c3e.hcp.eastus.azmk8s.io
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from gangly-pug-prometheus-node-exporter-token-7jpcm (ro)
Conditions:
  Type           Status
  PodScheduled   True
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  gangly-pug-prometheus-node-exporter-token-7jpcm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gangly-pug-prometheus-node-exporter-token-7jpcm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
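
The failure above is the ACI-backed virtual node rejecting the node-exporter pod's hostPath volumes (/proc and /sys), which virtual nodes cannot mount. Below is a sketch of a Helm values override that keeps node-exporter off the virtual node; the subchart key, affinity support, and node label are assumptions to verify against the chart version and node labels in use:

```yaml
# Sketch of a values override for stable/prometheus-operator (assumed key names).
prometheus-node-exporter:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # AKS virtual nodes are typically labelled type=virtual-kubelet;
              # check the labels on virtual-node-aci-linux before relying on this.
              - key: type
                operator: NotIn
                values:
                  - virtual-kubelet
```

Applied with `-f values.yaml` at install or upgrade time, this keeps the DaemonSet's pods on the VM-backed nodes, where the /proc and /sys hostPath mounts are supported.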

Error Case 2: When not hit with load (working fine):

kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/containersample/pods/containersample-b77d7d868-ktdht
{"kind":"PodMetrics","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"name":"containersample-b77d7d868-ktdht","namespace":"containersample","selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/containersample/pods/containersample-b77d7d868-ktdht","creationTimestamp":"2019-06-06T23:32:03Z"},"timestamp":"2019-06-06T23:31:00Z","window":"1m0s","containers":[{"name":"containersample","usage":{"cpu":"0","memory":"73000Ki"}}]}

When I hit the app with load:

Error from server (NotFound): podmetrics.metrics.k8s.io "containersample/containersample-b77d7d868-ktdht" not found

Expected/desired behavior

  1. The prometheus-operator installation should complete without errors.
  2. The pre-installed metrics-server should work for the autoscaler.

OS and Version?

Windows 10

Versions

Kubernetes: v1.13.5

Mention any other details that might be useful


Thanks! We'll be in touch soon.

anujgeek commented 5 years ago

@rbitia Can you please help me with this case or include anyone who can?

mimckitt commented 5 years ago

@lachie83 @jluk @seanmck can one of you please assist with this? The issue has been open for over a month with no response. Otherwise, if there is a better route @anujgeek can take to get help on this, please let me know. I am happy to direct and facilitate as needed.

dkkapur commented 5 years ago

Tagging @srrengar as well