s3asfour closed this issue 1 year ago
Could you double-check that you have the expected permissions by running the commands from the FAQ:
https://github.com/astefanutti/kubebox#faq
Also, could you double-check the cAdvisor pod logs for any particular issue?
Both commands return "yes".
The cAdvisor pod logs are empty, which is very weird :/ That was the first thing I looked at. But maybe that in itself hints at the issue?
Yes, it's surprising that the cAdvisor logs are empty. I've checked on one of my setups and can see lots of statements.
Could you check the events for the cAdvisor pods, or look directly into the pod manifests?
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m57s default-scheduler Successfully assigned cadvisor/cadvisor-jlwb9 to gke-staging-cloud-cl-staging-cloud-no-ce55b49b-wk4r
Normal Pulling 5m55s kubelet Pulling image "k8s.gcr.io/cadvisor:v0.36.0"
Normal Pulled 5m37s kubelet Successfully pulled image "k8s.gcr.io/cadvisor:v0.36.0"
Normal Created 5m36s kubelet Created container cadvisor
Normal Started 5m36s kubelet Started container cadvisor
And here's my running pods manifest:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-01-28T14:51:12Z"
  generateName: cadvisor-
  labels:
    app: cadvisor
    controller-revision-hash: 795f564df9
    name: cadvisor
    pod-template-generation: "1"
  name: cadvisor-jlwb9
  namespace: cadvisor
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: cadvisor
    uid: 1cc42669-42d7-4a07-acbb-62e33cd02eed
  resourceVersion: "776256"
  selfLink: /api/v1/namespaces/cadvisor/pods/cadvisor-jlwb9
  uid: 95dff31a-748d-42c5-9710-718efcac52af
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - gke-staging-cloud-cl-staging-cloud-no-ce55b49b-wk4r
  automountServiceAccountToken: false
  containers:
  - args:
    - --storage_duration=5m0s
    - --housekeeping_interval=10s
    image: k8s.gcr.io/cadvisor:v0.36.0
    imagePullPolicy: IfNotPresent
    name: cadvisor
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    resources:
      limits:
        cpu: 300m
        memory: 2000Mi
      requests:
        cpu: 150m
        memory: 200Mi
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /rootfs
      name: rootfs
      readOnly: true
    - mountPath: /var/log
      name: var-log
      readOnly: true
    - mountPath: /var/run
      name: var-run
      readOnly: true
    - mountPath: /sys
      name: sys
      readOnly: true
    - mountPath: /var/lib/containers
      name: containers
      readOnly: true
    - mountPath: /var/lib/docker
      name: docker
      readOnly: true
    - mountPath: /dev/disk
      name: disk
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: gke-staging-cloud-cl-staging-cloud-no-ce55b49b-wk4r
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cadvisor
  serviceAccountName: cadvisor
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - hostPath:
      path: /
      type: ""
    name: rootfs
  - hostPath:
      path: /var/log
      type: ""
    name: var-log
  - hostPath:
      path: /var/run
      type: ""
    name: var-run
  - hostPath:
      path: /sys
      type: ""
    name: sys
  - hostPath:
      path: /var/lib/containers
      type: ""
    name: containers
  - hostPath:
      path: /var/lib/docker
      type: ""
    name: docker
  - hostPath:
      path: /dev/disk
      type: ""
    name: disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-01-28T14:51:12Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-01-28T14:52:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-01-28T14:52:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-01-28T14:51:12Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://21b1453a6b71dd31f4f654dc633ba3dd3a97dfb8c7a4f5a55293a54cdf0437a7
    image: k8s.gcr.io/cadvisor:v0.36.0
    imageID: docker-pullable://k8s.gcr.io/cadvisor@sha256:16bc6858dc5b7063c7d89153ad6544370eb79cb27a1b8d571f31b98673f7a324
    lastState: {}
    name: cadvisor
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-01-28T14:51:33Z"
  hostIP: 10.0.0.6
  phase: Running
  podIP: 10.0.4.2
  podIPs:
  - ip: 10.0.4.2
  qosClass: Burstable
  startTime: "2021-01-28T14:51:12Z"
This looks normal.
Could you run:
$ kubectl get --raw "/api/v1/namespaces/cadvisor/pods/cadvisor-jlwb9/proxy/api/v2.0/spec?recursive=true"
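For reference, that URL is the standard API-server pod proxy path with cAdvisor's v2.0 spec endpoint appended; later in the thread this is described as the first request Kubebox makes. A minimal sketch of how the path is assembled (the helper name is illustrative, not part of Kubebox):

```python
def cadvisor_spec_path(namespace: str, pod: str) -> str:
    """Build the API-server proxy path to a pod's cAdvisor v2.0 spec endpoint."""
    return (f"/api/v1/namespaces/{namespace}/pods/{pod}"
            f"/proxy/api/v2.0/spec?recursive=true")

# The path used in this thread:
print(cadvisor_spec_path("cadvisor", "cadvisor-jlwb9"))
```

Anything answering this request proves the whole chain works: API server → proxy subresource → pod on port 8080 → cAdvisor's HTTP API.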
I get the error message:
Error from server (ServiceUnavailable): the server is currently unable to handle the request
Thanks, this is clearly the issue.
Here is the output that I have:
$ kubectl get --raw "/api/v1/namespaces/cadvisor/pods/cadvisor-rd24x/proxy/api/v2.0/spec?recursive=true"
{"/":{"creation_time":"2021-01-28T11:27:47.669999981Z","has_cpu":true,"cpu":{"limit":1024,"max_limit":0,"mask":"0-3"},"has_memory":true,"memory":{"limit":3959975936,"reservation":8796093018112,"swap_limit":104853504},"has_custom_metrics":false,"has_processes":false,"processes":{},"has_network":true,"has_filesystem":true,"has_diskio":true}...
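The response is plain JSON keyed by container path, so it can be inspected with any JSON tooling. A quick sketch using an abbreviated (and re-closed) version of the root-container object above; the full output contains more fields than shown here:

```python
import json

# Abbreviated root-container spec, trimmed from the output above.
raw = """{"/": {"creation_time": "2021-01-28T11:27:47.669999981Z",
                "has_cpu": true,
                "cpu": {"limit": 1024, "max_limit": 0, "mask": "0-3"},
                "has_memory": true,
                "memory": {"limit": 3959975936}}}"""

spec = json.loads(raw)
root = spec["/"]
print(root["has_cpu"])          # cAdvisor reports CPU stats for the root cgroup
print(root["memory"]["limit"])  # memory limit in bytes
```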
Could you run:
$ kubectl get --raw "/api/v1/namespaces/cadvisor/pods/cadvisor-jlwb9/proxy"
To determine whether that's an issue with the API server proxy or the cAdvisor pod.
I get the same error message:
Error from server (ServiceUnavailable): the server is currently unable to handle the request
Thanks, if your cluster supports it, it'd be useful to run, as a final check:
$ kubectl debug -it -n cadvisor cadvisor-jlwb9 --image=busybox
# curl http://localhost:8080/
For some reason, it seems cAdvisor does not start correctly, yet the pod reports a healthy condition!
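The curl check above boils down to "does anything answer HTTP on port 8080 inside the pod". The same probe can be sketched in Python (purely illustrative, not part of Kubebox):

```python
import urllib.request


def cadvisor_alive(base_url: str = "http://localhost:8080/",
                   timeout: float = 5.0) -> bool:
    """Return True if something answers HTTP 2xx on the cAdvisor port."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # HTTPError/URLError (both OSError subclasses): connection refused,
        # timeout, non-2xx status, etc.
        return False
```

If this returns False from inside the pod while the pod is Ready, the container process is up but cAdvisor's HTTP server never started, which matches the empty logs.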
One approach could be to try deploying from the cAdvisor documentation:
https://github.com/google/cadvisor/tree/master/deploy/kubernetes
There may be a compatibility issue between the version of the cAdvisor template that Kubebox provides and your cluster.
I deployed cAdvisor from its repo and now I see some logs in the pod, but the Kubebox resource metrics window still shows the same error:
Resource usage metrics unavailable
I0128 20:58:39.323099 1 storagedriver.go:50] Caching stats in memory for 2m0s
I0128 20:58:39.323729 1 manager.go:154] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct"
I0128 20:58:39.417720 1 fs.go:142] Filesystem UUIDs: map[1089-6870:/dev/sda12 33ee302f-5e82-4695-b3ff-6e803d26508b:/dev/sda1 e286b489-3849-4a10-b7be-42e853faaa8d:/dev/sda8]
I0128 20:58:39.417758 1 fs.go:143] Filesystem partitions: map[tmpfs:{mountpoint:/dev major:0 minor:268 fsType:tmpfs blockSize:0} /dev/root:{mountpoint:/rootfs major:253 minor:0 fsType:ext2 blockSize:0} /dev/sda8:{mountpoint:/rootfs/usr/share/oem major:8 minor:8 fsType:ext4 blockSize:0} /dev/sda1:{mountpoint:/var/lib/docker major:8 minor:1 fsType:ext4 blockSize:0} shm:{mountpoint:/rootfs/var/lib/docker/containers/86f6702134b6c286fb185c69d9a414430bb0fa6e94c012c585bde84c5182159f/mounts/shm major:0 minor:59 fsType:tmpfs blockSize:0}]
I0128 20:58:39.425589 1 manager.go:227] Machine: {NumCores:2 CpuFrequency:2299998 MemoryCapacity:4140908544 HugePages:[{PageSize:2048 NumPages:0}] MachineID:f113b713760c17bb1c10725e60cceac4 SystemUUID:f113b713-760c-17bb-1c10-725e60cceac4 BootID:dc3a5eb4-472f-48dc-b294-8930a56a0440 Filesystems:[{Device:/dev/sda8 DeviceMajor:8 DeviceMinor:8 Capacity:12042240 Type:vfs Inodes:4096 HasInodes:true} {Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:101241290752 Type:vfs Inodes:6258720 HasInodes:true} {Device:shm DeviceMajor:0 DeviceMinor:59 Capacity:67108864 Type:vfs Inodes:505482 HasInodes:true} {Device:overlay DeviceMajor:0 DeviceMinor:252 Capacity:101241290752 Type:vfs Inodes:6258720 HasInodes:true} {Device:tmpfs DeviceMajor:0 DeviceMinor:268 Capacity:67108864 Type:vfs Inodes:505482 HasInodes:true} {Device:/dev/root DeviceMajor:253 DeviceMinor:0 Capacity:1279787008 Type:vfs Inodes:79360 HasInodes:true}] DiskMap:map[253:0:{Name:dm-0 Major:253 Minor:0 Size:1300234240 Scheduler:none} 9:0:{Name:md0 Major:9 Minor:0 Size:0 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:107374182400 Scheduler:mq-deadline}] NetworkDevices:[{Name:cbr0 MacAddress:36:e3:3f:b0:97:d5 Speed:0 Mtu:1460} {Name:eth0 MacAddress:42:01:0a:00:00:07 Speed:-1 Mtu:1460}] Topology:[{Id:0 Memory:4140908544 Cores:[{Id:0 Threads:[0 1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:47185920 Type:Unified Level:3}]}] CloudProvider:GCE InstanceType:e2-medium InstanceID:2096945664381373919}
I0128 20:58:39.449820 1 manager.go:233] Version: {KernelVersion:4.19.112+ ContainerOsVersion:Alpine Linux v3.7 DockerVersion:19.03.1 DockerAPIVersion:1.40 CadvisorVersion:v0.30.2 CadvisorRevision:de723a09}
I0128 20:58:39.482021 1 factory.go:356] Registering Docker factory
I0128 20:58:39.506078 1 factory.go:136] Registering containerd factory
I0128 20:58:39.506313 1 factory.go:54] Registering systemd factory
I0128 20:58:39.509619 1 factory.go:86] Registering Raw factory
I0128 20:58:39.513067 1 manager.go:1205] Started watching for new ooms in manager
W0128 20:58:39.513129 1 manager.go:340] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
I0128 20:58:39.514451 1 manager.go:356] Starting recovery of all containers
I0128 20:58:41.516086 1 manager.go:361] Recovery completed
I0128 20:58:42.705861 1 cadvisor.go:165] Starting cAdvisor version: v0.30.2-de723a09 on port 8080
Kubebox expects cAdvisor to be deployed in the cadvisor namespace. There are also a couple of things to be done to make sure cAdvisor is configured for the container runtime used in the cluster.
Also it looks like the version deployed from the cAdvisor repository is quite old, v0.30.2-de723a09, and it may not have the latest version of the API.
A quick check is to run:
$ kubectl get --raw "/api/v1/namespaces/cadvisor/pods/<cadvisor_pod>/proxy/api/v2.0/spec?recursive=true"
which is the first request Kubebox does.
$ debug -it -n cadvisor cadvisor-jlwb9 --image=busybox
# curl http://localhost:8080/
hi @astefanutti, how can I get that debug command? kubectl exec doesn't have the --image flag :(
@widnyana it's kubectl debug: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/#ephemeral-container-example. I forgot to add kubectl by mistake.
I had the same issue, and I managed to resolve it on my private GKE cluster: I added a firewall rule allowing connections from the Kube API server (master nodes) to the worker nodes on port 8080.
Let me close this. The cAdvisor deployment example has been updated with the latest version. Let me know if you still face the issue.
I am trying to use Kubebox to access the resource usage of pods in my cluster. I installed Kubebox v0.9.0 and deployed the cAdvisor DaemonSet, as mentioned in the README, but now I get the error message "Resource usage metrics unavailable". I don't see any logs in the cAdvisor pod, which I was hoping would lead me to the issue, so I have no idea what the problem is.
Any help is appreciated!