digitalocean / do-agent

Collects system metrics from DigitalOcean Droplets
Apache License 2.0

Unable to gather advanced metrics, ERROR "was collected before with the same name and label values" #228

Open yevon opened 4 years ago

yevon commented 4 years ago

Describe the problem

When I install the advanced kube-state-metrics deployment, the dashboard for gathering metrics stops working. If I check the do-agent pod logs, I see errors about metrics that were collected before with the same name and label values. I followed this guide to activate advanced metrics:

https://www.digitalocean.com/docs/kubernetes/how-to/monitor-advanced/

If I uninstall the advanced metrics or scale the kube-state-metrics pods down to 0, the dashboard starts working again.

Steps to reproduce

It happens with kube-state-metrics:2.0.0-alpha

Expected behavior

Be able to get advanced pod scheduling metrics.

System Information

DigitalOcean managed Kubernetes 1.18.8

do-agent information:

do-agent-log

2020-09-27T13:03:09.553931294Z ERROR: 2020/09/27 13:03:09 /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 45 error(s) occurred:
2020-09-27T13:03:09.553990229Z collected metric "kube_daemonset_status_number_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554014114Z collected metric "kube_daemonset_status_number_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554019932Z collected metric "kube_daemonset_status_number_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554042459Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554048024Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554052770Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554057690Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554062350Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554067062Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554071742Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554076542Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554081945Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554086634Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554092869Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554097734Z collected metric "kube_deployment_status_replicas_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554102457Z collected metric "kube_daemonset_status_number_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554127517Z collected metric "kube_daemonset_status_number_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554133235Z collected metric "kube_daemonset_status_number_unavailable" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554138043Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554148110Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554153232Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554157955Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554162715Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554167703Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554175570Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554183135Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554219858Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554241193Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554249105Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554256065Z collected metric "kube_deployment_status_replicas_available" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554263653Z collected metric "kube_daemonset_status_desired_number_scheduled" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554270760Z collected metric "kube_daemonset_status_desired_number_scheduled" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554305026Z collected metric "kube_daemonset_status_desired_number_scheduled" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554315521Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554323804Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554331593Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554338594Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554345304Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554362434Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554397076Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554401950Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554406640Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554411351Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554417625Z collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
2020-09-27T13:03:09.554422403Z * collected metric "kube_deployment_spec_replicas" { gauge: } was collected before with the same name and label values
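For reference, the "was collected before with the same name and label values" message comes from the Prometheus Go client's duplicate check at gather time, which suggests the agent is seeing the same kube-state-metrics series more than once per gather cycle. A minimal sketch that reproduces the error string, assuming only prometheus/client_golang (not do-agent's actual wiring):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// dupCollector emits the same series twice per scrape, which the
// registry rejects at Gather time.
type dupCollector struct {
	desc *prometheus.Desc
}

func (c dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c dupCollector) Collect(ch chan<- prometheus.Metric) {
	// Same metric name, identical (empty) label values, sent twice.
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 3)
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 3)
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(dupCollector{
		desc: prometheus.NewDesc("kube_deployment_spec_replicas", "demo metric", nil, nil),
	})
	if _, err := reg.Gather(); err != nil {
		// Prints: ... was collected before with the same name and label values
		fmt.Println(err)
	}
}
```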

bsnyder788 commented 4 years ago

@yevon I will look into this soon, and see if I can reproduce it on my end! Thanks for the report!

bsnyder788 commented 3 years ago

I wasn't able to reproduce this issue, @yevon. We had a similar report for Ubuntu 20.04 on disk metrics collection that I just addressed in 3.8.0. This kube metrics case can't be ignored as easily as that one, though, since these are metrics we actually want to collect. Did you ever dig any deeper on your end?
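For context on what "ignored" could look like, here is a hypothetical sketch (not do-agent's actual 3.8.0 change) of a client_golang Gatherer wrapper that drops whole metric families by name:

```go
// Hypothetical sketch only; assumes the prometheus/client_golang Gatherer interface.
package metricfilter

import (
	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// DropGatherer wraps another Gatherer and discards metric families by name.
type DropGatherer struct {
	Inner prometheus.Gatherer
	Drop  map[string]bool // family names to discard, e.g. "node_filesystem_free_bytes"
}

func (g DropGatherer) Gather() ([]*dto.MetricFamily, error) {
	mfs, err := g.Inner.Gather()
	if err != nil {
		return nil, err
	}
	kept := mfs[:0]
	for _, mf := range mfs {
		if !g.Drop[mf.GetName()] {
			kept = append(kept, mf)
		}
	}
	return kept, nil
}
```

Dropping the kube_deployment_* and kube_daemonset_* families this way would silence the error, but only by throwing away the advanced metrics the dashboard needs, which is why that approach doesn't fit here.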

yevon commented 3 years ago

I didn't check again with the latest version, but I reproduced this issue exactly on another Kubernetes cluster in another zone: a brand new Kubernetes cluster, following the DO documentation for installing advanced metrics. It might be fixed with the latest advanced metrics or Kubernetes version; when I have time to check it again I will let you know.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs.

bsnyder788 commented 3 years ago

still valid

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs.

bsnyder788 commented 3 years ago

still valid

blockloop commented 3 years ago

@bsnyder788 if you add the bug label to this issue, the stale bot will stop marking it as stale. I believe that's the correct label, but you can look it up in the stale bot settings for this repo.

vagkaefer commented 4 months ago

Any news on this? I have the same situation with a CloudLinux server. It was working normally, and then I started getting these errors:

May 16 15:58:44 srv-001 DigitalOceanAgent[996139]: * collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda1" > label:<name:"fstype" value:"xfs" > label:<name:"mountpoint" value:"/usr/share/cagefs-skeleton/opt" > gauge:<value:3.9912136704e+10 > } was collected before with the same name and label values
May 16 15:58:44 srv-001 DigitalOceanAgent[996139]: * collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda1" > label:<name:"fstype" value:"xfs" > label:<name:"mountpoint" value:"/usr/share/cagefs-skeleton/usr/local/apache/domlogs" > gauge:<value:6.3254278144e+10 > } was collected before with the same name and label values
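In this CloudLinux case the duplicated series are node_filesystem_* metrics keyed on device, fstype and mountpoint, so one quick check is whether the same (device, mountpoint, fstype) tuple appears more than once in /proc/mounts, which can happen with CageFS bind mounts. A diagnostic sketch, not do-agent code:

```go
package main

// Scan /proc/mounts for (device, mountpoint, fstype) tuples that appear
// more than once -- the label set the node_filesystem_* metrics are keyed on.

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	seen := map[string]int{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 {
			continue
		}
		key := fields[0] + " " + fields[1] + " " + fields[2]
		seen[key]++
	}
	for key, n := range seen {
		if n > 1 {
			fmt.Printf("duplicate mount entry (%dx): %s\n", n, key)
		}
	}
}
```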

vagkaefer commented 1 week ago

For now I have managed to solve this problem on the CloudLinux server.