gocrane / crane-scheduler

Crane scheduler is a Kubernetes scheduler that schedules pods based on actual node load.
Apache License 2.0
224 stars 62 forks

Aggregated metrics cannot be retrieved from a self-hosted Prometheus #7

Open Quintonwong opened 2 years ago

Quintonwong commented 2 years ago

1. The crane-scheduler-controller logs show that none of the aggregated metrics can be retrieved:

W0626 20:55:02.198329 1 node.go:61] failed to sync this node ["k8s-node4/mem_usage_avg_5m"]: can not annotate node[k8s-node4]: failed to get data mem_usage_avg_5m{k8s-node4=}:

2. fe3d166c668c1cc8739fbaf5d2ce873

autumn0207 commented 2 years ago

@Quintonwong

First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration
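
The comparison between the two checks can be made mechanical. Below is a minimal Python sketch, assuming the standard Prometheus HTTP API response shape (a `status` field and a `data.result` list). The names `has_samples` and `diagnose` are illustrative and are not part of crane-scheduler.

```python
import json


def has_samples(body: str) -> bool:
    """Return True if a /api/v1/query response body holds at least one series."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        return False
    return len(doc.get("data", {}).get("result", [])) > 0


def diagnose(aggregated_body: str, up_body: str) -> str:
    """Classify the outcome of the two curl checks (hypothetical helper)."""
    if not has_samples(up_body):
        return "prometheus unreachable or no targets scraped"
    if not has_samples(aggregated_body):
        return "recording rules not in effect"
    return "aggregated metrics available"


if __name__ == "__main__":
    # Made-up sample bodies: an empty result for cpu_usage_avg_5m, one series for up.
    empty = '{"status":"success","data":{"resultType":"vector","result":[]}}'
    up = ('{"status":"success","data":{"resultType":"vector","result":'
          '[{"metric":{"__name__":"up"},"value":[0,"1"]}]}}')
    print(diagnose(empty, up))
```

An empty `result` list with `"status":"success"` means the query was valid but matched no series, which is what a missing recording rule would typically look like.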

ArvinChen1991 commented 2 years ago

@Quintonwong

First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration

The last command returns an error:

curl -g 'http://x.x.x.x:9090/api/v1/query'
{"status":"error","errorType":"bad_data","error":"invalid parameter 'query': parse error at char 1: no expression found in input"}

autumn0207 commented 2 years ago

curl -g 'http://x.x.x.x:9090/api/v1/query'

I made a mistake, the command should be

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'
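
The `bad_data` error earlier in the thread came from calling `/api/v1/query` without a `query` parameter. As a minimal Python sketch of building the instant-query URL so that the parameter is always present and URL-encoded (`instant_query_url` is a hypothetical helper, and the base address is a placeholder):

```python
from urllib.parse import urlencode


def instant_query_url(base: str, promql: str) -> str:
    """Build a Prometheus instant-query URL with an encoded `query` parameter."""
    if not promql.strip():
        # Prometheus rejects an empty expression, so fail early instead.
        raise ValueError("PromQL expression must not be empty")
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    print(instant_query_url("http://x.x.x.x:9090", "up"))
```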

ArvinChen1991 commented 2 years ago

The corrected command returns success.

image

xieydd commented 1 year ago

I think you can increase the interval (in seconds) configured for cpu_usage_active.

sdnmw commented 1 year ago

I have the same problem. Kubernetes version: 1.23.10, crane version: v0.5.1, crane-scheduler-controller: v0.1.23.

I have checked both the aggregated and the non-aggregated metrics data, and both can be obtained. I also changed the interval of cpu_usage_active to 5s, but I still cannot obtain the data and annotate the node.

W0319 15:26:24.293385 1 node.go:61] failed to sync this node ["kse2/cpu_usage_avg_5m"]: can not annotate node[kse2]: failed to get data cpu_usage_avg_5m{kse2=}:
I0319 15:26:24.295764 1 node.go:75] Finished syncing node event "kse3/cpu_usage_avg_5m" (2.357063ms)
W0319 15:26:24.295781 1 node.go:61] failed to sync this node ["kse3/cpu_usage_avg_5m"]: can not annotate node[kse3]: failed to get data cpu_usage_avg_5m{kse3=}:
I0319 15:26:24.298258 1 node.go:75] Finished syncing node event "kse4/cpu_usage_avg_5m" (2.454873ms)
W0319 15:26:24.298279 1 node.go:61] failed to sync this node ["kse4/cpu_usage_avg_5m"]: can not annotate node[kse4]: failed to get data cpu_usage_avg_5m{kse4=}:

image

image

Could you help me, @xieydd? Thanks very much.

nailianglu commented 1 year ago

@Quintonwong

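First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration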

Hello, I ran into this problem as well. From inside the crane-scheduler-controller container I can fetch the aggregated data, but the crane-scheduler-controller container log keeps reporting errors:

I0330 13:18:01.658598 1 node.go:75] Finished syncing node event "cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m" (35.978µs)
W0330 13:18:01.658604 1 node.go:61] failed to sync this node ["cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m"]: can not annotate node[cn-hangzhou.i-bp19r762s7xryoo6fjmx]: failed to get data mem_usage_avg_5m{cn-hangzhou.i-bp19r762s7xryoo6fjmx=}: Post "10.7.1.60/api/v1/query": unsupported protocol scheme ""
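
The `unsupported protocol scheme ""` part of that log line is the error Go's net/http client returns when the request URL has no scheme, which suggests the Prometheus address was configured as a bare `10.7.1.60` rather than `http://10.7.1.60`. Below is a minimal Python sketch of a check one could run on a configured address. `ensure_scheme` is a hypothetical helper, not part of crane-scheduler, and defaulting to `http` is an assumption.

```python
def ensure_scheme(address: str, default: str = "http") -> str:
    """Prefix the address with a scheme if it has none (hypothetical helper)."""
    if "://" in address:
        return address
    # Assumption: a bare host or host:port is meant to be reached over `default`.
    return f"{default}://{address}"


if __name__ == "__main__":
    print(ensure_scheme("10.7.1.60"))
```

In practice the simpler fix is to write the full address, scheme included, wherever the Prometheus address is configured.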

wyaopeng commented 9 months ago

Try upgrading Prometheus and node-exporter to the latest versions.

niyang110 commented 4 months ago

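@sdnmw The values cannot be retrieved because crane converts the node name to the node IP and queries Prometheus using that IP as the value of the instance label.

image

This is probably because node_exporter is deployed inside Kubernetes. You can add a relabeling rule to the Prometheus scrape job for node-exporter.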
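
To illustrate the mismatch: when node-exporter runs inside Kubernetes, the `instance` label may hold a pod address or a node name rather than the node IP, so a lookup keyed on the node IP finds nothing. Below is a minimal Python sketch that checks this against the `data.result` list of a query response. `series_for_ip` is a hypothetical helper, the sample label values are made up, and the exact selector crane builds is not shown in this thread.

```python
def series_for_ip(result: list, node_ip: str) -> list:
    """Keep series whose instance label is the node IP, with or without a port."""
    matched = []
    for series in result:
        instance = series.get("metric", {}).get("instance", "")
        # Strip a trailing ":port" if present (IPv4 and hostnames only).
        host = instance.rsplit(":", 1)[0] if ":" in instance else instance
        if host == node_ip:
            matched.append(series)
    return matched


if __name__ == "__main__":
    # Made-up sample: one series labelled by IP:port, one labelled by node name.
    sample = [
        {"metric": {"instance": "10.0.0.5:9100"}, "value": [0, "12.5"]},
        {"metric": {"instance": "kse2"}, "value": [0, "3.1"]},
    ]
    print(len(series_for_ip(sample, "10.0.0.5")))
```

If this kind of check returns nothing for a node's IP while the metric itself exists, relabeling `instance` in the node-exporter scrape job, as suggested above, is the place to look.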