gocrane / crane

Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users to manage cloud cost easier but also ensure the quality of applications.
https://gocrane.io
Apache License 2.0
1.85k stars 378 forks source link

failed to wait for hpa-observer-controller caches to sync:(Support k8s version >= 1.26) #828

Open maxiaoyin opened 1 year ago

maxiaoyin commented 1 year ago

craned: 0.10

kube-service version: Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.5", GitCommit:"890a139214b4de1f01543d15003b5bda71aae9c7", GitTreeState:"clean", BuildDate:"2023-05-17T14:08:49Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

craned exception precondition environment:

image

full log:

I0625 15:06:09.201423 1 nodelocal.go:25] Registering node local metrics collector cpu I0625 15:06:09.201485 1 nodelocal.go:25] Registering node local metrics collector cpuLoad I0625 15:06:09.201493 1 nodelocal.go:25] Registering node local metrics collector diskio I0625 15:06:09.201496 1 nodelocal.go:25] Registering node local metrics collector memory I0625 15:06:09.201499 1 nodelocal.go:25] Registering node local metrics collector netio I0625 15:06:09.238990 1 predictor.go:121] predictors map[dsp:0xc00074b860 percentile:0xc0001b86c0] I0625 15:06:09.247019 1 webhook.go:103] Succeed to setup autoscaling webhook I0625 15:06:09.247433 1 config.go:59] Load recommendation framework configuration set successfully. I0625 15:06:09.247453 1 manager.go:142] Recommendation Config updated. I0625 15:06:09.247648 1 manager.go:98] Start watching /tmp/recommendation-config/config.yaml for update. I0625 15:06:09.251416 1 predictor.go:141] predictor manager started, all predictors started I0625 15:06:09.251443 1 prediction.go:151] predictor Periodic started I0625 15:06:09.251493 1 leaderelection.go:248] attempting to acquire leader lease crane-system/craned... I0625 15:06:09.251727 1 prediction.go:302] predictor Percentile started I0625 15:06:09.252360 1 reflector.go:219] Starting reflector v1.Pod (9h47m21.860077463s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:09.252372 1 reflector.go:255] Listing and watching v1.Pod from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:09.255728 1 server.go:94] install crane api server middleware: log I0625 15:06:09.255741 1 server.go:94] install crane api server middleware: cors I0625 15:06:09.255748 1 server.go:94] install crane api server middleware: recovery I0625 15:06:09.255967 1 server.go:149] Start to listening on http address: 0.0.0.0:8082 I0625 15:06:09.259736 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired I0625 15:06:09.260490 1 leaderelection.go:253] failed to acquire lease crane-system/craned I0625 15:06:09.352812 1 shared_informer.go:270] caches populated I0625 15:06:12.141707 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired I0625 15:06:12.141722 1 leaderelection.go:253] failed to acquire lease crane-system/craned I0625 15:06:15.283816 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired I0625 15:06:15.283836 1 leaderelection.go:253] failed to acquire lease crane-system/craned I0625 15:06:18.609192 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired I0625 15:06:18.609212 1 leaderelection.go:253] failed to acquire lease crane-system/craned I0625 15:06:20.949090 1 leaderelection.go:352] lock is held by craned-6d4b658844-48px9_f0d63f8f-0f68-4408-9f93-6e23d56e09ec and has not yet expired I0625 15:06:20.949107 1 leaderelection.go:253] failed to acquire lease crane-system/craned I0625 15:06:23.294204 1 reflector.go:219] Starting reflector v1alpha1.EffectiveHorizontalPodAutoscaler (9h25m32.854325822s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:23.294379 1 reflector.go:255] Listing and watching v1alpha1.EffectiveHorizontalPodAutoscaler from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:23.394964 1 shared_informer.go:270] caches populated I0625 15:06:24.968867 1 leaderelection.go:258] successfully acquired lease crane-system/craned I0625 15:06:24.969457 1 reflector.go:219] Starting reflector v1alpha1.Recommendation (10h15m41.897892678s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.969470 1 reflector.go:255] Listing and watching v1alpha1.Recommendation from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.969719 1 reflector.go:219] Starting reflector v1alpha1.TimeSeriesPrediction (9h11m14.838150609s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.969728 1 reflector.go:255] Listing and watching v1alpha1.TimeSeriesPrediction from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.969883 1 reflector.go:219] Starting reflector v1alpha1.RecommendationRule (10h32m7.567628063s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.969894 1 reflector.go:255] Listing and watching v1alpha1.RecommendationRule from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.970076 1 reflector.go:219] Starting reflector v1alpha1.Substitute (10h5m54.19539413s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.970152 1 reflector.go:255] Listing and watching v1alpha1.Substitute from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.997040 1 shared_informer.go:270] caches populated I0625 15:06:24.997294 1 reflector.go:219] Starting reflector v1alpha1.EffectiveVerticalPodAutoscaler (9h48m20.736954602s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.997337 1 reflector.go:255] Listing and watching v1alpha1.EffectiveVerticalPodAutoscaler from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.997628 1 reflector.go:219] Starting reflector v1alpha1.Analytics (9h44m11.990737419s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:24.997683 1 reflector.go:255] Listing and watching v1alpha1.Analytics from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:06:25.069290 1 shared_informer.go:270] caches populated I0625 15:06:25.069344 1 shared_informer.go:270] caches populated I0625 15:06:25.069354 1 shared_informer.go:270] caches populated I0625 15:06:25.069366 1 shared_informer.go:270] caches populated I0625 15:06:25.069389 1 shared_informer.go:270] caches populated I0625 15:06:25.069427 1 shared_informer.go:270] caches populated I0625 15:06:25.069525 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-replicas-4ckp5 I0625 15:06:25.070209 1 shared_informer.go:270] caches populated I0625 15:06:25.070231 1 shared_informer.go:270] caches populated I0625 15:06:25.070251 1 shared_informer.go:270] caches populated I0625 15:06:25.070326 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/workloads-rule" I0625 15:06:25.070375 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/services-rule" I0625 15:06:25.070406 1 shared_informer.go:270] caches populated I0625 15:06:25.070426 1 shared_informer.go:270] caches populated I0625 15:06:25.070503 1 shared_informer.go:270] caches populated I0625 15:06:25.070544 1 shared_informer.go:270] caches populated I0625 15:06:25.073513 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-resource-cw4n6 I0625 15:06:25.097859 1 shared_informer.go:270] caches populated I0625 15:06:25.097859 1 shared_informer.go:270] caches populated I0625 15:06:25.098317 1 shared_informer.go:270] caches populated I0625 15:06:25.098357 1 shared_informer.go:270] caches populated I0625 15:07:36.000394 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/services-rule" I0625 15:07:36.008390 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008390 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008402 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008411 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008422 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008430 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008496 1 recommendation_rule_controller.go:416] Mission message: get recommender Service failed, unknown recommender name: Service I0625 15:07:36.008577 1 recommendation_rule_controller.go:317] Updating RecommendationRule services-rule status I0625 15:07:36.019167 1 recommendation_rule_controller.go:341] Update RecommendationRule status successful, RecommendationRule services-rule I0625 15:07:36.019185 1 recommendation_rule_controller.go:100] "Will re-sync" after="5m0s" I0625 15:07:36.019208 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/workloads-rule" I0625 15:07:36.035067 1 manager.go:147] RecommendationRule(workloads-rule) Target(default/nginx-deployment): start to run recommender "Resource". I0625 15:07:36.035219 1 manager.go:147] RecommendationRule(workloads-rule) Target(default/nginx-deployment): start to run recommender "Replicas". I0625 15:07:36.041595 1 reflector.go:219] Starting reflector v1.ConfigMap (9h18m32.280147413s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:07:36.041737 1 reflector.go:255] Listing and watching v1.ConfigMap from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:07:36.041979 1 prepare.go:38] RecommendationRule(workloads-rule) Target(default/nginx-deployment): Replicas CpuQuery ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx I0625 15:07:36.042249 1 ctx.go:54] "Prom query directly" query="sum(irate(container_cpu_usage_seconds_total{namespace=\"default\",pod=~\"^nginx-deployment-.\",container!=\"\"}[3m]))" I0625 15:07:36.085756 1 prepare.go:51] RecommendationRule(workloads-rule) Target(default/nginx-deployment): Replicas MemoryQuery ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_memory_Deployment_apps/v1_default_nginx-deployment_app=nginx I0625 15:07:36.086067 1 ctx.go:54] "Prom query directly" query="sum(container_memory_working_set_bytes{namespace=\"default\",pod=~\"^nginx-deployment-.\",container!=\"\"})" I0625 15:07:36.117458 1 generic.go:80] "Put tuple{query,caller,config} into with channel." query="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" caller="Replicas%!(EXTRA klog.ObjectRef=workloads-rule, types.UID=feefa461-68a3-400b-bbc8-adb5bf57951a)" I0625 15:07:36.117806 1 ctx.go:54] "Prom query directly" query="sum(irate(container_cpu_usage_seconds_total{namespace=\"default\",pod=~\"^nginx-deployment-.\",container!=\"\"}[3m]))" I0625 15:07:36.141539 1 shared_informer.go:270] caches populated I0625 15:07:36.141594 1 recommend.go:112] RecommendationRule(workloads-rule) Target(default/nginx-deployment): CPU query for resource request recommendation: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_cpu_default_nginx-deploymentnginx I0625 15:07:36.141651 1 config.go:159] "Made an internal config." internalConfig="{aggregated: true, historyLength: 168h0m0s, sampleInterval: 1m0s, histogramDecayHalfLife: 24h0m0s, minSampleWeight: 1e-05, marginFraction: 0.15, percentile: 0.99, targetUtilization: 1}" I0625 15:07:36.141776 1 prediction.go:166] process analyzing metric namer: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_cpu_default_nginx-deploymentnginx, config: {aggregated:true historyLength:604800000000000 sampleInterval:60000000000 histogramOptions:0xc00082c3f0 histogramDecayHalfLife:86400000000000 minSampleWeight:1e-05 marginFraction:0.15 percentile:0.99 targetUtilization:1 initMode:history} I0625 15:07:36.142083 1 ctx.go:54] "Prom query directly" query="irate(container_cpu_usage_seconds_total{container!=\"POD\",namespace=\"default\",pod=~\"^nginx-deployment-.\",container=\"nginx\"}[3m])" I0625 15:07:36.167625 1 recommend.go:123] RecommendationRule(workloads-rule) Target(default/nginx-deployment): container nginx recommended cpu 114m I0625 15:07:36.167673 1 recommend.go:127] RecommendationRule(workloads-rule) Target(default/nginx-deployment) Memory query for resource request recommendation: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_memory_default_nginx-deploymentnginx I0625 15:07:36.167981 1 config.go:159] "Made an internal config." internalConfig="{aggregated: true, historyLength: 168h0m0s, sampleInterval: 1m0s, histogramDecayHalfLife: 48h0m0s, minSampleWeight: 1e-05, marginFraction: 0.15, percentile: 0.99, targetUtilization: 1}" I0625 15:07:36.168002 1 prediction.go:166] process analyzing metric namer: ResourceRecommendationCaller-default/workloads-rule-resource-cw4n6-9bb027a1-7049-4784-be89-005a4d11207a/container_memory_default_nginx-deploymentnginx, config: {aggregated:true historyLength:604800000000000 sampleInterval:60000000000 histogramOptions:0xc00082c5a0 histogramDecayHalfLife:172800000000000 minSampleWeight:1e-05 marginFraction:0.15 percentile:0.99 targetUtilization:1 initMode:history} I0625 15:07:36.168491 1 ctx.go:54] "Prom query directly" query="container_memory_working_set_bytes{container!=\"POD\",namespace=\"default\",pod=~\"^nginx-deployment-.\",container=\"nginx\"}" I0625 15:07:36.195154 1 recommend.go:141] RecommendationRule(workloads-rule) Target(default/nginx-deployment): container nginx recommended memory 120586239 I0625 15:07:36.195671 1 manager.go:205] RecommendationRule(workloads-rule) Target(default/nginx-deployment): finish to run recommender "Resource". I0625 15:07:36.195746 1 recommendation_rule_controller.go:461] Update recommendation default/workloads-rule-resource-cw4n6 I0625 15:07:36.204426 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-resource-cw4n6 I0625 15:07:36.205839 1 recommendation_rule_controller.go:467] Successfully to update Recommendation default/workloads-rule-resource-cw4n6 I0625 15:07:36.205918 1 recommendation_rule_controller.go:416] Mission message: Success I0625 15:07:36.562205 1 prediction.go:230] "This is not a periodic time series." queryExpr="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" labels=[] W0625 15:07:36.618335 1 recommend.go:61] RecommendationRule(workloads-rule) Target(default/nginx-deployment): prediction metrics data is unexpected, List length is 0 I0625 15:07:36.618557 1 prediction.go:137] "Unregister a query expression from prediction." queryExpr="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" caller="Replicas%!(EXTRA klog.ObjectRef=workloads-rule, types.UID=feefa461-68a3-400b-bbc8-adb5bf57951a)" I0625 15:07:36.618698 1 prediction.go:123] "Prediction routine stopped." queryExpr="ReplicasRecommendationCaller-default/workloads-rule-replicas-4ckp5-8f4f8c94-5164-4fc4-92ab-98c865214280/workload_cpu_Deployment_apps/v1_default_nginx-deployment_app=nginx" I0625 15:07:36.619219 1 recommend.go:159] RecommendationRule(workloads-rule) Target(default/nginx-deployment): WorkloadCpuUsage Percentile 0.001967 PodCpuRequest 150 CPUTargetUtilization 0.500000 I0625 15:07:36.619752 1 recommend.go:181] RecommendationRule(workloads-rule) Target(default/nginx-deployment): WorkloadMemoryUsage Percentile 57456640.000000 PodMemoryRequest 524288000.000000 MemTargetUtilization 0.500000 I0625 15:07:36.619916 1 manager.go:205] RecommendationRule(workloads-rule) Target(default/nginx-deployment): finish to run recommender "Replicas". I0625 15:07:36.619928 1 recommendation_rule_controller.go:461] Update recommendation default/workloads-rule-replicas-4ckp5 I0625 15:07:36.626356 1 recommendation_controller.go:41] Got Recommendation default/workloads-rule-replicas-4ckp5 I0625 15:07:36.627378 1 recommendation_rule_controller.go:467] Successfully to update Recommendation default/workloads-rule-replicas-4ckp5 I0625 15:07:36.627395 1 recommendation_rule_controller.go:416] Mission message: Success I0625 15:07:36.627417 1 recommendation_rule_controller.go:317] Updating RecommendationRule workloads-rule status I0625 15:07:36.638573 1 recommendation_rule_controller.go:341] Update RecommendationRule status successful, RecommendationRule workloads-rule I0625 15:07:36.638595 1 recommendation_rule_controller.go:100] "Will re-sync" after="5m0s" I0625 15:07:36.638625 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/services-rule" I0625 15:07:36.638723 1 recommendation_rule_controller.go:57] "Got a RecommendationRule resource." RecommendationRule="/workloads-rule" I0625 15:08:24.970904 1 reflector.go:225] Stopping reflector v1alpha1.EffectiveHorizontalPodAutoscaler (9h25m32.854325822s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.970968 1 reflector.go:225] Stopping reflector v1.ConfigMap (9h18m32.280147413s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971038 1 reflector.go:225] Stopping reflector v1.Pod (9h47m21.860077463s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971076 1 reflector.go:225] Stopping reflector v1alpha1.RecommendationRule (10h32m7.567628063s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971112 1 reflector.go:225] Stopping reflector v1alpha1.Recommendation (10h15m41.897892678s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971150 1 reflector.go:225] Stopping reflector v1alpha1.TimeSeriesPrediction (9h11m14.838150609s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971169 1 reflector.go:225] Stopping reflector v1alpha1.Analytics (9h44m11.990737419s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971200 1 reflector.go:225] Stopping reflector v1alpha1.EffectiveVerticalPodAutoscaler (9h48m20.736954602s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 I0625 15:08:24.971223 1 reflector.go:225] Stopping reflector v1alpha1.Substitute (10h5m54.19539413s) from pkg/mod/k8s.io/client-go@v0.22.3/tools/cache/reflector.go:167 E0625 15:08:24.972469 1 manager.go:430] "problem running crane manager" err="failed to wait for hpa-observer-controller caches to sync: timed out waiting for cache to be synced" F0625 15:08:24.972482 1 manager.go:431] failed to wait for hpa-observer-controller caches to sync: timed out waiting for cache to be synced

maxiaoyin commented 1 year ago

I am so worry, I need to know what happen, and how can i to fix it , thank you very mush

qmhu commented 1 year ago

in 1.26+ version, you should disable autoscaling when starting craned.

Leynakindle commented 1 year ago

in 1.26+ version, you should disable autoscaling when starting craned.

Thank you for the reminder. Additionally, I encountered some issues while attempting to disable autoscaling. I'm using the RKE2 cluster (v1.27.2+rke2r1) for configuration, and I would like to seek your advice on this matter. Thank you very much. Reference links: https://docs.rke2.io/install/configuration, https://docs.rke2.io/reference/server_config.

Some screenshots below: craned log

截屏2023-07-06 18 30 08

cluster info WechatIMG400

config file for rke2-server WechatIMG427

qmhu commented 1 year ago

we are maintaining versions that supporting v1.26+ in branch https://github.com/gocrane/crane/tree/support-1-26. You can build image from that branch.