Open HugoTigre opened 4 years ago
We've been seeing the same issue when using this adapter and autoscaling based on pubsub undelivered messages
We are facing this issue as well, same use case.
same here.
We have HPAs on most of our services based on a custom (external) metric. GKE version v1.17.15-gke.800 and gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.8.0
It is working, but we see a lot of errors in the GKE events of this kind:
unable to fetch metrics from external metrics API: the server is currently unable to handle the request
The custom metrics adapter log is not very useful, as it is just full of the following:
apiserver was unable to write a JSON response: http2: stream closed
apiserver received an error that is not an metav1.Status: http2: stream closed
I've noticed that once the custom-metrics-stackdriver adapter was evicted and restarted, we got the "unable to handle the request" error. But we also get the errors every few minutes or hours while it is just running. The HPA works, but I suspect it's not working as efficiently as it used to.
BTW, the same happens on another cluster: GKE version v1.17.14-gke.1600 and gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.2
any idea what's going on? thanks
anything here?
same issue here
We are trying to use HPA with the same metric as @JBodkin-LH and I'm getting a lot of those errors, it seems the metrics are working fine, but that amount of error logs might hide other issues...
Is there a fix for this or a way to silence these logs? we've already run into a surge in costs due to this spamming issue.
For Google Cloud, it's possible to set Logs Exclusion for a specific pattern: https://cloud.google.com/logging/docs/exclusions
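For anyone looking for a starting point for such an exclusion, here is a rough sketch of a filter in the Logging query language. It is not from the docs or the maintainers. It assumes the adapter runs in a namespace called `custom-metrics` and that the noisy lines arrive as `textPayload`; both may differ in your cluster (for example when the adapter is deployed by Cloud Composer), so check a sample entry in Logs Explorer first. The helper name `build_exclusion_filter` is made up for illustration.

```shell
# Build a Cloud Logging exclusion filter for the "http2: stream closed" spam.
# ASSUMPTIONS (not confirmed in this thread):
#   - the adapter runs in the namespace passed as $1 (e.g. "custom-metrics")
#   - the messages are logged as textPayload rather than jsonPayload
build_exclusion_filter() {
  namespace="$1"
  # ":" is the substring ("has") operator in the Logging query language.
  printf 'resource.type="k8s_container" AND resource.labels.namespace_name="%s" AND textPayload:"http2: stream closed"' "$namespace"
}

FILTER="$(build_exclusion_filter custom-metrics)"
echo "$FILTER"
```

The printed filter can be pasted into an exclusion on the `_Default` sink in the Logs Router. Note that this only hides the messages (and stops billing for them); it does not address whatever is causing them.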
same issue here
We set up the HPA following this guide: https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics
any idea what's going on?
Any update?
Update ?
I'm also running into this issue :-(
Same for us. Any reaction from maintainers?
I am seeing the same issue as well following the guide found here https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter. Anyone figure out a way to fix the error messages above or is this something we can ignore?
Running v1.21.5-gke.1302 for control plane and nodes with workload identity enabled.
Same for me. Any update?
+1
Getting surprise spam cloud logging bills from this issue, except this is autodeployed as part of Cloud Composer.
@muscovitebob please reach out to cloud support for any issues caused by a managed product and related billing issues.
In general, when managing this component yourself, check your adapter's memory utilization. If it is running close to the memory limit, this can be a symptom. Also check the resources provided to the adapter in general and see if increasing them reduces the frequency of these errors (feel free to share learnings here).
If you are not seeing any data reaching the apiserver from the component, checking your networking rules/firewalls can also help to find what is causing traffic to get lost. Often these errors just mean the adapter can not respond in time or at all.
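A minimal sketch of how those memory and resource checks could look with `kubectl`. The namespace and deployment names below are assumptions (they may not match your install, and a managed product may use different ones), and `check_adapter_resources` is just a hypothetical helper name. `kubectl top` needs a metrics server in the cluster.

```shell
# ASSUMED names; adjust to wherever the adapter lives in your cluster.
ADAPTER_NAMESPACE="custom-metrics"
ADAPTER_DEPLOYMENT="custom-metrics-stackdriver-adapter"

check_adapter_resources() {
  # Current CPU/memory usage per container in the adapter's namespace.
  kubectl top pod -n "$ADAPTER_NAMESPACE" --containers

  # Configured requests/limits, to compare against the usage above.
  kubectl get deployment "$ADAPTER_DEPLOYMENT" -n "$ADAPTER_NAMESPACE" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

  # Reason for the last container termination, if any (e.g. OOMKilled).
  kubectl get pods -n "$ADAPTER_NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}
```

Run `check_adapter_resources` against the affected cluster. If usage sits near the limit or the last termination reason is `OOMKilled`, raising the memory limit is the first thing to try.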
Same issue with stackdriver version gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.13.1-gke.0
and k8s version 1.23.14-gke.401
+1
Hi, we experience the same issue in two different environments. This produces ~10,000 error messages per hour. This drowns out any useful error message and causes higher than necessary costs. Quite an important issue, so to say. Quite disappointing to see that it has not been solved in 2 1/2 years and is not prioritised more! The workaround for our application is to go back to Composer version 1. We are happy to provide more information if anybody is willing to take on this issue. "old prod env":
Steps to reproduce the issue:
gcloud projects add-iam-policy-binding kolumbus-atl-prod \
--member=serviceAccount:service-123456789@cloudcomposer-accounts.iam.gserviceaccount.com \
--role=roles/composer.ServiceAgentV2Ext
gcloud composer environments create kolumbus-composer5 \
--location=europe-west1 \
--image-version=composer-2.2.0-airflow-2.5.1 \
--environment-size=small \
--maintenance-window-start='2023-05-25T17:30:00Z' \
--maintenance-window-end='2023-05-25T21:30:00Z' \
--maintenance-window-recurrence='FREQ=DAILY'
I don't think GCP teams look at or are notified of or maybe just don't care about GitHub.com comments and issues. The most effective way to get them to fix things is to create a partner issue on their internal tracking system or GCP support case if you're a paying GCP customer (paying more money correlates to faster response time) and linking back to this issue.
Well, I have bad news: they don't care even if you pay them 🤧 GCP seems to be losing to the other key players.
I added a note to the corresponding Composer bug report: https://issuetracker.google.com/issues/159171905 Please upvote and comment there, too.
I'm currently using Horizontal Pod Autoscaler (in Google Cloud) implemented with custom metrics, so custom-metrics-stackdriver-adapter is installed from here.
The problem is that it's generating more than 10 log messages a second with the following errors:
and
The HPA is working as expected, so the number of errors is very strange. I couldn't find a reason for it, nor could I find documentation on how to change this, or even how to change the request frequency, neither in HPA nor in this adapter.
HPA is configured as follows:
Kubernetes version is: 1.15
Is there any reason for this? It looks like a bug.
Also this issue seems to be related: https://github.com/GoogleCloudPlatform/k8s-stackdriver/issues/303