Hi @jwayne, we've seen timeouts like this on Cloud Run when a container's CPU is throttled in the middle of an export; by the time the CPU comes back, enough time has passed that the request times out. Switching the service to CPU Always Allocated would probably fix the issue, but we understand if you don't want to do that.
You mentioned this is ~60/day; do you know the overall error rate for DEADLINE_EXCEEDED in your service? You could also try routing traces through an OpenTelemetry Collector, though it probably won't completely fix the issue.
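In case it helps, here's a minimal sketch of what the Collector route could look like on the Python side, assuming a Collector sidecar listening on `localhost:4317` (the endpoint and the OTLP-over-gRPC exporter package are assumptions, not something you've confirmed about your setup):

```python
# Sketch: export spans to a local OpenTelemetry Collector over OTLP/gRPC
# instead of calling Cloud Trace directly from the request path.
# Assumes a Collector sidecar is listening on localhost:4317.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        # The Collector (not the app) forwards to Cloud Trace, so a slow
        # export doesn't have to finish inside your Flask workers.
        OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
    )
)
trace.set_tracer_provider(provider)
```

The Collector then runs the Cloud Trace exporter itself, so the slow gRPC calls to Google APIs happen outside your request handling, which is why it tends to reduce (but not eliminate) these timeouts.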
Closing this because the customer hasn't responded in a few months.
When running the OpenTelemetry metrics/trace exporters in a Flask service on Google Cloud Run, I've been getting a sizable volume of errors (~60/day) that look like the following:
It looks like these errors come from gRPC calls that are timing out (the calls made by the Cloud Monitoring metrics exporter or the trace exporter). FWIW, it's odd that the timeout is being hit in the first place, since the service is already running inside GCP.
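For context, the exporter setup is roughly the following (a simplified sketch, not the exact production code; it assumes the opentelemetry-exporter-gcp-trace and opentelemetry-exporter-gcp-monitoring packages):

```python
# Rough sketch of the setup: the Cloud Trace and Cloud Monitoring exporters
# batch telemetry in background threads, and each flush is a gRPC call to
# Google APIs that can hit DEADLINE_EXCEEDED.
from opentelemetry import metrics, trace
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Traces: exported in batches; each export is a BatchWriteSpans gRPC call.
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(CloudTraceSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# Metrics: exported periodically; each flush is a CreateTimeSeries gRPC call.
reader = PeriodicExportingMetricReader(CloudMonitoringMetricsExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```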
Two observations:
I'd be glad to hear any suggestions.