GoogleCloudPlatform / esp-v2

A service proxy that provides API management capabilities using Google Service Infrastructure.
https://cloud.google.com/endpoints/
Apache License 2.0

Frequent error logs on GCP API Gateway #709

Open erhan-talarian opened 2 years ago

erhan-talarian commented 2 years ago

Hi, I have an App Engine Standard Java 11 application served behind a GCP API Gateway. When I check Cloud Logging, I frequently see errors like the following from API Gateway:

{
  insertId: "5e1e0f456172eda03520c9fae065f312-1@a1"
  jsonPayload: {
    api: "//apigateway.googleapis.com/projects/abc/locations/global/apis/abc"
    apiConfig: "//apigateway.googleapis.com/projects/abc/locations/global/apis/abc/configs/abc"
    httpRequest: {
      duration: "0ms"
      hostname: "servicecontrol.googleapis.com"
      httpVersion: "HTTP/1.1"
      path: "/v1/services/abc.apigateway.abc.cloud.goog:report"
      requestSize: "5003"
      responseSize: "95"
      status: 503
    }
    serviceConfig: "//servicemanagement.googleapis.com/services/abc.apigateway.abc.cloud.goog/configs/abc"
  }
  logName: "projects/abc/logs/apigateway.googleapis.com%2Fservice_control_queries"
  receiveTimestamp: "2022-06-20T13:19:37.047238698Z"
  resource: {
    labels: {
      gateway_id: "abc"
      location: "us-central1"
      resource_container: "projects/abc"
    }
    type: "apigateway.googleapis.com/Gateway"
  }
  severity: "ERROR"
  timestamp: "2022-06-20T13:19:29.777308345Z"
}

What do these errors mean? Should I be concerned?

Thank you

TAOXUY commented 2 years ago

This means Google's Service Control service was unavailable (it returned a 503). How often do you see it? Has it recovered now?

erhan-talarian commented 2 years ago

I checked the logs from the last 14 days, and 9 of those days have occurrences of this error. The minimum is 1 error per day and the maximum is 65 errors per day (the API averages 3 requests/sec). Normally I wouldn't be bothered, but I suspect some requests never reach the App Engine application (no logs or errors there) and API Gateway responds with HTTP 500 (sometimes 502) directly. That's why I am a little concerned.

qiwzhang commented 2 years ago

Errors with path = "/v1/services/abc.apigateway.abc.cloud.goog:report" are less severe than those with path = "/v1/services/abc.apigateway.abc.cloud.goog:check". The former is called at the end of each request to send telemetry info; when it fails, the request data is missing from the graphs and the access logs.

The latter checks access control, such as the API key; when it fails, requests may be rejected and never reach your App Engine application. Could you check whether there are such errors?
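To check for both kinds of failure at once, exported log entries can be grouped by the suffix of the logged path. The following is a minimal sketch, assuming the entries have already been exported from Cloud Logging as Python dictionaries shaped like the log entry quoted earlier in this thread; the function names and the sample entries are hypothetical.

```python
from collections import Counter


def classify_service_control_call(entry: dict) -> str:
    """Return 'check', 'report', or 'other' based on the logged path suffix."""
    path = entry.get("jsonPayload", {}).get("httpRequest", {}).get("path", "")
    if path.endswith(":check"):
        return "check"
    if path.endswith(":report"):
        return "report"
    return "other"


def count_errors_by_day(entries: list) -> Counter:
    """Count 5xx Service Control calls per (UTC day, call kind)."""
    counts = Counter()
    for entry in entries:
        status = entry.get("jsonPayload", {}).get("httpRequest", {}).get("status", 0)
        if status < 500:
            continue  # only server-side failures are of interest here
        day = entry.get("timestamp", "")[:10]  # "YYYY-MM-DD" prefix
        counts[(day, classify_service_control_call(entry))] += 1
    return counts


# Hypothetical sample data, shaped like the entry quoted in this thread.
sample_entries = [
    {
        "timestamp": "2022-06-20T13:19:29.777308345Z",
        "jsonPayload": {"httpRequest": {
            "path": "/v1/services/abc.apigateway.abc.cloud.goog:report",
            "status": 503,
        }},
    },
    {
        "timestamp": "2022-06-20T14:02:11.000000000Z",
        "jsonPayload": {"httpRequest": {
            "path": "/v1/services/abc.apigateway.abc.cloud.goog:check",
            "status": 503,
        }},
    },
]

for (day, kind), n in sorted(count_errors_by_day(sample_entries).items()):
    print(day, kind, n)
```

Days with a non-zero "check" count are the ones worth comparing against the requests that never reached the backend.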

TAOXUY commented 2 years ago

The failed call to /v1/services/abc.apigateway.abc.cloud.goog:report happens on the log path, not the request path, so it won't affect your service availability. If requests are not being forwarded to your backend, it should be some other problem.

The SLO for report is 99.9%. At 3 QPS, 65 failures per day looks fine to me.
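The arithmetic behind that judgment can be sketched as follows. This is a rough estimate that assumes one report call per API request and a constant 3 requests/sec all day; if the proxy aggregates reports, the real number of report calls would be lower and the failure ratio correspondingly higher. The function names are hypothetical, and an SLO is normally evaluated over a longer window than a single day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_failure_ratio(failures_per_day: int, requests_per_second: float) -> float:
    """Fraction of calls that failed, assuming one call per request."""
    return failures_per_day / (requests_per_second * SECONDS_PER_DAY)


def within_slo(failure_ratio: float, slo: float) -> bool:
    """True if the failure ratio fits inside the error budget (1 - SLO)."""
    return failure_ratio <= 1.0 - slo


calls_per_day = 3 * SECONDS_PER_DAY           # 259200 calls at 3 QPS
ratio = daily_failure_ratio(65, 3)            # worst day reported in the thread
print(calls_per_day)                          # 259200
print(f"{ratio:.4%}")                         # 0.0251%
print(within_slo(ratio, 0.999))               # True
```

Under these assumptions, a 99.9% SLO leaves room for roughly 259 failed calls per day at this traffic level, so 65 is about a quarter of the budget.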

erhan-talarian commented 2 years ago

There are also errors for "/v1/services/abc.apigateway.abc.cloud.goog:check", but they are less frequent (3 days out of the last 14, max 17 errors/day). However, the timings of these errors don't match the requests I suspect of not reaching App Engine, so I guess they are "innocent".
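A timing comparison like this one can be automated. The sketch below is only an illustration: it assumes two lists of Cloud Logging style timestamps (one for the suspect requests, one for the :check errors) and a hypothetical 5-second matching window; the helper names are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, truncating nanoseconds to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6]) if frac else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def errors_near(suspect_times, error_times, window=timedelta(seconds=5)):
    """Map each suspect request time to the error times within the window."""
    return {
        suspect: [err for err in error_times if abs(err - suspect) <= window]
        for suspect in suspect_times
    }


# Hypothetical timestamps for illustration only.
suspects = [parse_timestamp("2022-06-20T13:19:30Z"),
            parse_timestamp("2022-06-20T14:00:00Z")]
check_errors = [parse_timestamp("2022-06-20T13:19:29.777308345Z")]

for suspect, nearby in errors_near(suspects, check_errors).items():
    print(suspect.isoformat(), "->", len(nearby), "nearby check error(s)")
```

If every suspect request maps to an empty list, the :check failures are unlikely to explain the missing requests, which matches the conclusion above.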