Lifecycle context deadline exceeded

(issue originally reported by Anna)

It looks like a performance issue in Lifecycle.

Job logs

failed to call job " latest" by : Server error '500 Internal Server Error' for url 'http://pub:7005/pub/job/***/latest/api/v1/perform' For more information check: https://httpstatuses.com/500: {"error":"Proxy request error: Getting job details: GET request to Lifecycle: Get \"http://lifecycle:7002/lifecycle/api/v1/auth/can-call-job/***/latest/api/v1/perform\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","requestId":"***","status":"Internal Server Error"}

Lifecycle logs

Didn't find any errors in the logs. Filtering the logs by tracing ID shows multiple calls to the Lifecycle endpoint (for some reason, the same tracing ID is used multiple times).

The first request was processed in 255 ms. However, the next call (which is supposed to be the same) was stuck for 73 s, which may have caused the timeout in the Pub client.

Lifecycle metrics

During that time, there was a noticeable spike in active requests and active threads, while requests per second and CPU usage stayed at their usual levels.
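
A rough way to read these metrics is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average latency. If the request rate stays flat while active requests spike, latency must have grown, which would fit requests being stuck rather than the service being overloaded by traffic. The sketch below plugs in the two durations from the logs above (255 ms and 73 s). The request rate of 10 req/s is a made-up value for illustration only, and not every request was necessarily stuck for the full 73 s.

```python
def expected_in_flight(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average in-flight requests = arrival rate * average latency."""
    return requests_per_second * latency_seconds


# Hypothetical request rate; the real value should be read from the Lifecycle metrics.
ASSUMED_RATE = 10.0

# Durations taken from the Lifecycle logs quoted in this issue.
NORMAL_LATENCY_S = 0.255
STUCK_LATENCY_S = 73.0

if __name__ == "__main__":
    normal = expected_in_flight(ASSUMED_RATE, NORMAL_LATENCY_S)
    stuck = expected_in_flight(ASSUMED_RATE, STUCK_LATENCY_S)
    print(f"in flight at normal latency: {normal:.2f}")
    print(f"in flight if requests hang:  {stuck:.2f}")
```

Under these assumptions, the same request rate would keep hundreds of requests (and worker threads) busy at once if they all hung, while CPU usage could stay low because the threads are waiting rather than computing.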
