Open jeffbl opened 2 months ago
Looks like the problem is that the memcached container simply was not running, so when the orchestrator asked for cached data, there was no response, and each request timed out. With queries from different run groups, this meant a number of non-overlapping timeouts, and thus a very slow response time. Starting the memcached container solves the problem. Questions:
- memcached doesn't seem to log any output, so not sure if a problem in the container caused a crash, or if it simply failed to start when we rebooted after the power outage last week.
- [...] memcached goes bad? The irony here is that the sole purpose of caching is to speed things up, but in this case, it slowed things waaaaay down. :)

The only thing that comes to mind is that we check whether the service name used is a running docker service on the same network before attempting to connect, either on every attempt or only on the first. Depending on how the request to the cache is implemented, we could also reduce the timeout, but if this is buried in a library it may not be possible.
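A rough sketch of the "check before connecting" idea, in case it helps the discussion. This is not how the orchestrator currently talks to the cache; `cache_is_reachable`, `CacheGuard`, and the timeout values are hypothetical names and numbers picked for illustration. It only assumes the cache is reachable over TCP at a service name on memcached's default port 11211. A failed probe is remembered for a while, so that requests skip the cache instead of each waiting out its own timeout.

```python
import socket
import time


def cache_is_reachable(host, port=11211, timeout=0.2):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    A failed name lookup (e.g. the container is not on the network) also
    counts as unreachable, since socket.gaierror is a subclass of OSError.
    Note that `timeout` bounds the connect step, not the DNS lookup itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class CacheGuard:
    """Hypothetical helper: remembers a failed probe for `retry_after` seconds."""

    def __init__(self, host, port=11211, timeout=0.2, retry_after=30.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.retry_after = retry_after
        self._down_until = 0.0  # monotonic time before which we skip the cache

    def available(self):
        now = time.monotonic()
        if now < self._down_until:
            return False  # recently failed; do not probe again yet
        if cache_is_reachable(self.host, self.port, self.timeout):
            return True
        self._down_until = now + self.retry_after
        return False
```

The caller would then do something like `if guard.available(): ...try the cache... else: ...go straight to the preprocessor...`, so a dead cache costs at most one short probe per `retry_after` window rather than one full timeout per request. If the cache client library exposes its own connect/read timeout settings, lowering those would be a simpler alternative.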
Discussion from this week's meeting:
@VenissaCarolQuadros was using pegasus, and queries were either failing or coming back very slowly. Looks like a problem with memcached: