We had a site outage due to Gerrit being unresponsive, so this PR makes our dependency on Gerrit more robust.
The existing code had two levels of caching: redis for 1 hour, and a copy of the Gerrit response in ndb that is marked with its creation time and only used for 1 hour. Both layers are useful even while Gerrit is healthy, because we sometimes clear redis.
In this PR:
* Break out a separate `is_fresh()` method that checks whether the ndb value should be used, instead of having the lookup return None for a stale entry.
* Use the ndb value if it is fresh.
* If fetching from Gerrit fails, use the ndb value even if it is stale. Also treat it like a new response: store it to ndb again so it is considered fresh for the next hour. This prevents a flood of requests to Gerrit during an outage.
* If Gerrit is down and we have no OwnersFile in ndb, return an empty list of approvers rather than raising an exception. This locks out non-admin API owners, but the site stays usable.
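The read path described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: a plain dict stands in for ndb, the redis layer is omitted, and `get_approvers`, `fetch_from_gerrit`, `parse_owners`, `raw_content`, and `created_on` are hypothetical names. Only `OwnersFile`, `is_fresh()`, and the 1-hour lifetime come from the description.

```python
import datetime
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Both cache layers described in this PR use a 1-hour lifetime.
CACHE_TTL = datetime.timedelta(hours=1)


@dataclass
class OwnersFile:
  """Hypothetical stand-in for the ndb entity: raw Gerrit response + creation time."""
  url: str
  raw_content: str
  created_on: datetime.datetime = field(default_factory=datetime.datetime.now)

  def is_fresh(self, now: Optional[datetime.datetime] = None) -> bool:
    """Return True if this stored copy is younger than CACHE_TTL."""
    now = now or datetime.datetime.now()
    return now - self.created_on < CACHE_TTL


def parse_owners(content: str) -> List[str]:
  """Hypothetical parser: keep non-comment lines that look like email addresses."""
  return [line.strip() for line in content.splitlines()
          if '@' in line and not line.strip().startswith('#')]


def get_approvers(
    url: str,
    ndb_store: Dict[str, OwnersFile],
    fetch_from_gerrit: Callable[[str], str],
) -> List[str]:
  cached = ndb_store.get(url)
  if cached is not None and cached.is_fresh():
    # Fresh ndb copy: no request to Gerrit at all.
    return parse_owners(cached.raw_content)

  try:
    content = fetch_from_gerrit(url)
  except Exception:  # Assumed: any Gerrit failure (timeout, 5xx, ...) lands here.
    if cached is None:
      # Nothing to fall back on: only admins can approve until Gerrit recovers.
      return []
    # Stale copy is better than failing the request.
    content = cached.raw_content

  # Store the new response, or re-store the stale one, as a fresh copy so the
  # next hour of requests does not hit Gerrit again.
  ndb_store[url] = OwnersFile(url=url, raw_content=content)
  return parse_owners(content)
```

Re-storing the stale copy with a new creation time is what keeps an outage from turning into one Gerrit request per page load: after the first failure, every caller within the next hour takes the `is_fresh()` branch.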