Closed RommelLayco closed 1 week ago
There are https://github.com/Aiven-Open/rohmu/pull/194 and https://github.com/Aiven-Open/pghoard/pull/631, which I think already solve most of our problems.
Closing because the other PRs solve this problem better.
I'm going to reopen this because we could still get timeout errors, and it is better to retry with longer timeouts.
The conflict is caused by the merge of #631. After fixing conflicts, I have tested this in dev env and LGTM.
self.max_stale_seconds can double as many times as self.stall_max_retries allows, and it can become quite large. I assume that if we (or other pghoard users) use the restore progress for monitoring and alerting purposes, it should not be a problem?
Thanks for fixing the conflict. I assume it should be self.max_stale_seconds = min(self.max_stale_seconds * 2, 480)?
(min, not max. Otherwise self.max_stale_seconds will jump from 120 to 480, and then 960...)
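To make the min-versus-max point concrete, here is a small sketch that lists the timeout used on each attempt under both variants. The starting value 120 and the cap 480 come from the comment above; the helper name stale_timeouts and the retry count of 4 are made up for illustration and are not part of pghoard.

```python
def stale_timeouts(initial, retries, cap, combine):
    """Return the timeout used on each attempt.

    `combine` is either the built-in `min` (cap the doubled value)
    or `max` (the variant the comment above warns about).
    """
    timeouts = [initial]
    for _ in range(retries):
        # Double the previous timeout, then combine it with the cap.
        timeouts.append(combine(timeouts[-1] * 2, cap))
    return timeouts


# Hypothetical retry count of 4; 120 and 480 are taken from the discussion.
capped = stale_timeouts(120, 4, 480, min)
uncapped = stale_timeouts(120, 4, 480, max)

print(capped)    # [120, 240, 480, 480, 480]
print(uncapped)  # [120, 480, 960, 1920, 3840]
```

With min the timeout settles at 480, while with max it skips 240 and keeps growing without bound, which matches the 120, 480, 960 sequence described above.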
About this change - What it does
Implement an exponential backoff when stalling on backup download
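As a rough illustration of the idea, and not the code in this PR, a retry loop with a capped, doubling stall timeout could look like the sketch below. The function download_with_backoff, its download callable, and the default values are assumptions for the example; only the names max_stale_seconds and stall_max_retries are borrowed from the review comments.

```python
def download_with_backoff(download, max_stale_seconds=120, stall_max_retries=3, cap=480):
    """Call download(timeout) until it succeeds.

    `download` is a hypothetical callable that raises TimeoutError when
    the transfer stalls for longer than the timeout it was given.
    """
    for attempt in range(stall_max_retries + 1):
        try:
            return download(max_stale_seconds)
        except TimeoutError:
            if attempt == stall_max_retries:
                # Out of retries: let the caller see the failure.
                raise
            # Double the allowed stall time, but never exceed the cap.
            max_stale_seconds = min(max_stale_seconds * 2, cap)
```

Capping the timeout keeps the worst-case wait predictable, which matters if restore progress is also used for monitoring and alerting.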
Why this way