Aiven-Open / pghoard

PostgreSQL® backup and restore service
http://aiven-open.github.io/pghoard/
Apache License 2.0

Rommellayco make stale seconds configurable #630

Closed: RommelLayco closed this 1 week ago

RommelLayco commented 1 month ago

About this change - What it does

Implements an exponential backoff when a backup download stalls.
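A minimal sketch of the idea, assuming a hypothetical download callable that raises a hypothetical StallError when no progress is made within the allowed stale window. The initial window of 120 seconds and the 480-second cap are taken from the review discussion in this thread; the function and parameter names are illustrative and are not pghoard's actual code.

```python
class StallError(Exception):
    """Hypothetical: raised when a download makes no progress within the stale window."""


def download_with_backoff(attempt_download, max_stale_seconds=120,
                          stall_max_retries=3, max_stale_cap=480):
    """Call attempt_download(stale_seconds) and retry when it stalls.

    After each stall the stale window is doubled, but never beyond
    max_stale_cap. The last StallError is re-raised once retries run out.
    """
    stale_seconds = max_stale_seconds
    for attempt in range(stall_max_retries + 1):
        try:
            return attempt_download(stale_seconds)
        except StallError:
            if attempt == stall_max_retries:
                raise  # out of retries: propagate the stall
            # Exponential backoff: double the allowed stall time, capped.
            stale_seconds = min(stale_seconds * 2, max_stale_cap)


# Usage: a fake download that only succeeds once the window reaches 480 s.
seen = []

def fake_download(stale_seconds):
    seen.append(stale_seconds)
    if stale_seconds < 480:
        raise StallError(f"no progress within {stale_seconds}s")
    return "ok"

print(download_with_backoff(fake_download))  # ok
print(seen)  # [120, 240, 480]
```

Doubling the window on each retry gives slow but healthy transfers progressively more time, instead of failing them repeatedly with the same short timeout.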

Why this way

tkren commented 3 weeks ago

There are https://github.com/Aiven-Open/rohmu/pull/194 and https://github.com/Aiven-Open/pghoard/pull/631, which I think already solve most of our problems.

RommelLayco commented 3 weeks ago

Closing because the other PRs solve this problem better.

RommelLayco commented 3 weeks ago

I'm going to reopen this because we could still get timeout errors, and it is better to retry with longer timeouts.

orange-kao commented 2 weeks ago

The conflict is caused by the merge of #631. After fixing the conflicts, I tested this in the dev environment and it looks good to me (LGTM).

self.max_stale_seconds can double as many times as self.stall_max_retries allows, so it can become quite large. I assume that if we (or other pghoard users) use the restore progress for monitoring and alerting purposes, it should not be a problem?
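To make "quite large" concrete, here is a small calculation, assuming the window starts at 120 seconds (the starting value mentioned later in this thread), doubles after every stall with no cap, and that stall_max_retries counts retries after the first attempt. The retry count of 5 is a hypothetical example, not pghoard's default.

```python
def stale_windows(initial_seconds, stall_max_retries):
    """Stale window used by each attempt when it doubles after every stall (no cap)."""
    return [initial_seconds * 2 ** attempt for attempt in range(stall_max_retries + 1)]


windows = stale_windows(120, 5)
print(windows)       # [120, 240, 480, 960, 1920, 3840]
print(sum(windows))  # 7560 seconds of stalls (about 2.1 hours) before giving up
```

The total is the geometric sum 120 * (2**6 - 1), so each extra retry roughly doubles how long a restore can sit without progress before it finally fails, which is what matters for progress-based alerting.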

orange-kao commented 2 weeks ago

Thanks for fixing the conflict. I assume it should be self.max_stale_seconds = min(self.max_stale_seconds * 2, 480)? (min, not max; otherwise self.max_stale_seconds will jump from 120 to 480, then 960, ...)
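The difference between the two expressions can be checked directly. This sketch uses the starting value of 120 and the cap of 480 from the comment above; the helper names are hypothetical.

```python
def next_stale_with_max(current, cap=480):
    # max() turns 480 into a floor: the value still doubles without bound.
    return max(current * 2, cap)


def next_stale_with_min(current, cap=480):
    # min() turns 480 into a ceiling: doubling stops once the cap is reached.
    return min(current * 2, cap)


def sequence(step, start=120, steps=4):
    """Apply step repeatedly, starting from start, and collect every value."""
    values = [start]
    for _ in range(steps):
        values.append(step(values[-1]))
    return values


print(sequence(next_stale_with_max))  # [120, 480, 960, 1920, 3840]
print(sequence(next_stale_with_min))  # [120, 240, 480, 480, 480]
```

With min() the window grows 120, 240, 480 and then stays at 480, which bounds how long a single stalled attempt can take.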