Aiven-Open / pghoard

PostgreSQL® backup and restore service
http://aiven-open.github.io/pghoard/
Apache License 2.0

refactor progress based basebackup metrics #622

Closed sebinsunny closed 3 months ago

sebinsunny commented 4 months ago

This PR refactors the basebackup progress monitoring introduced in PR #615. Previously, the basebackup progress file was reset whenever a new basebackup was requested, so stalls that spanned a pghoard restart could go undetected. Now the progress file is reset only after a successful backup, and it also records the total bytes uploaded by the previous basebackup attempt. When a backup is retried after a pghoard restart or a failed backup request, we check whether progress has been made: if the bytes uploaded do not exceed the total recorded in the previous state, we emit a stalled metric. This PR also adds logging of upload progress for each file and for the snapshot stages of a basebackup operation.
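
The retry check described above can be illustrated with a minimal sketch. This is not the code from this PR: the `ProgressState` fields, the JSON file layout, and the helpers `begin_attempt`, `record_upload`, and `finish_success` are all hypothetical names chosen for illustration, and the "stalled" rule is one possible reading of the description (a retry is stalled when the last attempt uploaded no more bytes than the attempt before it).

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ProgressState:
    # Hypothetical fields; not pghoard's actual on-disk format.
    attempts: int = 0        # attempts started since the last successful backup
    previous_bytes: int = 0  # bytes uploaded by the attempt before the last one
    current_bytes: int = 0   # bytes uploaded by the most recent attempt


def load_state(path: Path) -> ProgressState:
    """Read the persisted state, or return a fresh one if no file exists."""
    if not path.exists():
        return ProgressState()
    return ProgressState(**json.loads(path.read_text()))


def save_state(path: Path, state: ProgressState) -> None:
    path.write_text(json.dumps(asdict(state)))


def begin_attempt(path: Path) -> bool:
    """Start a (re)try. Returns True when a stalled metric should be emitted."""
    state = load_state(path)
    is_retry = state.attempts > 0
    # Stalled: the last attempt did not exceed the bytes of the one before it.
    stalled = is_retry and state.current_bytes <= state.previous_bytes
    # The file is NOT reset here; the last attempt's total is carried forward.
    save_state(
        path,
        ProgressState(
            attempts=state.attempts + 1,
            previous_bytes=state.current_bytes,
            current_bytes=0,
        ),
    )
    return stalled


def record_upload(path: Path, nbytes: int) -> None:
    """Persist upload progress so it survives a process restart."""
    state = load_state(path)
    state.current_bytes += nbytes
    save_state(path, state)


def finish_success(path: Path) -> None:
    """Reset the progress file only once the backup has succeeded."""
    path.unlink(missing_ok=True)
```

In this sketch, a restart between `record_upload` and the next `begin_attempt` loses nothing, because the comparison uses only what was persisted. Resetting inside `begin_attempt` instead of `finish_success` would reproduce the earlier behavior, where a restart hid the stall.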

[SRE-7476]


codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.

Project coverage is 90.80%. Comparing base (5505b86) to head (56098ae). Report is 11 commits behind head on main.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622/graphs/tree.svg?width=650&height=150&src=pr&token=nLr7M7hvCx&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open)](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open)

```diff
@@            Coverage Diff             @@
##             main     #622      +/-   ##
==========================================
- Coverage   91.01%   90.80%   -0.21%
==========================================
  Files          31       31
  Lines        4917     4968      +51
==========================================
+ Hits         4475     4511      +36
- Misses        442      457      +15
```

| [Files](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open) | Coverage Δ | |
|---|---|---|
| [pghoard/basebackup/base.py](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622?src=pr&el=tree&filepath=pghoard%2Fbasebackup%2Fbase.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open#diff-cGdob2FyZC9iYXNlYmFja3VwL2Jhc2UucHk=) | `92.25% <100.00%> (ø)` | |
| [pghoard/basebackup/delta.py](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622?src=pr&el=tree&filepath=pghoard%2Fbasebackup%2Fdelta.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open#diff-cGdob2FyZC9iYXNlYmFja3VwL2RlbHRhLnB5) | `90.87% <0.00%> (-0.34%)` | :arrow_down: |
| [pghoard/transfer.py](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622?src=pr&el=tree&filepath=pghoard%2Ftransfer.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open#diff-cGdob2FyZC90cmFuc2Zlci5weQ==) | `94.51% <87.50%> (-1.28%)` | :arrow_down: |

... and [4 files with indirect coverage changes](https://app.codecov.io/gh/Aiven-Open/pghoard/pull/622/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aiven-Open)
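
The percentages in this report follow from the Hits and Lines counts in the coverage diff. The short check below recomputes them. The helper name `coverage_pct` is made up for illustration, and the patch size of 19 lines is not stated in the report: it is inferred from "3 lines missing" at 84.21053%.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines over total lines."""
    return 100 * hits / lines


# Project coverage, from the Hits and Lines rows of the coverage diff.
base = coverage_pct(4475, 4917)  # main
head = coverage_pct(4511, 4968)  # this PR
print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")

# Patch coverage: 3 missing lines out of an inferred 19 changed lines.
patch = coverage_pct(19 - 3, 19)
print(f"patch={patch:.5f}%")
```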