borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

chunks index caching #8403

Closed ThomasWaldmann closed 4 days ago

ThomasWaldmann commented 1 week ago

borg compact uses ChunkIndex (a specialized, memory-efficient data structure), so it needs less memory now. Also, it saves that chunks index to cache/chunks in the repository.

When the chunks index is needed, borg first tries to load it from cache/chunks and only falls back to building it via repository.list() (which can be rather slow).

borg check --repair currently just invalidates the chunks cache.

borg create updates the chunks cache.
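The lookup order described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not borg's actual code: `ToyRepo`, `load_or_build_chunk_index`, `save_chunk_index` and `invalidate_chunk_index` are hypothetical names, a plain dict stands in for `ChunkIndex`, and an attribute stands in for the `cache/chunks` object in the repository.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for ChunkIndex: chunk id -> stored size.
ToyChunkIndex = Dict[bytes, int]


class ToyRepo:
    """Hypothetical in-memory repository, NOT borg's Repository class."""

    def __init__(self, chunks: ToyChunkIndex) -> None:
        self.chunks = dict(chunks)
        # Stands in for the cache/chunks object stored in the repository.
        self.cached_index: Optional[ToyChunkIndex] = None
        self.list_calls = 0

    def list(self) -> List[bytes]:
        # Stands in for the slow full listing via repository.list().
        self.list_calls += 1
        return list(self.chunks)


def load_or_build_chunk_index(repo: ToyRepo) -> ToyChunkIndex:
    """Prefer the cached index; only list the whole repo as a fallback."""
    if repo.cached_index is not None:
        return dict(repo.cached_index)
    return {chunk_id: repo.chunks[chunk_id] for chunk_id in repo.list()}


def save_chunk_index(repo: ToyRepo, index: ToyChunkIndex) -> None:
    """Roughly what the description says compact does after building the index."""
    repo.cached_index = dict(index)


def invalidate_chunk_index(repo: ToyRepo) -> None:
    """Roughly what the description says check --repair does to the cache."""
    repo.cached_index = None
```

In this toy model, the first call to `load_or_build_chunk_index` has to list the repository; once `save_chunk_index` has stored the result, later calls skip the listing until `invalidate_chunk_index` is called.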

ThomasWaldmann commented 1 week ago

Code is a bit less pretty now, but more efficient. Also fewer stats.

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 89.38053% with 12 lines in your changes missing coverage. Please review.

Project coverage is 81.54%. Comparing base (bd6caf8) to head (36e3d63). Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/archiver/compact_cmd.py | 81.13% | 5 Missing and 5 partials :warning: |
| src/borg/cache.py | 95.91% | 1 Missing and 1 partial :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #8403      +/-   ##
==========================================
+ Coverage   81.44%   81.54%   +0.10%
==========================================
  Files          70       70
  Lines       12739    12791      +52
  Branches     2311     2318       +7
==========================================
+ Hits        10375    10431      +56
+ Misses       1707     1703       -4
  Partials      657      657
```


mirko commented 5 days ago

Is this (also) addressing the issue of listdir() being called for every (sub-)directory in data/ for every borg create-run?

ThomasWaldmann commented 4 days ago

yes!

even if a new client works with a repo for the first time, it will fetch that cached index from the repo and use it if it is valid.

borg 1.x had to do a chunks cache sync in that case, building per-archive chunks indexes from all archives and then merging them all into the main chunks index.
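To make the cost of that borg 1.x sync concrete, here is a toy sketch of merging per-archive indexes into one main index. It assumes each per-archive index simply maps a chunk id to a reference count; `merge_archive_indexes` is a hypothetical helper and `collections.Counter` only stands in for borg's real index structure.

```python
from collections import Counter
from typing import Iterable, Mapping


def merge_archive_indexes(per_archive: Iterable[Mapping[bytes, int]]) -> Counter:
    """Sum the reference counts of every chunk id across all archives."""
    merged: Counter = Counter()
    for archive_index in per_archive:
        # Counter.update() adds counts instead of overwriting them.
        merged.update(archive_index)
    return merged


# Two hypothetical archives that share chunk b"b".
archive_1 = {b"a": 1, b"b": 2}
archive_2 = {b"b": 1, b"c": 1}
main_index = merge_archive_indexes([archive_1, archive_2])
```

The work grows with the number of archives and the chunks each one references, which is the overhead that fetching one ready-made index from the repo avoids.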

ThomasWaldmann commented 4 days ago

@mirko merged this.

also found another issue: it was doing one full repo.list too many, PR incoming soon.

so, master branch should be quite a bit faster now.

only check and compact are expected to always do the repository.list(), just to be on the safe side and not rely on caches.