Borg does not yet have such a feature, but I guess it would be possible to implement the space-usage analysis.
It is not possible to analyse the time spent backing up a specific file or directory: we only have the overall backup time for a backup archive, no finer-grained timing data.
Implementation notes:

- Archive selection options (`-a`, `--last N`, etc.) already exist - these can be reused/extended.
- The cost would be roughly O(N_archives_considered * archive_size), since every item in every considered archive has to be scanned (see the sketch below).
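A minimal sketch of how such a space-usage aggregation could work. `iter_archive_items()` is a hypothetical helper (not an existing borg API), assumed to yield each item's path together with its list of (chunk_id, compressed_size) pairs; the key point is that each chunk is counted only once, so the per-directory totals reflect deduplicated, compressed storage:

```python
from collections import defaultdict

def unique_space_by_directory(archives, iter_archive_items):
    """Sum compressed chunk sizes per path, counting each chunk only once.

    iter_archive_items(archive) is a hypothetical helper yielding
    (path, chunks) tuples, where chunks is a list of (chunk_id, csize).
    """
    seen_chunks = set()              # chunk IDs that were already counted
    dir_space = defaultdict(int)     # directory/file path -> unique csize total

    for archive in archives:
        for path, chunks in iter_archive_items(archive):
            for chunk_id, csize in chunks:
                if chunk_id in seen_chunks:
                    continue         # deduplicated: already accounted for
                seen_chunks.add(chunk_id)
                # Attribute the chunk to the path and all of its parents.
                parts = path.split("/")
                for i in range(1, len(parts) + 1):
                    dir_space["/".join(parts[:i])] += csize

    return sorted(dir_space.items(), key=lambda x: x[1], reverse=True)
```

Attributing a shared chunk to whichever path references it first is only one possible policy; shared chunks could also be counted towards every referencing path, which would change what the totals mean.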
Since it sounds like individual file timing isn't implemented, I made a quick Python script that ranks directories by their backup times, in case anyone else finds that useful. It requires a timestamped backup log, which can be generated with `borg create --list ... | ts -s "%.s" | tee borg_log.txt`.
```python
from collections import defaultdict

# Accumulated backup time per path, including every parent directory.
path_backup_times = defaultdict(float)

with open("borg_log.txt", "r") as file:
    previous_timestamp = 0
    for line in file:
        parts = line.split()
        if len(parts) >= 3:
            timestamp = float(parts[0])
            file_flag = parts[1]
            file_path = " ".join(parts[2:])
            # See https://borgbackup.readthedocs.io/en/latest/usage/create.html#item-flags
            if file_flag in ["A", "M", "U", "C", "E"]:
                backup_time = timestamp - previous_timestamp
                # Attribute this item's backup time to the path and all of its parents.
                path_components = file_path.split("/")
                for i in range(1, len(path_components) + 1):
                    component = "/".join(path_components[:i])
                    path_backup_times[component] += backup_time
            previous_timestamp = timestamp

# Print the 20 paths with the largest accumulated backup time.
sorted_paths = sorted(path_backup_times.items(), key=lambda x: x[1], reverse=True)[:20]
for rank, (path, backup_time) in enumerate(sorted_paths, start=1):
    print(f"{rank}. {path} ({round(backup_time)}s)")
```
Idea: borg2 compact needs to read all archives anyway, so it could compute some stats as a side effect.
Related: #71
borg2 beta 12 now has `borg analyze`; see the docs for what exactly it does.
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes.
Is this a BUG / ISSUE report or a QUESTION?
Feature request.
System information
N/A
Feature request
I think it would be useful if Borg could generate a list showing which files and directories have been using the most storage space (after compression and deduplication) in a repo within a certain time period (such as the last month). This would be helpful for finding directories that are wasting space in the repo and that the user might have accidentally forgotten to exclude.
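As a rough stopgap, a similar report can be approximated from `borg list --json-lines` output, with the caveat that it reports original file sizes rather than the compressed/deduplicated sizes this request is about. A minimal sketch (the repo and archive name are placeholders):

```python
import json
import subprocess
from collections import defaultdict

ARCHIVE = "/path/to/repo::archive-name"  # placeholder, replace with a real archive

dir_sizes = defaultdict(int)
proc = subprocess.run(
    ["borg", "list", "--json-lines", ARCHIVE],
    capture_output=True, text=True, check=True,
)
for line in proc.stdout.splitlines():
    item = json.loads(line)
    parts = item["path"].split("/")
    # Attribute the item's (original, non-deduplicated) size to the path and all parents.
    for i in range(1, len(parts) + 1):
        dir_sizes["/".join(parts[:i])] += item.get("size", 0)

# Show the 20 largest paths by accumulated original size.
for rank, (path, size) in enumerate(
        sorted(dir_sizes.items(), key=lambda x: x[1], reverse=True)[:20], start=1):
    print(f"{rank}. {path} ({size} bytes)")
```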
My inspiration for this is the `git-filter-repo` `--analyze` option, which creates a report of which files in a Git repo have used the most space throughout the repo's history. A `borg analyze` command could look something like that.

Example `git-filter-repo` analysis for the Borg repo:

```
=== All directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
  744008017   26787700
```

It would also be interesting to see a feature that does something similar for "time spent backing up" instead of storage used, although I don't know if that would be feasible.