borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg check --archives-only --last-n-seconds #7062

Closed magma1447 closed 1 year ago

magma1447 commented 2 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Feature request

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.2.2

Operating system (distribution) and version.

Debian 10 / 11

Hardware / network configuration, and filesystems used.

Linux software raid + mergerfs + ext4

How much data is handled by borg?

Approximately 10 TB

Full borg command line that led to the problem (leave out excludes and passwords)

borg check --archives-only --last-n-seconds 864000

Describe the problem you're observing.

I have a borg repository to which more than one server sends backups. These servers are VM hosts (kvm/libvirt), each handling multiple virtual machines. The VMs are snapshotted and the images are sent to borg.

At times I want to run a check on the repository to make sure that it works as intended. I then run a borg check --repository-only, and want to follow that with a borg check --archives-only, which should include all recent archives. The problem is that borg only supports "last n" backups, and I do not know how many that is.

I would be grateful if a --last-n-seconds option (name not important) was implemented, which could be used to run a check on all archives created in the last week, for example.

My current workaround is to first list the archives and pipe the list to jq, which filters by start time and returns the number of archives that are new enough. Example:

borg list --json <repository> | jq '[ .archives[] | select(.start | split(".")[0] | strptime("%Y-%m-%dT%H:%M:%S") | mktime > now - 10*86400) ] | length'

Then I run borg check --archives-only --last <number from above>.

My workaround is a bit cumbersome and introduces a dependency on jq. There is also the case where another server could start a backup between the list and check commands. That could probably be solved with borg with-lock, but that makes it even more complex. Letting borg handle it itself sounds like a cleaner solution.
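For readers who want to avoid the jq dependency, the counting step can be sketched in Python. This is only an illustration, not part of borg: the function name count_recent_archives is hypothetical, and the sketch assumes the JSON layout implied by the jq filter above (an "archives" list whose entries carry a "start" timestamp) and that the timestamps and the supplied "now" are in the same timezone.

```python
import json
from datetime import datetime, timedelta


def count_recent_archives(listing_json: str, days: int, now: datetime) -> int:
    """Count archives whose "start" timestamp is newer than now - days.

    Hypothetical helper: expects the text printed by `borg list --json`.
    """
    cutoff = now - timedelta(days=days)
    archives = json.loads(listing_json)["archives"]
    # Drop fractional seconds, as the jq filter does, then parse the rest.
    starts = (
        datetime.strptime(a["start"].split(".")[0], "%Y-%m-%dT%H:%M:%S")
        for a in archives
    )
    return sum(1 for start in starts if start > cutoff)


if __name__ == "__main__":
    # Made-up sample data in the assumed layout, for illustration only.
    sample = json.dumps({"archives": [
        {"name": "vm1-old", "start": "2022-09-01T02:00:00.000000"},
        {"name": "vm1-new", "start": "2022-10-05T02:00:00.000000"},
        {"name": "vm2-new", "start": "2022-10-06T02:00:00.000000"},
    ]})
    print(count_recent_archives(sample, days=10, now=datetime(2022, 10, 10)))
```

The resulting number would then be passed to borg check --archives-only --last. The race between listing and checking remains, so this only replaces jq and does not address the locking concern.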

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, it's a feature request. :)

Include any warning/errors/backtraces from the system logs

No logs necessary.

ThomasWaldmann commented 2 years ago

Maybe extending the new --match-archives in borg2 would be a possibility.

Its current id:, sh:, re: patterns do different sorts of archive name matching, but maybe match types based on archive creation time could be added.

Michael-Girma commented 1 year ago

@ThomasWaldmann Can I pick up this issue?

ThomasWaldmann commented 1 year ago

@Michael-Girma Sure!

Michael-Girma commented 1 year ago

@ThomasWaldmann I see two ways of implementing this:

One method is to provide a period: tag, like id: and sh:, within the --match-archives flag. This would displace the existing tags unless we allow multiple tags to be used, so it doesn't seem like the best option, since one might want to use both name and date matching. The spec would be something like period:22/11/22 for archives created from the specified date onward, period:22/11/22-30/11/22 for archives created between the specified dates, and period:-22/11/22 for archives created up to the specified date. We could also add options like period:7d to get archives created within the last 7 days.

The other approach is to introduce another flag, something like --creation-period, that would follow a spec similar to --match-archives, with acceptable values like after:11/22/22 | before:22/22/22 | after:11/22/22 before:22/22/22. We could also add relative markers like after:-7d, which would get archives created within the last 7 days.

What would you suggest?

ThomasWaldmann commented 1 year ago

Hmm, I don't think the period pattern (like an absolute from/to date) would be very useful. It might cover some rare one-time use, but it won't be useful for scripting, I guess.

Also, thinking about it, we'll often need --match-archives to select a name prefix (if multiple backup sequences go into the same repo). So, as long as there is only one --match-archives supported, we can't use that.

But we already have --first and --last (giving first/last N archives, by count). We could either extend that from N (count) to also support e.g. Nd (N days) and also other time units.

Or, maybe simpler, we could add --oldest Nd and --newest Nd (and also --older Nd and --newer Nd).
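As a rough illustration of how an "Nd"-style value could be interpreted, here is a minimal Python sketch. It is not borg's implementation: parse_timespan and select_newer are hypothetical names, the "m" unit and its 30-day approximation are assumptions made for brevity, and "newer" is assumed to mean "created after now minus the timespan".

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit table; a month is approximated as 30 days here.
UNIT_DAYS = {"d": 1, "m": 30}


def parse_timespan(spec: str) -> timedelta:
    """Turn a value such as "7d" or "2m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dm])", spec)
    if match is None:
        raise ValueError(f"invalid timespan: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(days=count * UNIT_DAYS[unit])


def select_newer(archives, spec, now):
    """Return names of archives created after (now - timespan).

    `archives` is an iterable of (name, created) pairs, where `created`
    is a datetime in the same timezone as `now`.
    """
    cutoff = now - parse_timespan(spec)
    return [name for name, created in archives if created > cutoff]


if __name__ == "__main__":
    # Made-up archive names and dates, for illustration only.
    archives = [
        ("vm1-2022-11-01", datetime(2022, 11, 1)),
        ("vm1-2022-11-20", datetime(2022, 11, 20)),
    ]
    print(select_newer(archives, "7d", datetime(2022, 11, 22)))
```

Because the time filter is separate from name matching, it could be combined with --match-archives for repositories that hold several backup sequences.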

I added a bounty for this: https://app.bountysource.com/issues/111823907-borg-check-archives-only-last-n-seconds

Michael-Girma commented 1 year ago

@ThomasWaldmann Done with my first pass of changes to resolve this issue and currently writing test cases. However, I couldn't find an in-house method to create archives with custom metadata (mainly to alter the date) which I can later use within my test cases. Have I overlooked an approach to set the date for newly created archives from within the test suite?

ThomasWaldmann commented 1 year ago

Search for --timestamp in the tests.

ThomasWaldmann commented 1 year ago

@magma1447 Would last N days or months also work, or is there a specific use case for shorter/longer time units?

magma1447 commented 1 year ago

@ThomasWaldmann For me personally days would be just perfect. I am currently using 30 days with the jq hack I posted in my original post. Though that post used 10 days.

I have a hard time imagining a realistic scenario where the precision of hours or less would be helpful to anyone. The only reason I originally suggested the parameter to be in seconds is because it's the shortest unit normally used by software.

ThomasWaldmann commented 1 year ago

@magma1447 ok!

I asked because PR #7272 currently only implements months and days, but I guess that should be OK then.