borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg2: it's coming! #6602

Open ThomasWaldmann opened 2 years ago

ThomasWaldmann commented 2 years ago

update: as there was no negative feedback from alpha testing, the borg2 branch was merged into master, so that big change, in the form of a major / breaking borg 2.0 release, is coming.

read below about what's planned and what's already done.


what could be done if we decide to make a breaking release (2.0) that:

putting all the breaking stuff into one release is good for users (a one-time effort), but will take quite some time to test and release.

After borg 2.0, we'll make an N+1 release (2.1? 3.0?) that drops all the legacy stuff from the codebase, including the converter for borg < 2.0 repos.

borg 2.0 general comments

DONE: offer a borg transfer command, #6663, that transforms old stuff only to stuff that will still be supported by borg N+1.

N+1 general comments

much of the stuff described here has its own tickets, see the "breaking" label / add issue links here.

2.0 crypto

N+1 crypto

2.0 repo

N+1 repo

2.0 indexes / cache

N+1 indexes / cache

2.0 msgpack

N+1 msgpack

2.0 archive / item

N+1 archive / item

2.0 or N+1 checksums

2.0 compression

N+1 compression

2.0 upgrade

N+1 archiver

2.0 remote

2.0 cli

2.0 locking

y2038 and requiring 64bit

stuff that is out of scope

as you see above, there is already a huge scope of what should be done.

to not grow the scope even further, some stuff shall not be done (now):

elho commented 2 years ago

I do not mind breaking for the better at all, but some of the outlined details do not qualify for that IMHO.

When it comes to crypto, breakage should not occur just to replace one algorithm with a limited life span with another one with a limited life span, thus planning for breakage every few years. Instead, breakage should be done to end up with a repo format that supports multiple algorithms and easy and feasible changing of keys as well as of the algorithms used. That could e.g. be done by at least temporarily allowing multiple algorithms to be "active" in a repo at the same time.

When it comes to repo format, a breakage should not be the excuse to just dump a bit of code that still supports reading PUTs besides PUT2s, but to question the format as a whole and try to address issues such as the current limitations of append-only, secure multi-client usage and compaction that is infeasible with huge repos. Ideas here would be:

When it comes to compression, what really should go is the auto mode, or it should be reimplemented with useful parameters, which IMO are hard to come up with in light of ZSTD performance.

About "scp syntax": On the one hand, I think it does not matter much; any sane setup has wrapper scripts around it, so you only ever see and use the repo URL once in the life of the repo. On the other hand, its use in scp/rsync etc. makes that non-URL syntax much more familiar to users, and while the code handling it leaves a lot of room for improvement, much of that has nothing to do with the non-URL syntax as such.

ThomasWaldmann commented 2 years ago

Crypto:

AES-CTR does not have a limited life span. The reason we are doing this is to get rid of the fundamental counter management issues:

There's also the slight ugliness of storing only part of the IV in the old format, but that is just a minor detail.

The new AEAD algorithms with session keys solve that.
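To illustrate why per-session keys sidestep the counter problem, here is a minimal sketch, assuming HKDF-SHA256 (RFC 5869) for deriving a session key from the master key and a random session id, plus a 96-bit nonce counter per session. `Session` and its `INFO` label are hypothetical, this is not borg's actual key schedule, and the AEAD cipher itself is omitted because Python's standard library does not include one.

```python
import hashlib
import hmac
import os
from typing import Optional


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class Session:
    """One writer session (hypothetical): a fresh key and a nonce counter starting at 0."""

    INFO = b"hypothetical-session-key-v1"  # context label made up for this sketch

    def __init__(self, master_key: bytes, session_id: Optional[bytes] = None):
        self.session_id = session_id if session_id is not None else os.urandom(24)
        # Different session ids give unrelated keys from the same master key.
        self.key = hkdf_sha256(master_key, salt=self.session_id, info=self.INFO)
        self._counter = 0

    def next_nonce(self) -> bytes:
        # 96-bit nonce, the size used by AES-GCM and ChaCha20-Poly1305.
        nonce = self._counter.to_bytes(12, "big")
        self._counter += 1
        return nonce
```

Since two sessions never share a key, both can start their nonce counter at 0; nothing has to be persisted or coordinated between clients, which is the kind of shared state a repo-wide AES-CTR counter has to manage.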

We could have all 3 crypto algorithms in parallel in the borg code (but currently not in the same repo), but there are other things on the above list that are best solved with tar export/import or borg transfer into a new repo, and IF one does that anyway, one can as well switch to the better crypto in one go (instead of having to do the export/import again some time later).

I don't think it would be a good idea to use different encryption algorithms in the same repo and especially not with the same key - so if we would go for the complexity of supporting repos with that, we would need multiple (master) keys for one repo, making it more complex for borg and also for the users.

You also can't just "change the keys / algos" in the same repo. Due to dedup, a lot of data would still be encrypted with the old key and old algorithm. To really get rid of it, you'd need some global migration, touching a lot of data and needing some management for interruptions of that process. That's about as much I/O and time as the export/import, just with much more complexity.
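The dedup argument can be made concrete with a toy model (the `ChunkStore` class is hypothetical, not borg code): after switching to a new key, backing up unchanged data just references the existing chunks, so those stay encrypted under the old key.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChunkStore:
    """Toy deduplicating store that records which key encrypted each chunk."""

    current_key: str
    chunks: Dict[bytes, str] = field(default_factory=dict)  # chunk id -> key label

    def add(self, data: bytes) -> bytes:
        # Simplification: plain SHA-256 ids; borg uses a keyed MAC for chunk ids.
        cid = hashlib.sha256(data).digest()
        if cid not in self.chunks:
            # Only chunks not seen before get encrypted with the current key.
            self.chunks[cid] = self.current_key
        return cid

    def keys_in_use(self) -> Set[str]:
        return set(self.chunks.values())


store = ChunkStore(current_key="old-key")
for block in (b"etc", b"home", b"var"):
    store.add(block)

store.current_key = "new-key"  # "rotate" the key
for block in (b"etc", b"home", b"var", b"new-file"):
    store.add(block)  # unchanged data dedups to the existing old-key chunks

print(sorted(store.keys_in_use()))  # ['new-key', 'old-key']
```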

ThomasWaldmann commented 2 years ago

Repository:

It's not just about the "reading PUTs" - it is at quite some places, including borg check (which is already quite complex).

I can imagine doing some more and even radical changes to the repo format if we re-start with new repos and require export/import anyway. I am not too happy with the complexities of segment file handling either.

In the end, this will depend on some developers architecting and implementing it, and we should try not to make the scope too big, or it'll never become releasable.

Repos: interesting ideas. Needs more analysis I guess, esp. since we likely want to keep the transactional behaviour and maybe also the LOG like behaviour.

Segmentless repos: if everybody had a great repo filesystem and enough storage, I guess that could be done (but it would mean that if the source has a million files, the repo could have XX million chunks). Super simple for borg, but a huge load on the repo fs (I did that in my zborg experiment back then). It could also be quite a bit slower due to more random accesses and more file opening, and use a lot more space due to fs allocation overhead if one has a significant number of small files.
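The allocation overhead can be estimated with a little arithmetic. This sketch assumes a 4 KiB allocation unit and ignores inode/directory metadata, inline data and tail packing, all of which vary by filesystem; the chunk count and size are illustrative, not measurements.

```python
def allocated(size: int, block: int = 4096) -> int:
    """Bytes a file of `size` bytes occupies when rounded up to whole blocks."""
    return -(-size // block) * block  # ceiling division


def compare(n_chunks: int, chunk_size: int, block: int = 4096):
    # One file per chunk: every small chunk gets its own rounded-up allocation.
    one_file_per_chunk = n_chunks * allocated(chunk_size, block)
    # Packed: chunks are appended into a file, so only the tail block is partial.
    packed_in_segments = allocated(n_chunks * chunk_size, block)
    return one_file_per_chunk, packed_in_segments


per_file, packed = compare(n_chunks=1_000_000, chunk_size=1_000)
print(per_file, packed)  # 4096000000 1000001536
```

With one file per 1000-byte chunk, about 4x the payload is allocated, while packing the same chunks into segment files wastes at most one partial block per segment file.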

Cloud storage: I don't want to maintain such code myself, that's just a rabbit hole I don't want to get into. So, for me it is "local directory" as the repo (plus some method of remoting that, not necessarily the hard to debug current remote.py code).

ThomasWaldmann commented 2 years ago

Compression: auto mode should go? do we have a ticket about that?

ThomasWaldmann commented 2 years ago

@elho thanks for the detailed feedback btw!

This ticket is primarily meant for the to-break-or-not-to-break decision. Once we decide to do a breaking release, requiring new repos, key, export/import, we can do a lot of changes and need to discuss the details in more specific tickets.

We should somehow try to limit the scope though, so it won't take forever.

RonnyPfannschmidt commented 2 years ago

@ThomasWaldmann if something like git packs could be used instead of segments, then with the new encryption session stuff it might even become feasible to push packs instead of archives between repos without necessarily requiring decryption/re-encryption.

RonnyPfannschmidt commented 2 years ago

This would potentially also enable dumb remotes like S3 or sshfs, with the caveat of more pain with post-prune GC and repacking.

ThomasWaldmann commented 2 years ago

@RonnyPfannschmidt encrypted chunks can be transferred between related repos using the same key material, there is a ticket about that already. I don't know the git pack format, so not sure how that is relevant for (re-)encrypting. But if we want to transfer a full "pack", there might be requirements due to that (opposed to just transferring a single chunk).

elho commented 2 years ago

I would be happy with a borg 1.3 that, on first use of serve on (or direct local access to) a v1 repo, would start out (maybe after some confirmation) by iterating over all segments. For each one, it would create a new replacement segment file, fill it with the same content except for using PUT2 wherever a PUT is read from the old one, do some sort of verify pass making sure the new segment as it arrived on disk has the same data as the old one, and only then atomically mv the new one over the old one. Once the last segment file is done without interruption, switch the repo version from v1 to v2. No other command or code path would need to support v1 and PUT in that scenario.
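The in-place conversion described above can be sketched as follows. Everything about the on-disk format here is hypothetical (record layout, tag values, a CRC32 as the extra PUT2 field, a flat `data/` directory and a `version` file); borg's real segment and repo format differ. The point is the sequence: convert each segment into a temp file, fsync, verify, atomically replace, and bump the version only after the last segment.

```python
import os
import struct
import tempfile
import zlib
from pathlib import Path

# Hypothetical record layout for this sketch: 1-byte tag, 4-byte little-endian
# payload length, payload. Not borg's real segment format.
TAG_PUT, TAG_DELETE, TAG_PUT2 = 0, 1, 3
HEADER = struct.Struct("<BI")


def encode(tag, payload):
    return HEADER.pack(tag, len(payload)) + payload


def read_records(path):
    data = Path(path).read_bytes()
    pos = 0
    while pos < len(data):
        tag, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        yield tag, data[pos:pos + length]
        pos += length


def convert(tag, payload):
    # Hypothetical PUT2: same data, prefixed with a CRC32 of it. Other tags pass
    # through, so re-running on an already converted segment changes nothing.
    if tag == TAG_PUT:
        return TAG_PUT2, struct.pack("<I", zlib.crc32(payload)) + payload
    return tag, payload


def migrate_segment(path):
    path = Path(path)
    converted = [convert(t, p) for t, p in read_records(path)]
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".tmp.")
    try:
        with os.fdopen(fd, "wb") as f:
            for tag, payload in converted:
                f.write(encode(tag, payload))
            f.flush()
            os.fsync(f.fileno())
        # Verify pass: re-read the new file and compare with the intended content.
        if list(read_records(tmp)) != converted:
            raise OSError(f"verification failed for {path}")
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def migrate_repo(repo_dir):
    repo = Path(repo_dir)
    version_file = repo / "version"
    if version_file.read_text().strip() == "2":
        return
    segments = sorted((p for p in (repo / "data").iterdir() if p.name.isdigit()),
                      key=lambda p: int(p.name))
    for seg in segments:
        migrate_segment(seg)
    # Only after every segment is converted does the repo become v2.
    version_file.write_text("2\n")
```

Because already-converted records pass through unchanged, an interrupted run can simply be restarted. A production version would also fsync the directory after `os.replace` to make the rename itself durable.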

ThomasWaldmann commented 2 years ago

Note: I updated the topmost post with feedback from you all (thanks!) and also with new insights. I also edited some other posts to remove duplicate / outdated information to keep this issue short.

ThomasWaldmann commented 2 years ago

Progress in #6663 and #6668 looks quite good.

About version: if we require people to transfer their repos using borg transfer, guess that must be borg 2.0 because you can't just continue with an existing repo as it is.

So, if we merge these, next release from master will not be 1.3, but 2.0.

horazont commented 2 years ago

not sure if we can already do that. A lot of platforms already dropped 32bit support, but for some this is still in the works (e.g. SBCs like the Raspberry Pi).

I think especially SBCs will stay 32bit for a while, because the savings in having a smaller pointer width are relevant on low-memory platforms.

Aren't there clock system calls which return a 64-bit wide integer even on 32-bit ABIs?

ThomasWaldmann commented 2 years ago

Well, it's not just that borg needs to get the 64bit time by doing a call; rather, the whole system of kernel / libc / python needs to work with timestamps of reasonable length, e.g. timestamps in os.stat output, python time and datetime stuff, etc.

elho commented 2 years ago

So, if we merge these, next release from master will not be 1.3, but 2.0.

Changing the module name from borg to borg2 at this point is something to be thoroughly considered.

Both to eventually play with the potential (meanwhile already obsoleted) export/import tar magic, and to be able to test 1.2 in parallel with 1.1 in production across all my systems in a sane manner, I went on the surprisingly painful adventure of creating a variant of the distribution's package that can be installed and used in parallel with the stock 1.1 one. In a hackish manner, one could install borg below a different path, but that is nothing any distribution would do, so I went the painful way of doing such a rename in there. (IOW, I'm happy to clean that up and even break out some of the cases where absolute imports were used without need and against the common practice in most other similar places in the code.)

For the original idea of export/import migration this would have been a requirement; here it is not. But in practice, for people backing up to multiple repos, scenarios like migrating the local one to 2.0 while still waiting an undefined time for the borg storage provider hosting the external one to support 2.0 could be very common.

ThomasWaldmann commented 2 years ago

Guess it is not just about the module name, but also the cli cmd name. OTOH, I'd dislike putting the version number into the cli cmd name.

For testing, one could also use the fat binary and rename that to borg2.

elho commented 2 years ago

Guess it is not just about the module name, but also the cli cmd name. OTOH, I'd dislike putting the version number into the cli cmd name.

It is, but the command name is something that can just be changed without requiring any modification of the command itself to keep it working, and on the other hand it is something distributions have support for. E.g. in Debian, a borg2 package would ship borg2 etc. commands, but (along with a packaging update to the 1.x version so it can be shipped in parallel) make use of the alternatives system of managed symlinks to have borg commands available to the user that point to whichever version is installed on its own, or (probably best for compatibility) to borg1 if both are installed, with the option for the user to easily switch that (along with the corresponding manpages) according to their preference.

Aware wrappers that consequently know whether the configured repo(s) are version 1 or 2 would know to invoke the corresponding versioned command name in all cases.
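A sketch of such a version-aware wrapper, assuming the distribution ships versioned command names `borg1` and `borg2` (as suggested above) and that the wrapper keeps its own hypothetical mapping of repo to borg major version. Passing the repo via the `BORG_REPO` environment variable avoids hardcoding repo argument syntax, which changed in 2.x.

```python
import os
import shutil
import subprocess
from typing import Dict, Sequence

# Hypothetical wrapper config: which borg major version each repo was created with.
REPO_VERSIONS: Dict[str, int] = {
    "/srv/backup/local-repo": 2,
    "ssh://user@storage.example/./remote-repo": 1,
}

# Hypothetical versioned command names, as a distribution might ship them.
COMMANDS = {1: "borg1", 2: "borg2"}


def command_for(repo: str, versions: Dict[str, int] = REPO_VERSIONS) -> str:
    """Pick the versioned borg command for a repo; unknown repos are an error."""
    try:
        return COMMANDS[versions[repo]]
    except KeyError:
        raise KeyError(f"no known borg version for repo {repo!r}") from None


def run_borg(repo: str, args: Sequence[str]) -> int:
    cmd = command_for(repo)
    if shutil.which(cmd) is None:
        raise FileNotFoundError(f"{cmd} not found on PATH")
    # The repo goes in via BORG_REPO; the remaining args are passed through as-is.
    env = dict(os.environ, BORG_REPO=repo)
    return subprocess.run([cmd, *args], env=env).returncode
```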

For testing, one could also use the fat binary and rename that to borg2.

Testing as in "is this for me" or "does this work at all", yes. But not testing as in "let me run this in parallel to 1.1 for a couple of months and see whether any issues arise before ditching 1.1", i.e. the stage 1.2 can currently be regarded to be at.

horazont commented 2 years ago

Well, it's not just like borg needs to get the 64bit time by doing a call, it rather is the whole system of kernel / libc / python needing to work with timestamps of reasonable length. E.g. timestamps in os.stat output, python time and datetime stuff, etc.

The statx syscall already has 64-bit wide timestamps (it uses __s64 for the seconds instead of time_t). Since kernel 5.1, 64-bit wide time structs are available on a bunch of other system calls.

So the kernel can (probably; I saw patches for utimes64, not sure if those have been applied, it hasn't been mentioned in that post above) do it.

I'm not sure what the current status is on the glibc side of things (the page looks a bit unclear on progress), but it may be worth pushing python on 32bit architectures to use it if glibc is ready.

All I'm saying: don't drop support for 32-bit architectures, but go for dropping support for 32-bit timestamps; in this day and age, those don't have to be the same thing anymore.
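The difference between 32-bit architectures and 32-bit timestamps shows up in simple arithmetic: the limit comes from the width of `time_t`, not from the pointer size. A signed 32-bit `time_t` tops out at 2**31 - 1 seconds after the epoch and then wraps to a negative value. The helper names below are illustrative.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_utc(seconds: int) -> datetime:
    # timedelta arithmetic avoids platform limits of datetime.fromtimestamp()
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value: int) -> int:
    """What a signed 32-bit time_t would hold for `value` (two's complement wrap)."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


print(to_utc(INT32_MAX))                  # 2038-01-19 03:14:07+00:00
print(to_utc(wrap_int32(INT32_MAX + 1)))  # 1901-12-13 20:45:52+00:00
```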

ThomasWaldmann commented 2 years ago

Note: I updated the top post with the current progress and also released 2.0.0a3. If no one holds me back with negative testing results, I'll soon merge the borg2 branch into master.

ThomasWaldmann commented 2 years ago

as there was no negative feedback from alpha testing, I just merged the borg2 branch into master. 🚀

keeping this issue open until N+1 for the misc. remaining TODO.

xeruf commented 2 years ago

Is there any overview of what borg2 improves for me as a user? How usable is it?

ThomasWaldmann commented 2 years ago

IIRC, I have not written a short overview yet, so there's only what you can read in the change log and in the top post of this ticket.

The super short overview is "we fixed most issues labelled as BREAKING", they often were long-term open issues (sometimes since attic) because fixing them breaks compatibility.

See there: https://github.com/borgbackup/borg/issues?q=label%3Abreaking

2.0.0b1 should be pretty usable, just do not run it against production repos (rather use copies to experiment).

ThomasWaldmann commented 2 years ago

@xeruf see there: https://github.com/borgbackup/borg/issues/6956

RubenKelevra commented 2 years ago

What about using zstd dictionaries to get the compression ratio up? :)

ThomasWaldmann commented 2 years ago

@RubenKelevra do you have an idea of how exactly that would work inside borg?

RubenKelevra commented 2 years ago

@ThomasWaldmann sure:

Rationale behind the last step: zstd archives can select the dictionary used for decompression by a byte (as a minimum-size identifier). Since at the block level it's probably pretty tricky to get the MIME type before decompressing the file, it's probably best to let zstd choose the correct dictionary by an identifier stored in each block (taking up one byte).

This becomes important if the same data is found in different types of files. Say a tar archive contains blocks of a JSON file.

The mime type is in this case no longer helpful, but decompression is still possible.
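The one-byte dictionary selector can be sketched with Python's `zlib`, whose preset dictionaries (`zdict`) stand in here for trained zstd dictionaries, since zstd is not in the standard library. The dictionary contents and the ID-byte framing are hypothetical; a real implementation would train dictionaries on sample data and might rely on zstd's own dictionary ID instead.

```python
import zlib
from typing import Dict

# Hypothetical dictionaries; real ones would be trained on sample data.
# ID 0 means "no dictionary".
DICTIONARIES: Dict[int, bytes] = {
    1: b'{"name": "", "version": "", "dependencies": {}, "description": ""}',
    2: b'<?xml version="1.0" encoding="UTF-8"?><html><head><title></title></head><body>',
}


def _compressor(dict_id: int):
    if dict_id == 0:
        return zlib.compressobj(9)
    return zlib.compressobj(9, zdict=DICTIONARIES[dict_id])


def compress_with(data: bytes, dict_id: int) -> bytes:
    c = _compressor(dict_id)
    # The first byte records which dictionary was used.
    return bytes([dict_id]) + c.compress(data) + c.flush()


def compress_best(data: bytes) -> bytes:
    """Try every dictionary (and none) and keep the smallest result."""
    candidates = [compress_with(data, i) for i in [0, *DICTIONARIES]]
    return min(candidates, key=len)


def decompress(blob: bytes) -> bytes:
    dict_id, body = blob[0], blob[1:]
    if dict_id == 0:
        d = zlib.decompressobj()
    else:
        d = zlib.decompressobj(zdict=DICTIONARIES[dict_id])
    return d.decompress(body) + d.flush()
```

Decompression needs only the ID byte, not the MIME type, so the same chunk decompresses correctly no matter which file or container it was found in.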

ThomasWaldmann commented 2 years ago

@RubenKelevra well, I see what you mean, but that is not how "borg create" works.

But maybe check the issue tracker if we have a ticket about this and if not, create a new one, so we can collect ideas there.

RubenKelevra commented 2 years ago

@RubenKelevra well, I see what you mean, but that is not how "borg create" works.

Interesting, can you elaborate or point me to the part which is different than I think, so I can take a look? 🤔

But maybe check the issue tracker if we have a ticket about this and if not, create a new one, so we can collect ideas there.

Will do

arodland commented 1 year ago

There shouldn't be any need to drop 32-bit support to be y2038-clean. 32-bit platforms can still have a 64-bit time_t, and most of them do, and have done for 5-10 years at least.

enkore commented 1 year ago

Have there been any major complaints / pain points with the JSON API? The only things I've found are

(a) (largely hypothetical) encoding woes when involving file names (obviously file names don't have to be representable in unicode regardless of locale) and on weird systems (#2273) and (b) it's annoying to parse when stdout and stderr are multiplexed, because stdout uses pretty printing (#6053, #3605)
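Point (b) is easier to handle if machine-readable output is line-oriented. A minimal sketch, assuming a hypothetical mode that writes one compact JSON object per line (JSON Lines); the record fields are made up and this does not describe borg's current `--json` output.

```python
import io
import json
from typing import Iterable, List, Tuple


def emit(record: dict, out) -> None:
    # One compact JSON object per line: no pretty-printing, so a line is a record.
    out.write(json.dumps(record, separators=(",", ":")) + "\n")


def split_mixed(lines: Iterable[str]) -> Tuple[List[dict], List[str]]:
    """Separate JSON records from other (e.g. stderr log) lines in a merged stream."""
    records, other = [], []
    for line in lines:
        text = line.strip()
        if text.startswith("{"):
            try:
                records.append(json.loads(text))
                continue
            except json.JSONDecodeError:
                pass
        other.append(line.rstrip("\n"))
    return records, other


stream = io.StringIO()
emit({"type": "archive", "name": "a1"}, stream)
stream.write("Warning: something happened\n")  # e.g. an interleaved stderr line
emit({"type": "archive", "name": "a2"}, stream)
records, other = split_mixed(stream.getvalue().splitlines())
```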

ThomasWaldmann commented 1 year ago

@enkore the json encoding issues for e.g. path and also some other things that can not be represented as valid unicode (== without surrogate escapes) were solved some months ago, e.g.:

Especially on Samba servers this is not at all hypothetical but a very practical issue, because servers that have existed for decades have collected all sorts of historical path encodings.
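One way to keep such paths lossless in JSON, sketched here as an assumption rather than borg's exact output: emit valid UTF-8 as plain text, and otherwise add a lossy readable form plus a base64 copy of the raw bytes. The `_b64` key suffix is hypothetical naming.

```python
import base64
import json
from typing import Dict


def text_to_json_fields(key: str, raw: bytes) -> Dict[str, str]:
    """Represent a byte string (e.g. a path) losslessly in JSON.

    Valid UTF-8 is emitted as plain text. Otherwise a readable, lossy version
    goes under `key` and the exact bytes go under `key + "_b64"`.
    """
    try:
        return {key: raw.decode("utf-8")}
    except UnicodeDecodeError:
        return {
            key: raw.decode("utf-8", errors="replace"),
            key + "_b64": base64.b64encode(raw).decode("ascii"),
        }


latin1_path = "café/notes.txt".encode("latin-1")  # legacy encoding, invalid UTF-8
print(json.dumps(text_to_json_fields("path", latin1_path)))
```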

issmirnov commented 1 year ago

@ThomasWaldmann my vote is on delaying the release and only doing one breaking change. Otherwise, your users will have to migrate v1-v2 with breaking changes, and then within a "short" time (6-12 months?) have to migrate v2-v3. Some users will be on v1, so you'd also have to build out v1-v3 upgrade paths and checks.

Borg v1 works great, we've waited this long, we can wait a little longer to just have to pay the pain of migration once.

Everyone, feel free to thumbs up / thumbs down this comment to express your opinion.

tmm360 commented 1 year ago

I think it is all a matter of timing. How much is "a lot"? If it is 6 months, merge them. If it is a fundamental rewrite that will take 2 or more years to become stable, do two separate releases.

knutov commented 1 year ago

my two cents: if it's ready - it's ready.

some changes will happen eventually, there is no problem to do small updates in scripts.

The new version has a lot of benefits, why wait to use it?

ThomasWaldmann commented 1 year ago

@tmm360 What I have in mind is a big change (not even sure how big), my and other contributors' free time is a bit hard to predict, so it makes the overall time needed somehow unpredictable.

Maybe forking off some new borg-ng branch from master and just starting that development there, while fixing bugs and missing stuff in master branch would be an option. Depending on more insights developing over time, a release could be made from either branch.

tmm360 commented 1 year ago

@ThomasWaldmann at this point I have no doubt it should be a separate release, with time to develop it without the need to hurry. It looks like something huge, and if borg2 is ready, in my opinion it should be released as is.

darkk commented 1 year ago

Speaking of pro and contra I'd also add that migration might also have a desirable side-effect of backup verification.

However, I understand that the process might require twice as much storage space under certain conditions.

RafaelKr commented 9 months ago

was 2.0 put on hold?

The commit history tells it is actively worked on: https://github.com/borgbackup/borg/commits/master/

ThomasWaldmann commented 9 months ago

@j1warren As my work on borg2 will likely take quite a bit longer, I temporarily switched focus to borg 1.x and made a "refresh" there in form of borg 1.4 (currently beta 1), see #7975 for more details. Will continue to work on borg2 soon.

ThomasWaldmann commented 2 months ago

#8332 has some more radical changes needing review.

@elho maybe have a look, close to your 3rd item in https://github.com/borgbackup/borg/issues/6602#issuecomment-1100639125 .

The borg 2.0 code will still need to deal with reading borg 1.x archives for borg transfer to migrate them into borg 2 repo, thus we have to be a bit careful not to tear down some stuff we still need.

struthio commented 2 months ago

The borg 2.0 code will still need to deal with reading borg 1.x archives for borg transfer to migrate them into borg 2 repo, thus we have to be a bit careful not to tear down some stuff we still need.

Can it be released in two steps? Step 1: borg2 with the new archive format etc. is ready, but only for new repositories. Step 2: compatibility with borg1 archives.

ThomasWaldmann commented 2 months ago

@struthio well, the compatibility code is just the old code, so it is already present. And unit tests show that borg transfer works in my #8332 PR, so there is no need for 2 steps.

ThomasWaldmann commented 1 month ago

The #8332 experiment was successful (AFAIK) and was merged into master; I will update the top post here accordingly.

everybody can help beta testing this huge change in 2.0.0b10+.

ThomasWaldmann commented 1 month ago

Via the borgstore rclone backend, borg just got cloud storage support (for 100+ cloud storage providers).