borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

compatibility policy / support timeframe #26

Closed anarcat closed 5 years ago

anarcat commented 9 years ago

one of the main contentious points of #1 is whether borg should be backwards-compatible with attic, or how/if backwards-compatibility should be broken within borg itself.

so to clarify this, i wish to open a discussion specifically about this topic. this is basically a continuation of https://github.com/jborg/attic/issues/215 and #1.

the original proposal from @ThomasWaldmann was:

  • Don't break it accidentally / without good reason / without warning.
  • Break it if the above does not apply (this needs more thought/discussion).
  • As the fork is "new software" from the perspective of a Borg user or a Borg packaging distribution, there is no past we need to stay compatible with - we have the chance to break compatibility and change everything that we think needs changing.
  • Over time, we'll have more users and incompatible changes get harder.
  • Avoid getting into the "compatible forever" trap - we should maybe not assure compatibility of development versions nor spanning major releases.
  • When used for long-term archiving, special considerations and care are required. E.g. a development snapshot of Borg might not be the right thing for this. Also, Borg exists to be able to change things. So if you don't like or can't live with changing software, don't use it.

In #25, I made an entry in the documentation (the FAQ) that summarizes the points from #1 as:

borg intends to be:

  • simple:
    • as simple as possible, but no simpler
    • do the right thing by default, but offer options
  • open:
    • welcome feature requests
    • accept pull requests of good quality and coding style
    • give feedback on PRs that can't be accepted "as is"
    • discuss openly, don't work in the dark
  • changing:
    • do not break compatibility accidentally, without a good reason or without warning
    • borg is not backwards-compatible with attic
    • major versions may not be compatible with older releases

About the last point: i would like to put forward a proposal that will make borg backups compatible from major version X to X+1.

That is, we limit on-disk changes between major releases: those changes should live in a feature branch for a while, then be merged in a development branch, which eventually becomes the X+1 version. The X+1 version can read (and if necessary, convert) backups made with the X version. Then everyone upgrades to the new version and the X+2 version can drop compatibility shims.

So in other words, version X+1 can read and convert backups from version X, but not write them. X cannot read or write backups made with version > X. X+2 cannot read, write or convert backups from version X.
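A minimal sketch of the X / X+1 rule just described, in Python (the language borg is written in). It encodes only the proposal from this comment; `Op`, `major_of` and `allowed` are hypothetical names, not part of borg's code or API.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"
    CONVERT = "convert"


def major_of(version: str) -> int:
    """Major number X of an X.Y.Z version string, e.g. "2.0.0-alpha.1" -> 2."""
    return int(version.split(".")[0])


def allowed(tool_major: int, repo_major: int, op: Op) -> bool:
    """May a tool of major version `tool_major` perform `op` on a repository
    created by major version `repo_major`, under the proposed X / X+1 rule?"""
    if repo_major == tool_major:
        return True  # same major version: read, write and convert freely
    if repo_major == tool_major - 1:
        # X+1 reads and converts X repositories but never writes the X format.
        return op in (Op.READ, Op.CONVERT)
    # X cannot touch X+1 repositories, and X+2 has dropped the X shims.
    return False
```

For example, `allowed(major_of("3.0.1"), major_of("2.4.0"), Op.WRITE)` is `False`: a 3.x tool may read or convert a 2.x repository, but not write to it.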

I would personally prefer that the format always be future-proof, so that you would be able to restore really old backups without problems. It can be pretty difficult to get older software running on a newer platform ("oooh, this was written for Python 2.1, how cute!"), so I would strongly advocate keeping backwards compatibility forever. However, I know how hard this can be, so I am ready to concede that it can be broken at times. This should be considered an extreme case, used only when really necessary, and we should bundle multiple changes into one break to avoid doing it too often.

I would therefore also suggest using semantic versioning for the version numbers, that is, version numbers would be X.Y.Z, where X is the major number described above and Y.Z are the regular release numbers used most of the time.

In that scheme, borg would be attic 2.0.0-alpha.1 (and we would simply skip borg 1.0). Note that this would give us the freedom to break compatibility while we put out alphas, until the golden "2.0.0" release.

level323 commented 9 years ago

I agree that it would be great if borg could remain backwards-compatible indefinitely, but also agree that this is likely to be too onerous for the developers. On the flip side, maintaining forever-backwards-compatible code may also limit the future features/performance of the tool, so there are benefits to occasional format changes. Also, in this modern computing world we have the wonderful assistance of virtual machines. If we come across an old-format repo, we can normally spin up an old distro in a VM and build the appropriate version of attic/borg to access it or convert it to a newer format.

anarcat commented 9 years ago

That assumes that:

  1. users know about the wonderful world of virtual machines and how to operate them
  2. you will find a virtual machine image of that old distro of yours
  3. you will actually find the old copy of the borg code and it will build in that distro (i.e. you haven't picked the wrong magic combination)
  4. all the old dependencies will still be available in the old distro
  5. you can import your dataset into the virtual machine (i.e. you will probably have to copy the files into the VM, so double the disk space)

That seems like quite a challenge, to say the least.

What I am asking above is not "forever compatible". What I am asking is:

  1. as few changes as possible
  2. when changes are done, bundle them in a major release
  3. support transitioning between two major releases

This is what I would describe as "almost forever". I mean, at some point the code will stabilize; I don't see why the whole thing would need to be rewritten every other year...

rbu commented 9 years ago

I appreciate your effort in the fork, and discussing these topics beforehand in the open.

I am concerned, though, about who would even consider using a backup / data storage tool that cannot guarantee files can be read back later. In my opinion, if you consider changing binary formats, two things need to happen:

  1. A version identifier is increased, so that newer versions of the tool can tell me they are not able to write to this format anymore. In addition, previous versions of the tool can bail out when reading or writing files created with a newer version.
  2. Reading and converting old versions must remain possible for several major versions. I am thinking of something as long as the RHEL support cycle (10 years).
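A rough sketch of these two points, assuming a hypothetical header made of an 8-byte magic followed by a little-endian format-version integer (borg's real repository layout is different; `MAGIC`, `CURRENT_FORMAT`, `OLDEST_READABLE` and `check_header` are illustrative names only). Newer formats are refused outright; older formats inside the supported window can be read, but must be converted before writing:

```python
import struct

# Hypothetical on-disk header: 8-byte magic + little-endian uint32 version.
MAGIC = b"DEMOREPO"
HEADER = struct.Struct("<8sI")

CURRENT_FORMAT = 4   # the only format this tool writes
OLDEST_READABLE = 1  # oldest format this tool can still read/convert


class FormatError(Exception):
    pass


def make_header(version: int = CURRENT_FORMAT) -> bytes:
    return HEADER.pack(MAGIC, version)


def check_header(data: bytes, for_write: bool) -> int:
    """Return the format version, or raise FormatError if access is refused."""
    if len(data) < HEADER.size:
        raise FormatError("truncated header")
    magic, version = HEADER.unpack(data[:HEADER.size])
    if magic != MAGIC:
        raise FormatError("not a repository")
    if version > CURRENT_FORMAT:
        # Written by a newer tool: bail out for both reading and writing.
        raise FormatError(f"format {version} is newer than this tool supports")
    if version < OLDEST_READABLE:
        raise FormatError(f"format {version} is too old to read")
    if for_write and version != CURRENT_FORMAT:
        raise FormatError(f"format {version} is read-only here; convert it first")
    return version
```

In this sketch, the "several major versions" requirement of point 2 amounts to keeping `OLDEST_READABLE` low for a long time.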

This may sound ridiculously long, but ask yourself: what is the oldest file you created that is still on a storage medium accessible to you? What if you could not read it anymore?

level323 commented 9 years ago

@anarcat - our views are closer than you seem to think. I largely agree with your ideas, particularly that borg should provide backward compatibility for a long time, but not forever (so to speak).

I was attempting (poorly, it seems) to note that VMs can be leveraged to access repos that may have inadvertently become incompatible with the version of borg that the user has installed or immediately available. This ability to use VMs is an opportunity that wasn't available even 10 years ago, so it's no wonder the likes of tar were so intent on retaining backwards compatibility. However, VMs (and chroots and similar) provide alternative routes to reviving old code and accessing old data that didn't exist in the early days of tar and its ilk, so the urgency to retain long-term backward compatibility is reduced (although we may differ on the size of the reduction).

Your description of the trials/tribulations/pitfalls that might beset someone trying to generate a working older version of borg using a VM is, IMO, a tad pessimistic. With regard to your points on the subject:

  1. Granted, at least to some extent. I think a good chunk of Linux and Mac users (the main target of borg presently) have encountered or used VMs before.
  2. Maybe. Perhaps I live in a blessed world, but Debian and Ubuntu (which I mostly use) retain installer and live ISOs of their older releases in their archives, which you can use to create a VM. Debian also recently rolled out http://snapshot.debian.org/
  3. I guess you're right, if GitHub dies. I'd be more concerned about the zombie apocalypse that would come shortly thereafter, however ;-)
  4. Yes, but I'm kind of hoping that the borg team extends jborg's initial efforts (with attic) to create standalone executables of borg releases. This is really the way to go, particularly to make the tool accessible to less geeky types.
  5. Although importing a dataset into a VM is one way of doing things, there are at least two better ways that most people would prefer: mounting the repo over a network mount (SMB, FUSE, SFTP... whatever) or USB passthrough of a physical device. Neither is particularly difficult.

At the end of the day, if the maintainers were to write shims to convert between adjacent repo format changes, this has the side effect of providing a code architecture to allow someone sufficiently interested to write shims that bridge across multiple repo format changes. That goes "far enough", IMO, as far as the responsibility of the core developers/maintainers is concerned. The core architecture is in place for someone to write shims to convert/access old repo formats back a potentially unlimited number of versions. If there is enough interest/demand in providing shims going back more than one format change, then the code will eventually get written. I don't want to burden the core developers with having to provide that code - I'd rather those energies went into improving the tool. But I'm just one voice.

Personally I have a high degree of faith that @ThomasWaldmann holds dear the principle that borg should take great care to minimise the number of repo format changes. If the core team has as much coding schmarts as I think they do, we will probably look back on this thread in 10 years as a lot of hot air over nothing, as there will very likely be very few repo format changes necessary thanks to good coding from the outset.

pdf commented 9 years ago

The following should all be read as recommendations, related to future releases rather than the initial development cycle, and IMO.

Obviously providing backwards-compatibility forever can become burdensome, and wedge the project into unmaintainability, however being able to access data from a backup utility is a primary function, and that may include data created using an earlier version of the utility. With that in mind, my recommendations:

ThomasWaldmann commented 9 years ago

ok, today I re-read all your posts from #1 and this issue.

as it currently looks like I am the only active developer on the team, we have to be realistic about available developer resources and about the fact that FOSS development is mostly driven by the developers' personal interests.

as you may have noticed, I am not very much interested in writing and maintaining backwards-compat-forever/for-long code, but rather in improving the codebase so it is faster, more scalable and more reliable.

so while I share most of your views about the perfect backup software's properties, borg likely won't be perfect unless more developers show up and help specifically with these goals.

that said, you can always use the same release of the software to access the backups that were made with it. we are lucky borg is made with python 3, which is not going away anytime in the foreseeable future (unlike python 2.7, which has a somewhat limited lifetime when considering really long time frames).

if you can't live with that, you either have to invest your time into the development of these properties or wait until it has stabilized enough for your use case. the currently targeted use case is daily backup of your stuff (copy to backup repo), not long-term archiving (like "move").

the comments about versioning of the software / of the file formats are very reasonable and they are already implemented (AFAICS).

about breaking changes: I added a "breaking" tag to this issue tracker to tag issues that might require breaking changes. issue #21 addresses the magics/paths.

so, my plan to proceed is as follows:

we need something NOW that works better and is developed faster than attic. I guess I'll make 0.x.x releases from current master branch (which has only the conservative changes). Maybe we could have some code that patches the magics in an attic repo, so it can be transferred to borg (any volunteers?).

i'll continue developing new stuff in feature branches and merge the wild stuff into the experimental branch and the conservative stuff into master branch. Some day, there will be a 1.0.0 release with breaking changes for good reasons. Whether it'll have some backwards compat mode or converter has to be seen.

rbu commented 9 years ago

That sounds reasonable. Thank you for considering the input.

FYI, Python 2.x is slated to reach EOL in 2020, which is not that far away considering the turnaround times of Linux distributions like Debian and Red Hat.

anarcat commented 9 years ago

we need something NOW that works better and is developed faster than attic. I guess I'll make 0.x.x releases from current master branch (which has only the conservative changes). Maybe we could have some code that patches the magics in an attic repo, so it can be transferred to borg (any volunteers?).

That's #21, right?

i'll continue developing new stuff in feature branches and merge the wild stuff into the experimental branch and the conservative stuff into master branch. Some day, there will be a 1.0.0 release with breaking changes for good reasons. Whether it'll have some backwards compat mode or converter has to be seen.

I think that's fair enough. My main concern is to avoid setting "we're going to break stuff" as an explicit policy. I understand the constraints of being the single dev on a project, so I absolutely respect that. :)

I'll see if I have time to work on the backwards compat policy :)

ThomasWaldmann commented 9 years ago

yes, #21 - volunteers welcome.

tgharold commented 9 years ago

From a sysadmin point-of-view, I would be okay with borg breaking compatibility - as long as there is a way to read repositories from a few years ago and a way to pick which version of borg I'm running.

Possible approach:

That way I can write my backup scripts against "borg023" (v0.23) and not worry that a future pip install of "borgbackup" is going to break my nightly backup scripts.

The main change which would be immediate is that when a new (major) version of borg is released, there needs to be a new executable filename created of "borg####" where the "####" is the "API version number" or the "last time borg broke compatibility". For someone who doesn't worry about compatibility, they could just use "borg" as the command. Those of us with longer time-frames would be able to use a more-specific "borg####" command out-of-the-box in our scripts to keep running a specific version of borg for a long time, while still being able to get bug/security fixes.

Maybe the build script just copies "borg" to "borg####".

The wrinkle in this is that you will need to make sure that multiple versions of borg can coexist on the same box.

ThomasWaldmann commented 9 years ago

@tgharold that's what virtualenv is for: running arbitrary versions of the same library/tool without getting into conflicts. You'd just have a separate virtualenv per borg version you need.

RonnyPfannschmidt commented 9 years ago

a model comparable to django is thinkable (with long term support versions)

also each backup should probably tell the exact borg version it was made with, so exact tracking is possible

a further idea is pushing the format to a meta level where the ciphers/structural de/encoding are defined, and each implementation is just a minimal, fully tested implementation (allowing old backends to be kept around)

the scope of that needs a different kind of discussion tho

perguth commented 9 years ago

@RonnyPfannschmidt: a model comparable to django is thinkable (with long term support versions)

We should not forget that they have a foundation financing that.

RonnyPfannschmidt commented 9 years ago

yes, until it's feasible, i suppose a more limited model is necessary

anarcat commented 8 years ago

i think we should resolve this issue for 1.0, in that we should clearly document what guarantees we provide for compatibility across releases and what the release numbering and support strategy is.

#512 shows that, at the very least, that policy could be clarified. there is also an explicit request there for showing deprecation notices for RPC communication changes instead of just dropping them in a point release, which seems like a fair request (although it doesn't apply to the 0.x branch).

so i think there are a few proposals here, if i can summarize, which affect various areas of borg:

For every component of those, we need to make one of those choices:

  1. be compatible forever: either never change the API, or provide backwards compatibility shims
  2. be compatible between major version X.Y.Z and version X+1.Y.Z, with deprecation warning
  3. break compatibility between X.Y.Z and X+1.Y.Z (with deprecation warning introduced in a X.Y.0 release)
  4. break compatibility between X.Y.Z and X.Y+1.Z (and 3)
  5. break compatibility between X.Y.Z and X.Y.Z+1 (and 3 and 4)
  6. break compatibility whenever we need to

Right now, we are at compatibility level 4 for everything, which is fine because we clearly say we are not API-stable yet, but we say we will be for 1.0.

What I would propose this should be for 1.0 and future stable branches should be:

Any other proposals? Note that the above is not an empty proposal: as a Debian developer, I am used to maintaining old software and would volunteer to maintain various 1.Y branches.

The above also implicitly proposes switching to a semantic versioning standard for releases, something which could be debated separately, but makes sense (and it seems we are somewhat using it already; I just didn't want to assume it was the case). I believe the above proposals are meaningful regardless of whether we use semver.

@ThomasWaldmann could we put this back on the 1.0 milestone please?

ThomasWaldmann commented 8 years ago

I will release 1.0 soon and the 1 in 1.0 is not related to giving any guarantees, but because we are doing some (relatively minor) incompatible changes.

anarcat commented 8 years ago

could you clarify your position regarding the above summary? do you object to any sort of compatibility framework? how about semantic versioning?

i have added "level 6" for the "we break compatibility whenever we need to" option, which i didn't believe was an option anymore, but here we go.

anarcat commented 8 years ago

furthermore, i don't see why we need to designate the next release as 1.0 if we are already making incompatible changes in the 0.x branch... shouldn't we wait until this is resolved first?

skorokithakis commented 8 years ago

This is related to #512. It would be great if borg could keep backwards compatibility for X releases/Y months and issue deprecation warnings so people would have notice before they upgrade. Putting things in changelogs isn't enough, because nobody reads the changelog of "apt-get upgrade".

My proposal is that backwards-compatible code should stay in for three releases or six months, whichever is longer, with deprecation warnings like "please upgrade your server to 0.XX soon, otherwise it will stop working when client 0.XY is released."
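"Three releases or six months, whichever is longer" means both conditions must hold before a compatibility path is removed. A small sketch of that check, assuming "six months" is approximated as 183 days (the proposal does not define it more precisely) and that releases are counted from the one that introduced the deprecation; `may_remove` and `deprecation_message` are hypothetical helpers:

```python
from datetime import date, timedelta

MIN_RELEASES = 3                 # "three releases"
MIN_AGE = timedelta(days=183)    # "six months", approximated


def may_remove(deprecated_on: date, releases_since: int, today: date) -> bool:
    """True only once the deprecated path has survived at least three
    releases AND at least six months, i.e. whichever is longer."""
    return releases_since >= MIN_RELEASES and today - deprecated_on >= MIN_AGE


def deprecation_message(min_server: str, breaking_client: str) -> str:
    """Warning text in the style proposed above."""
    return (f"please upgrade your server to {min_server} soon, otherwise it "
            f"will stop working when client {breaking_client} is released.")
```

A fast release cadence therefore cannot shorten the six-month window, and a slow one cannot shorten the three-release window.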

tgharold commented 8 years ago

For creation of new backups, warning the sysadmin that they need to upgrade the client or server to a newer release version is fine. Keep in mind that this tends to be software that gets installed once on a client / server pair, and then rarely upgraded unless there are bugs.

For reading of older backups, borg really needs to be able to read and restore any repo created in the last few years. That's really the minimum bar for any software that considers itself to belong in the "backup" category. Especially true once you slap a 1.0 label on it. After a disaster, I may only have a drive with the repository on it and not the original software and might have to read it using the latest version of borg.

If I have to go rummaging for a specific version of borg, and a specific version of python and get it all setup in a virtual environment, that's a pretty annoying set of requirements on what is already turning out to be a bad day (on the day that a system crashed and I'm restoring from backups).

I don't care if I can't create or update repos in an older format, but I need to be able to read / verify older repos.

gybr commented 8 years ago

Hi, I'm a new user of borg, and congratulations for this awesome software.

As a user, the only thing that worries me is the message in the doc about deprecation. I think the purpose of a backup software is defeated if compatibility with older archives is broken.

What's your current position on backward compatibility? I would love to read that it is guaranteed forever, even if it requires running the "upgrade" command. Borg would become perfect for me if I was sure that the backups I do today will be readable by the latest Borg 5 years from now.

Please consider that backups are not always monitored regularly by a sysadmin, some people make them and keep them for several years, expecting them to be easily recoverable if needed. In a lot of cases, the "compatible only with X+1" policy would create problems.

Also, is there any reason yet to change the format in the future?

ThomasWaldmann commented 8 years ago

@gybr there are still quite a few reasons why we might need to break compatibility; just click on the "breaking" tag you see in the right sidebar (there are different levels of breaking, btw.).

We are using semantic versioning, thus such breaks are delayed until next major release.

Developers don't enjoy working on "borg upgrade" that much yet - it is neither easy nor exciting nor paid (and even when paid a little [see the "remote borg upgrade" bounty], it doesn't help much). Sometimes there can also be technical difficulties / blockers.

What you can always do is keep the binaries of major releases; then you may not need backwards-compatibility-forever.

gybr commented 8 years ago

OK, I understand. Still hope to see the format stabilize once these five issues are solved. Again, this is the best backup software I've seen. Thank you.

enkore commented 8 years ago

I'm going to go ahead and say that it pretty much won't happen that a newer Borg version can't read older repos (either directly or via an upgrade path).

Everything(?) discussed so far can be divided in three categories of breaking:

I. Auxiliary files (cache, indices), which can be automatically recreated or are automatically upgraded.

II. Remote protocol. For this it is unlikely that we do anything for compatibility if a breaking change is made, i.e. keep clients/servers on compatible versions.

III. Repository metadata layout changes. Depending on the actual change, most stuff (reading, extracting, listing) would work both ways, or an upgrade might have to be done. There are also some things being debated that would only be backwards incompatible (i.e. use $shiny_new_feature => old version can't handle that).

leoluk commented 6 years ago

IV. Command line options.

I've had my backups broken multiple times now due to backwards-incompatible parameters (-e for borg init, parameter ordering changed at some point between 1.0 and 1.1, and a few others in the past that I no longer remember).

ThomasWaldmann commented 6 years ago

if you upgrade from 1.0 to 1.1, you have to expect some changes and have to read the changelog.

some of these changes were intended, e.g. -e because it should be an informed decision, since the new blake2b stuff is faster than the old default. also, one does not init that often, and usually not in an automated way, so this was not seen as an issue.

some changes came due to a bug in python stdlib's argparse. the bug was always there, but didn't bite for borg create in 1.0; it did in 1.1 (as paths are optional now). it could also bite you in 1.0 for other commands. i recently updated the docs about cli options and argument order.

skorokithakis commented 6 years ago

@ThomasWaldmann Not to comment on this specific release, but generally it's less surprising if you deprecate things in one version and remove them in the next. So, 1.1 would print warnings about using something in 1.0 (but it would still work) and it would stop working in 1.2. That does mean you'd have to support both code paths for a release, though.
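One common way to do this for command-line options, sketched with Python's standard `argparse`: the old spelling keeps working for a release but emits a warning, and can be deleted in the next one. `--old-name`, `--new-name` and `DeprecatedAlias` are hypothetical names for illustration; this is not how borg's archiver.py implements its deprecations. `FutureWarning` is used because, unlike `DeprecationWarning`, Python shows it to end users by default.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Accept an old option spelling, store its value under the new option's
    dest, and warn that the old spelling will be removed in the next release."""

    def __init__(self, option_strings, dest, replacement, **kwargs):
        self.replacement = replacement
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and will be removed in the next "
            f"release; use {self.replacement} instead",
            FutureWarning,
        )
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    # Current spelling of the (hypothetical) option.
    parser.add_argument("--new-name", dest="name", default="default")
    # Old spelling: still accepted for one release, hidden from --help,
    # writing into the same dest so the rest of the program sees one value.
    parser.add_argument("--old-name", dest="name", action=DeprecatedAlias,
                        replacement="--new-name", help=argparse.SUPPRESS)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--old-name", "x"])  # warns on stderr
    print(args.name)  # x
```

The cost skorokithakis mentions is visible here: both spellings must be kept and tested for one release.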

ThomasWaldmann commented 6 years ago

@skorokithakis see archiver.py for deprecation handling, we already do that in some cases (for options).

the specific problem with -e is that once you've inited a repo and used it for some backups, you can't easily change it later - you'd have to start a new repo. if we had kept the "repokey[-hmac-sha256]" default as in 1.0, a lot of people would create slower-than-it-needs-to-be repos by default and not notice that there is repokey-blake2 now, which is faster.

skorokithakis commented 6 years ago

Ah, I see. You are correct.

ThomasWaldmann commented 5 years ago

Guess this can be closed.