Closed anarcat closed 5 years ago
I agree that it would be great if borg could remain backwards-compatible indefinitely but also agree that this is unlikely to be too onerous on the developers. On the flipside, maintaining forever-backwards-compatible code may also limit the future features/performance of the tool so there are benefits to occasional format changes. Also, in this modern computing world we have the wonderful assistance of virtual machines. If we come across an old format repo, we can normally spin up an old distro in a VM and build the appropriate version of attic/borg to access it/convert it to a newer format.
That assumes that:
That seems like quite a challenge, to say the least.
What I am asking above is not "forever compatible". What I am asking is:
This is what i would describe as "almost forever". I mean at some point the code will stabilize here, i don't see why the whole thing would need to be rewritten every other year...
I appreciate your effort in the fork, and discussing these topics beforehand in the open.
I am concerned, though, about who would even consider using a backup / data storage tool that cannot guarantee files can be read back later. In my opinion, if you consider changing binary formats, two things need to happen:
This may sound ridiculously long, but ask yourself: What is the oldest file you created that is still on a storage medium accessible to you. What if you could not read it anymore?
@anarcat - our views are closer than you seem to think. I largely agree with your ideas, particularly that borg should provide backward compatibility for a long time, but not forever (so to speak).
I was attempting (poorly, it seems) to note that VMs can be leveraged to access repos that may have inadvertently become incompatible with the version of borg that the user has installed/immediately available. This ability to use VMs is an opportunity that wasn't available even 10 years ago, so it's no wonder the likes of tar was so intent on retaining backwards compatibility. However, VMs (and chroots and similar) provide alternative routes to reviving old code and accessing old data that didn't exist in the early days of tar and its ilk, and so the urgency to retain long term backward compatibility is reduced (although we may differ on the size of the reduction).
Your description of the trials/tribulations/pitfalls that might beset someone trying to generate a working older version of borg using a VM is, IMO, a tad pessimistic. With regard to your points on the subject:
At the end of the day, if the maintainers were to write shims to convert between adjacent repo format changes, this has the side effect of providing a code architecture to allow someone sufficiently interested to write shims that bridge across multiple repo format changes. That goes "far enough", IMO, as far as responsibility of the core developers/maintainers. The core architecture is in place for someone to write shims to convert/access old repo formats back a potentially unlimited number of versions. If there is enough interest/demand in providing shims going back more than one format change, then the code will eventually get written. I don't want to burden the core developers with having to provide that code - I'd rather those energies went into improving the tool. But I'm just one voice.
Personally I have a high degree of faith that @ThomasWaldmann holds dear the principle that borg should take great care to minimise the number of repo format changes. If the core team has as much coding schmarts as I think they do, we will probably look back on this thread in 10 years as a lot of hot air over nothing as there very likely will be very few repo format changes necessary due to good coding from the outset.
The following should all be read as recommendations, related to future releases rather than the initial development cycle, and IMO.
Obviously providing backwards-compatibility forever can become burdensome, and wedge the project into unmaintainability, however being able to access data from a backup utility is a primary function, and that may include data created using an earlier version of the utility. With that in mind, my recommendations:
ok, today I re-read all your posts from #1 and this issue.
as it currently looks like I am the only developer who is active in the team, we have to be realistic about available developer resources and that FOSS development is mostly driven by the developers' personal interests.
as you may have noticed, I am not very much interested in writing and maintaining backwards-compat-forever/for-long code, but rather in improving the codebase so it works faster, more scalable, more reliable.
so while I share most of your views about the perfect backup software's properties, borg likely won't be perfect unless more developers show up and help specifically with these goals.
that said, you can always use the release of the software to access the backups that were made with it. we are lucky borg is made with python 3 which is not going away anytime in the foreseeable future (unlike python 2.7, which has a somewhat limited lifetime when considering really long time frames).
if you can't live with that, you either have to invest your time into the development of these properties or wait until it has stabilized enough for your use case. the currently targeted use case is daily backup of your stuff (copy to backup repo), not long-term archiving (like "move").
the comments about versioning of the software / of the file formats are very reasonable and they are already implemented (AFAICS).
about breaking changes: I added a tag "breaking" for this issue tracker to tag some issues that might require breaking changes. issue #21 addresses the magics/paths.
so, my plan to proceed is as follows:
we need something NOW that works better and is developed faster than attic. I guess I'll make 0.x.x releases from current master branch (which has only the conservative changes). Maybe we could have some code that patches the magics in an attic repo, so it can be transferred to borg (any volunteers?).
i'll continue developing new stuff in feature branches and merge the wild stuff into the experimental branch and the conservative stuff into master branch. Some day, there will be a 1.0.0 release with breaking changes for good reasons. Whether it'll have some backwards compat mode or converter has to be seen.
That sounds reasonable. Thank you for considering the input.
FYI, Python 2.x is slated to EOL in 2020, which is not that far away considering the turn around times of Linux distributions like Debian and RedHat.
> we need something NOW that works better and is developed faster than attic. I guess I'll make 0.x.x releases from current master branch (which has only the conservative changes). Maybe we could have some code that patches the magics in an attic repo, so it can be transferred to borg (any volunteers?).
That's #21, right?
> i'll continue developing new stuff in feature branches and merge the wild stuff into the experimental branch and the conservative stuff into master branch. Some day, there will be a 1.0.0 release with breaking changes for good reasons. Whether it'll have some backwards compat mode or converter has to be seen.
I think that's fair enough. My main concern is to avoid setting "we're going to break stuff" as an explicit policy. I understand the constraints of being the single dev on a project, so I absolutely respect that. :)
I'll try to see if i have time to work on the backwards compat policy :)
yes, #21 - volunteers welcome.
From a sysadmin point-of-view, I would be okay with borg breaking compatibility - as long as there is a way to read repositories from a few years ago and a way to pick which version of borg I'm running.
Possible approach:
That way I can write my backup scripts against "borg023" (v0.23) and not worry that a future pip install of "borgbackup" is going to break my nightly backup scripts.
The main change which would be immediate is that when a new (major) version of borg is released, there needs to be a new executable filename created of "borg####" where the "####" is the "API version number" or the "last time borg broke compatibility". For someone who doesn't worry about compatibility, they could just use "borg" as the command. Those of us with longer time-frames would be able to use a more-specific "borg####" command out-of-the-box in our scripts to keep running a specific version of borg for a long time, while still being able to get bug/security fixes.
Maybe the build script just copies "borg" to "borg####".
The wrinkle in this is that you will need to make sure that multiple versions of borg can coexist on the same box.
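The "build script just copies borg to borg####" idea could look roughly like the sketch below. The naming scheme (major number followed by a two-digit minor, so that v0.23 gives "borg023") is an assumption read off the "borg023" example, and both helper names are hypothetical; nothing here is part of borg's actual build.

```python
import shutil
from pathlib import Path


def versioned_name(base: str, major: int, minor: int) -> str:
    # Assumed scheme: major number, then zero-padded two-digit minor.
    # ("borg", 0, 23) gives "borg023".
    return f"{base}{major}{minor:02d}"


def install_versioned_copy(executable: Path, major: int, minor: int) -> Path:
    # Copy the plain executable next to itself under the versioned name,
    # preserving permission bits so the copy stays executable.
    target = executable.with_name(versioned_name(executable.name, major, minor))
    shutil.copy2(executable, target)
    return target
```

Scripts that need stability would then call the versioned name, while the plain "borg" name keeps following the newest release.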
@tgharold that's what virtualenv is for: run arbitrary versions of same library/tool without getting into conflicts. You'd just have separate virtualenvs per borg version you need.
a model comparable to django is thinkable (with long term support versions)
also each backup should probably tell the exact borg version it was made with, so exact tracking is possible
a further idea is pushing the format to a meta level where the ciphers/structural de/encoding is defined, and the implementations are just minimal fully tested implementations each (allowing to keep old backends around)
the scope of that needs a different kind of discussion tho
@RonnyPfannschmidt: a model comparable to django is thinkable (with long term support versions)
We should not forget that they have a foundation financing that.
yes, until it's feasible i suppose a more limited model is necessary
i think we should resolve this issue for 1.0, in that we should clearly document what guarantees we provide for compatibility across releases and what the release numbering and support strategy is.
so i think there are a few proposals here, if i can summarize, which affect various areas of borg:
For every component of those, we need to make one of those choices:
Right now, we are at compatibility level 4 for everything, which is fine because we clearly say we are not API-stable yet, but we say we will be for 1.0.
What I would propose for 1.0 and future stable branches is:
Any other proposals? Note that the above is not an empty proposal: as a Debian developer, I am used to maintaining old software and would volunteer in maintaining various 1.Y branches.
The above also implicitly proposes to switch to a semantic versioning standard for releases, something which could be debated separately, but makes sense (and it seems we are somewhat using it already, i just didn't want to assume it was the case). I believe the above proposals are meaningful regardless of whether we use semver, in any case.
@ThomasWaldmann could we put this back on the 1.0 milestone please?
I will release 1.0 soon and the 1 in 1.0 is not related to giving any guarantees, but because we are doing some (relatively minor) incompatible changes.
could you clarify your position regarding the above summary? do you object to any sort of compatibility framework? how about semantic versioning?
i have added "level 6" for the "we break compatibility whenever we need to" option, which i didn't believe was an option anymore, but here we go.
furthermore, i don't see why we need to designate the next release as 1.0 if we are already making incompatible changes in the 0.x branch... shouldn't we wait until this is resolved first?
This is related to #512, it would be great if borg could keep backwards compatibility for X releases/Y months and issue deprecation warnings so people would have notice before they upgrade. Putting things in changelogs isn't enough, because nobody reads the changelog of "apt-get upgrade".
My proposal is that backwards-compatible code should stay in for three releases or six months, whichever is longer, and deprecation warnings like "please upgrade your server to 0.XX soon, otherwise it will stop working when client 0.XY is released."
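As a rough illustration of the "three releases or six months, whichever is longer" rule, the sketch below checks whether a compatibility shim may be dropped. The function name, the 183-day approximation of six months, and counting releases since the deprecation was announced are all assumptions made for the example.

```python
from datetime import date, timedelta

# Assumption: "six months" approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def may_remove(deprecated_on: date, releases_since: int, today: date) -> bool:
    # "Whichever is longer" means both conditions must hold:
    # at least three releases AND at least six months have passed.
    enough_releases = releases_since >= 3
    enough_time = (today - deprecated_on) >= SIX_MONTHS
    return enough_releases and enough_time
```

With a fast release cadence the six-month floor dominates; with a slow one the three-release floor does.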
For creation of new backups, warning the sysadmin that they need to upgrade the client or server to a newer release version is fine. Keep in mind that this tends to be software that gets installed once on a client / server pair, and then rarely upgraded unless there are bugs.
For reading of older backups, borg really needs to be able to read and restore any repo created in the last few years. That's really the minimum bar for any software that considers itself to belong in the "backup" category. Especially true once you slap a 1.0 label on it. After a disaster, I may only have a drive with the repository on it and not the original software and might have to read it using the latest version of borg.
If I have to go rummaging for a specific version of borg, and a specific version of python and get it all setup in a virtual environment, that's a pretty annoying set of requirements on what is already turning out to be a bad day (on the day that a system crashed and I'm restoring from backups).
I don't care if I can't create or update repos in an older format, but I need to be able to read / verify older repos.
Hi, I'm a new user of borg, and congratulations for this awesome software.
As a user, the only thing that worries me is the message in the doc about deprecation. I think the purpose of a backup software is defeated if compatibility with older archives is broken.
What's your current position about backward compatibility ? I would love to read that it is guaranteed forever, even if it needs to run the "upgrade" command. Borg would become perfect for me if I was sure that the backups I do today will be readable by the latest Borg 5 years from now.
Please consider that backups are not always monitored regularly by a sysadmin, some people make them and keep them for several years, expecting them to be easily recoverable if needed. In a lot of cases, the "compatible only with X+1" policy would create problems.
Also, is there any reason yet to change the format in the future ?
@gybr there are still quite some reasons why we might need to break compatibility, just click on that "breaking" tag you see in the right sidebar (there are different levels of breaking btw.).
We are using semantic versioning, thus such breaks are delayed until next major release.
Developers aren't enjoying that much working on "borg upgrade" yet - it is neither easy nor exciting nor paid (and even if paid a little [see that "remote borg upgrade" bounty], it doesn't help much). Sometimes there can be also technical difficulties / blockers.
What you always can do is to just keep the binaries of major releases, then you maybe don't need to require backwards-compatibility-forever.
OK, I understand. Still hope to see the format stabilize once these five issues are solved. Again, this is the best backup software I've seen. Thank you.
I'm going ahead and say that it pretty much won't happen that a newer Borg version can't read older repos (either directly or via an upgrade path).
Everything(?) discussed so far can be divided in three categories of breaking:
I. Auxiliary files (cache, indices) which can be automatically recreated or are automatically upgraded
II. Remote protocol. For these it is unlikely that we do anything for compatibility if a breaking change is made. I.e. have clients/servers on compatible versions
III. Repository metadata layout changes. Depending on the actual change most stuff (reading, extracting, listing) would work both ways, or an upgrade might have to be done. There are also some things debated that would only be backwards incompatible (i.e. use $shiny_new_feature => old version can't handle that).
IV. Command line options.
I've had my backups broken multiple times now due to backwards incompatible parameters (-e for borg init, parameter ordering changed at some point between 1.0 and 1.1, and a few others in the past I no longer remember)
if you upgrade from 1.0 to 1.1, you have to expect some changes and have to read the changelog.
some of these changes were intended, e.g. -e because it should be an informed decision, since the new blake2b stuff is faster than the old default. also, one does not init that often and usually not automated, so this was not seen as an issue.
some changes came due to a bug in python stdlib's argparse. the bug was always there, but didn't bite for borg create in 1.0, but did in 1.1 (as paths are optional now). it could also bite you in 1.0 for other commands. i recently updated the docs about cli options and arguments order.
@ThomasWaldmann Not to comment on this specific release, but generally it's less surprising if you deprecate things in one version and remove them in the next. So, 1.1 would print warnings about using something in 1.0 (but it would still work) and it would stop working in 1.2. That does mean you'd have to support both code paths for a release, though.
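A minimal sketch of that deprecate-then-remove pattern with Python's argparse follows. The option names `--old-name` and `--new-name`, the `DeprecatedOption` class, and `build_parser` are hypothetical and only illustrate the idea; this is not how borg's own code handles it.

```python
import argparse
import warnings


class DeprecatedOption(argparse.Action):
    """Accept an old option spelling, warn, and store under the new dest."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and will be removed in the next "
            "release; use --new-name instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tool")
    # New spelling: the supported code path.
    parser.add_argument("--new-name", dest="name")
    # Old spelling: still works for one release, but emits a warning.
    parser.add_argument("--old-name", dest="name", action=DeprecatedOption)
    return parser
```

In the following release the `--old-name` line is deleted, so callers who ignored the warning get a clear argparse error instead of silently changed behaviour.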
@skorokithakis see archiver.py for deprecation handling, we already do that in some cases (for options).
the specific problem with -e is that once you've inited a repo and used it for some backups, you can't easily change it later - you'd have to start a new repo. if we had kept the "repokey[-hmac-sha256]" default as in 1.0, a lot of people would create slower-than-it-needs-to-be repos, by default and not notice that there is repokey-blake2 now, which is faster.
Ah, I see. You are correct.
Guess this can be closed.
one of the main contentious points of #1 is whether borg should be backwards-compatible with attic, or how/if backwards-compatibility should be broken within borg itself.
so to clarify this, i wish to open a discussion specifically about this topic. this is basically a continuation of https://github.com/jborg/attic/issues/215 and #1.
the original proposal from @ThomasWaldmann was:
In #25, i made an entry in the documentation (the FAQ) that summarizes the points from #1 as:
About the last point: i would like to put forward a proposal that will make borg backups compatible from major version X to X+1.
That is, we limit on-disk changes between major releases: those changes should live in a feature branch for a while, then be merged in a development branch, which eventually becomes the X+1 version. The X+1 version can read (and if necessary, convert) backups made with the X version. Then everyone upgrades to the new version and the X+2 version can drop compatibility shims.
So in other words, version X+1 can read and convert backups from version X, but not write them. X cannot read or write backups made with version > X. X+2 cannot read, write or convert backups from version X.
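The rule in the paragraph above can be written down as a small function. This is only a sketch of the proposal as stated; the function name and the returned labels are made up for illustration.

```python
def access_mode(tool_major: int, repo_major: int) -> str:
    """What a tool of major version `tool_major` may do with a repository
    written by major version `repo_major`, under the X -> X+1 proposal."""
    if repo_major == tool_major:
        # Same major version: full access.
        return "read-write"
    if repo_major == tool_major - 1:
        # X+1 can read and convert backups from X, but not write them.
        return "read-convert"
    # Either the repo is newer than the tool, or it is two or more
    # major versions behind: no access in either case.
    return "unsupported"
```

For example, `access_mode(2, 1)` gives `"read-convert"`, while `access_mode(3, 1)` and `access_mode(1, 2)` both give `"unsupported"`.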
I would personally prefer that the format would be always future proof and you'd be able to restore really old backups without problems. It can be pretty difficult to extract older software on newer platform ("oooh, this was written for Python 2.1, how cute!"), so I would strongly advocate towards keeping backwards compatibility forever. However, I know how hard this can be, so I am ready to concede this can be broken at times. This should be considered an extreme case, and only used when really necessary and we should bundle multiple changes into one to avoid doing that too often.
I would therefore also suggest using semantic versioning for the version numbers, that is version numbers would be X.Y.Z where X is the major number described above, and Y.Z are the regular release numbers used most of the time.
In that way, borg would be attic 2.0.0-alpha.1 (and we simply skipped borg 1.0). note that this would give us the freedom to break compatibility until the golden "2.0.0" release while we put out alphas.