Open hypnotoad opened 4 years ago
Are you aware that even if you just backed up a new file in 2019, the chunks needed to restore it may be stored in the volume from 2011, because of deduplication? So restoring just a single file, regardless of its creation date, could require access to all your volumes, which would be very impractical from my point of view.
If you had many backup volumes, you could get the "windows 3.0" install feeling. :-)
(the stuff was on many floppy disks back then and it requested random disks while installing, repeatedly)
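To make the deduplication point concrete, here is a minimal sketch in Python. It is not borg's real data model: the `chunk_location` table, the volume names and the fixed byte strings used as "chunks" are all made up for illustration.

```python
import hashlib

# Hypothetical index: chunk id -> volume that first stored the chunk.
chunk_location = {}

def chunk_id(data: bytes) -> str:
    """Identify a chunk by the SHA-256 of its content."""
    return hashlib.sha256(data).hexdigest()

def backup(file_chunks, volume):
    """Record chunks not seen before on `volume`; return the file's chunk ids."""
    ids = []
    for data in file_chunks:
        cid = chunk_id(data)
        # Deduplication: a chunk that already exists keeps its old location.
        chunk_location.setdefault(cid, volume)
        ids.append(cid)
    return ids

def volumes_needed(ids):
    """Volumes that must be attached to restore a file with these chunks."""
    return {chunk_location[cid] for cid in ids}

old_photo = backup([b"header", b"pixels-a"], "volume-2011")
new_photo = backup([b"header", b"pixels-b"], "volume-2019")

# The 2019 file shares b"header" with the 2011 file, so both volumes are needed.
print(sorted(volumes_needed(new_photo)))
```

In this toy run the file backed up in 2019 needs both volumes, because one of its chunks was already stored in 2011.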
@fantasya-pbem : Sure, I am aware that all volumes are needed. The volumes are only needed during a restore, so it is not really impractical.
As the main use case, I see "Neverending backups": You simply never delete old backups and start a new volume when a disk is full. That would be practical for almost everyone. If you re-organize your photo files, deduplication kicks in. Of course, you can manually split your backups into something like "all photos" and "all videos", but then you need to do 2 backups with 2 media every time you do a backup.
Let's not discuss whether it would be impractical because of xyz. I would propose that we keep the thread open in case someone needs the feature or wants to implement it. If there had been an FAQ entry "are multi-volume backups possible?", I would not have started this thread.
I agree with the OP (@hypnotoad). I too have encountered such scenarios and agree it would be a nice-to-have feature. That being said, relying on multiple volumes isn't as safe as relying on just one. Assuming a volume is just an HDD, every HDD you add increases the chance that one device in the backup set fails, and with it the chance that your backup becomes unrecoverable. I use borg on a NAS, however, so this risk is mitigated by my NAS's RAID array.
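A rough back-of-the-envelope sketch of that risk, in Python. It assumes the disks fail independently and that losing any single disk ruins the whole backup set; the 5% per-disk failure probability is a made-up number, not a measured one.

```python
def p_any_failure(p_single: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1.0 - (1.0 - p_single) ** n_disks

# Hypothetical per-disk failure probability over some fixed period.
P_SINGLE = 0.05

for n in (1, 2, 4, 8):
    print(f"{n} disk(s): {p_any_failure(P_SINGLE, n):.4f}")
```

Without redundancy the risk grows with every disk added, which is why RAID or mirroring changes the picture.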
Right now, borg just can't do this kind of job. When restoring, if the segments are incomplete, it will pause and prompt for the missing segment, which is fine. During a backup, however, if the destination repo lacks some of the segments, the create action will just fail (sometimes the borg cache can keep the creation going even though the real segment file is actually gone, but we can't count on it). Anyway, multi-volume support is not in borg's feature list, for now.
For those who need to do incremental backups even when the old archives are offline/inaccessible, there is a capable tool called "DAR backup". It's a file-level backup tool instead of a block-level one. When DAR creates an archive, it generates a "catalogue" which records all the files with their size, date and CRC, so this catalogue can serve as a reference when creating a new archive. The catalogue can be isolated into a standalone file, which is very small and therefore easy to keep locally, while the real archive data can be stored anywhere. There is no need for the whole old archive: the catalogue file alone is enough as a reference for an incremental backup.
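The catalogue idea can be sketched in a few lines of Python. This is only an illustration of the principle, not DAR's actual catalogue format or command line: the functions `build_catalogue` and `changed_files` and the stored fields (size, mtime, CRC32) are assumptions chosen for the example.

```python
import os
import tempfile
import zlib

def build_catalogue(root):
    """Map each relative file path to (size, mtime_ns, crc32)."""
    catalogue = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            with open(path, "rb") as f:
                crc = zlib.crc32(f.read())
            catalogue[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns, crc)
    return catalogue

def changed_files(root, reference):
    """Files that are new or differ from the reference catalogue."""
    current = build_catalogue(root)
    return sorted(p for p, meta in current.items() if reference.get(p) != meta)

with tempfile.TemporaryDirectory() as root:
    for name, text in (("a.txt", "alpha"), ("b.txt", "beta")):
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    # The catalogue is small and can stay local while the archive goes offline.
    reference = build_catalogue(root)

    with open(os.path.join(root, "b.txt"), "w") as f:
        f.write("beta, edited")
    with open(os.path.join(root, "c.txt"), "w") as f:
        f.write("gamma")

    # Only the modified and the new file need to go into the next archive.
    changed = changed_files(root, reference)

print(changed)
```

The old archive itself is never read here; only the small reference catalogue is compared against the current file tree.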
Though DAR cannot do block-level dedup, most of the time file-level dedup is enough. Even borg can't dedup all kinds of files. I used to use borg to back up a virtual machine folder containing a VirtualBox vdi file, which is a sparse virtual disk image that is constantly modified. Borg definitely tried to dedup the data in this vdi file, but after several archives had been created, the resulting repo size was still growing linearly, and the status report said there was little data to dedup. You could say that sliding-window, block-level dedup is not almighty.
Borg is not suitable for poor or non-random-access storage media and devices. Borg assumes that the disk is big and fast enough. You could say it was born for hard disks and works against tapes or optical discs. However, when we do backups, I mean real backups, not temporary or spur-of-the-moment ones, we store our precious archives on a long-lived medium, like magnetic tape or an optical disc. In the past, a well-made HDD could live for decades, but those were the good old days. Nowadays, consumer-grade disks only come with a 3-5 year warranty; most of them die not long after that, and some die even before. An HDD is still pretty long-term for preservation compared to flash chips. But it is a complex electromechanical device, and if any part inside it fails, the data may be gone. Restoring data from a broken HDD is difficult, expensive, or simply impossible, depending on the situation.
So boys, do not ask borg to do something that it does not want to do.
I think this would be very useful, especially if you could configure how many redundant copies of your files you want to have across a set of volumes. This way you could keep a set of HDDs in a remote location and then, from time to time, take the oldest one home and add new files. It would also be good to have a way to store only those files whose number of redundant copies is not yet high enough, just before you move an HDD to the remote location.
Hi I wouldn't think that having borg backup over more than one location would be a major issue... however, I was thinking that instead of swapping between backup media, you'd simply just plug in more than one backup USB hard drive...
Linux is perfectly capable of having more than one USB HDD plugged in - all that happens is that you mount them in two mount points - in which case, there should be no reason why borg cannot extend the backup across both drives - so backup chunks that do not fit in disk 1 are simply written to disk 2.
I've got a 2TB backup disk, and a number of 500GB disks - I'd really not want to have to go out and buy a 3 or 4 TB disk when my 2TB fills up - it's costly and a little frustrating if I've got perfectly good disks sitting by the side of me...
Yes, it's perfectly possible to have one repository on each disk, and run two backups, one to each disk, splitting the source information across the two backups - however, that means re-working the backups and losing the historical information, as opposed to a graceful expansion...
Anyway - just my 10c worth - please do not think I'm being critical here - I believe borg is one of the best backup mechanisms out there - it's just that I think the ability to extend repos across more than one disk would be a very worthwhile addition..
All the best Carl
@carlbeech you know, you can create a RAID over multiple USB drives. Or, more flexibly, add all your USB drives to an LVM volume group and get "one big virtual storage" -- and access it as such after inserting all disks. Of course, if ANY disk fails -- in most cases all your backups are gone :) But take into account how borg deduplication will spread your data chunks all over multiple disks anyway -- so if archiving/fragmenting is implemented poorly (without redundancy) you will lose/corrupt most of your backups even using Borg only.
Therefore the consequences of a disk failure are almost the same -- for LVM or a borg-custom multidisk solution. You may try LVM if the increased risk is worth it.
On the general topic: of course, if implemented correctly, redundant copies spread over multiple media are great to have. But then Borg would become not only a backup solution, but also an archiving-library-management solution. You can see how complex that may become by trying to set up and fully understand git-annex(1). It's literally configuration complexity hell, which maybe fits into the brain of only the most prudent and disciplined data-hoarders.
As of today, spreading a backup over several disks is still not possible?
Borg does not handle disks. It just writes a "repository", which is a bunch of some directories and files, to a folder on a file system. This file system has to deliver enough free space. This file system has to be either mounted locally on the system where borg is executed (the client) or is accessed via SSH on a storage server that has installed a Borg binary, too.
@sat-hub Hello fellow grumpy cat! 🤣
I found this thread via Googling. Here's what I learned. I keep my backups on a two-USB-drive setup in a remote location and I just want to spread the files over the two disks easily.
mergerfs is good, and so is rclone union or something similar.
borg2 (currently in beta) uses `borgstore` for the repository storage and there is already some basic implementation of `MStore`, which can use multiple `Store` instances as backends - either for redundancy or to distribute stuff to multiple places.
`MStore` isn't much practically tested yet though and also the implementation needs more work.
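The difference between "redundancy" and "distribution" over several backends can be sketched as follows. This is a toy Python illustration and deliberately not borgstore's API: the classes `ToyDisk` and `ToyMultiStore` and their methods are hypothetical names invented for this example.

```python
import hashlib

class ToyDisk:
    """In-memory stand-in for one backend/disk (hypothetical)."""
    def __init__(self):
        self.items = {}

    def store(self, key, value):
        self.items[key] = value

    def load(self, key):
        return self.items[key]

class ToyMultiStore:
    """Mirror every item to all backends, or distribute items among them."""
    def __init__(self, backends, mirror=False):
        self.backends = backends
        self.mirror = mirror

    def _pick(self, key):
        # Stable backend choice derived from the key, so loads find the item.
        digest = hashlib.sha256(key.encode()).digest()
        return self.backends[digest[0] % len(self.backends)]

    def store(self, key, value):
        targets = self.backends if self.mirror else [self._pick(key)]
        for backend in targets:
            backend.store(key, value)

    def load(self, key):
        if not self.mirror:
            return self._pick(key).load(key)
        for backend in self.backends:
            try:
                return backend.load(key)
            except KeyError:
                continue  # this copy is missing, try the next mirror
        raise KeyError(key)

mirrored = ToyMultiStore([ToyDisk(), ToyDisk()], mirror=True)
mirrored.store("chunk-1", b"data")
del mirrored.backends[0].items["chunk-1"]  # simulate losing one copy
recovered = mirrored.load("chunk-1")       # still served by the other disk

spread = ToyMultiStore([ToyDisk(), ToyDisk()])
for i in range(100):
    spread.store(f"chunk-{i}", b"x")
per_disk = [len(d.items) for d in spread.backends]
print(recovered, per_disk)
```

Mirroring survives the loss of one copy but uses the space of every disk for every item; distribution adds up the capacity but needs every disk present to read everything.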
This could be a game-changer: being able to back up to a series of older hard disks as cold backup. Will there be some redundancy for restoring after 1 drive fails, like RAID 5 provides? I thought about RAID 5, but that means I need all disks attached. Cold backup disks I would attach via USB. There are 5-disk enclosures, but what if the number of disks increases and I can only attach one disk after the other?
It would be great if borg could back up to a series of disks, with 1 disk of redundancy like RAID 5. Is that generally possible?
I don't think I'll implement rather complicated stuff (like RAID5, ECC codes, ...) in borgstore's `MStore` - if one wants that, one could also add a borgstore backend for something already implementing such features (like e.g. ceph).
But if it is simple enough, like mirroring or distribution, that's better in scope (and partially already there).
When adding more devices to the Store without redundancy, the risk of damage increases a lot, which would undermine the purpose of a backup. Any layer introducing RAID or distributed devices is always based on having all devices available.
Some background on why backups over multiple volumes are highly interesting: none of the deduplicating backup tools like borg or restic is able to back up into slices for tapes or removable media. This is often useful when you have small old devices that consume too much energy as a RAID cluster but are still good enough for cold backups. One of the only sophisticated backup projects providing backups to slices is dar: http://dar.linux.free.fr/doc/Features.html But dar cannot cope with all the snapshots that I have on btrfs. AFAIK borg can deduplicate well enough to keep the snapshots deduplicated as they are in btrfs.
I have been using borg backup for several years now and recommend it to everyone. There is just one thing missing and I am wondering if the borg developer team has a plan for it.
A fundamental limit of borg is that a repository is tied to a single location in your file system and to a single backup medium. So when the backup medium is full, it has to be replaced by a bigger one. Instead, I think it should be possible to just add another volume. E.g., "volume 1" has all the files backed up in 2011-2018 and "volume 2" has 2019-2022 (but none of volume 1).
As far as I can see (having read the documentation several times), it is not possible to achieve this right now. I would assume it could work in the following way: when initializing a new volume, the database is copied from the previous volume, with additional info that the data is on another volume. The data itself is not copied.
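To illustrate what I mean, here is a rough Python sketch of the idea. It is purely hypothetical and has nothing to do with borg's real repository format: the `Volume` class, its `index` and `data` attributes and the chunk ids are invented for this example.

```python
class Volume:
    """Toy volume: a chunk index plus the chunk data stored on this medium."""
    def __init__(self, name, inherited_index=None):
        self.name = name
        # Index: chunk id -> name of the volume holding the data.
        # A new volume copies only this index, never the data itself.
        self.index = dict(inherited_index or {})
        self.data = {}

    def add_chunk(self, cid, payload):
        """Store the chunk here unless some volume already has it."""
        if cid in self.index:
            return False  # deduplicated against this or an older volume
        self.data[cid] = payload
        self.index[cid] = self.name
        return True

v1 = Volume("volume-1")
v1.add_chunk("c1", b"old photo")

# Disk full: start volume 2 with a copy of the index only.
v2 = Volume("volume-2", inherited_index=v1.index)
stored_again = v2.add_chunk("c1", b"old photo")  # already on volume-1
stored_new = v2.add_chunk("c2", b"new photo")

print(v2.index, sorted(v2.data))
```

Volume 2 knows about the old chunk without holding its data, so a backup can run with only volume 2 attached, while a restore may still ask for volume 1.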
Are there any plans like that by the borg team?