It's not just that we have to avoid 2 writers at the same time. If a reader is active while another writer is active, it could also read inconsistent data. A lot of stuff is append-only, but e.g. the repo manifest is modified "in-place".
> If a reader is active while another writer is active, it could also read inconsistent data.

What if we mount an archive that is still in progress? If that is the problem, maybe we could disallow mounting an archive that is still in progress?
See also #420.
I understand the complexities involved; nevertheless, would it not be a good idea to implement file system-like locking for archives? If an archive is being written to - do not allow it to be mounted. All other archives are fair game.
As an alternative: What do you consider as best practice for the use case I detailed in the original post (near-line backups, need to inspect/extract an archive which takes longer than the interval between backup runs)?
@nprncbl It's not like archives are separate, single files for borg. Besides that, we just got rid of POSIX locking in other places because it caused too many compatibility issues.
If 15 minutes is too short an interval for what you'd like to do, just use a longer interval?
The main problem I see is that there are two very different approaches to this and both aren't really "neat". I do think this is a valid use case and should be on the list for 1.1 ...or later.
[1] To lock out other "appenders" from adding the same archive concurrently. Or we might add an "under construction" entry to the manifest or something like that.
How about:
borg mount sets the append_only flag, umount unsets it. Perhaps add an append_only pin/lock? After the Repository transaction is opened, it doesn't care whether new segments are added or new indices are written. The cache is not used by mount, IIRC.
The same approach could be used for extract as well.
This would pretty much be a very simple many-reader single-writer form of MVCC with snapshot isolation.
Except that it needs a semaphore-style counter for recursive/concurrent uses: each new user needs to increment it, otherwise different users may kill each other with race conditions on the state reset.
Yes, there will also be some other finer details to consider w.r.t. compatibility and safety of intermingled versions etc. [1] -- but I think the approach is workable and would make for a tangible improvement.
[1] Shouldn't be too hard; the append_only lock mustn't be in the repo config, though.
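To make the counter idea concrete, here is a minimal, purely hypothetical sketch of a reference-counted append-only pin. It is not borg's actual code: the file names, the mkdir-based mutex, and the function names are all assumptions. Per the note above, the pin lives in its own files next to the repo rather than in the repo config.

```python
import os
import time

# Hypothetical names -- not part of borg's real repo layout.
PIN_FILE = "append_only_pin"         # holds the number of active readers
PIN_MUTEX = "append_only_pin.lock"   # directory used as a short-lived mutex


def _with_pin_mutex(repo_path, fn):
    """Run fn() while holding a crude mkdir-based mutex around the pin file."""
    mutex = os.path.join(repo_path, PIN_MUTEX)
    while True:
        try:
            os.mkdir(mutex)          # atomic on POSIX filesystems
            break
        except FileExistsError:
            time.sleep(0.05)         # another process is updating the counter
    try:
        return fn()
    finally:
        os.rmdir(mutex)


def _read_count(pin_path):
    try:
        with open(pin_path) as f:
            return int(f.read() or 0)
    except FileNotFoundError:
        return 0


def pin_append_only(repo_path):
    """Called by e.g. borg mount: increment the reader count."""
    pin = os.path.join(repo_path, PIN_FILE)

    def inc():
        count = _read_count(pin) + 1
        with open(pin, "w") as f:
            f.write(str(count))
        return count

    return _with_pin_mutex(repo_path, inc)


def unpin_append_only(repo_path):
    """Called on unmount: decrement; append-only mode ends when the count reaches 0."""
    pin = os.path.join(repo_path, PIN_FILE)

    def dec():
        count = max(_read_count(pin) - 1, 0)
        with open(pin, "w") as f:
            f.write(str(count))
        return count

    return _with_pin_mutex(repo_path, dec)


def append_only_pinned(repo_path):
    """Writers would check this and avoid in-place modification while it is > 0."""
    return _read_count(os.path.join(repo_path, PIN_FILE)) > 0
```

The counter addresses the race mentioned above: the last unmount, not the first, is the one that ends append-only mode.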
I fear this one will prove seriously problematic with the way borg is currently structured.
Mounting an append-only repo without locking should definitely be possible (not sure how manifest changes happen in append-only mode, but they could be made atomic).
For non-append repos, one could have an option to ensure consistency by locking as currently implemented (this can be the default setting if you like), but also have a non-locking mode which does not guarantee consistency.
One could think of the non-locking mode in the same way as a network mount (in some implementations): we could start reading a file and then the generation gets deleted; the client returns an I/O error and the mount tries to refresh its metadata.
To speak in database terms, I think borg extract and mount would mostly work fine with a read-committed mode. Of course it would fail if the archive being accessed is deleted while in use, but I think that is a reasonable trade-off, maybe enabled with a `--no-lock` option. That would also avoid mysterious stale locks that are invisible apart from the repo not being compacted.
The basic idea of a read-committed mode would be to catch the "file does not exist" error from LoggedIO.get in Repository.get and retry with a reloaded index.
The manifest is not going to be a problem, because borg only reads it once, as far as I remember. Assuming the mounted/extracted repo is not deleted, all chunks are still referenced and so are not going to be compacted away. So the only real things to consider are:
- the archive is deleted
- chunks are moved
I think read-committed without protection against concurrent archive deletes should be a fairly local change (more eager index writing plus changes in Repository.get and LoggedIO.get only).
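A rough sketch of that retry idea, using stand-ins rather than borg's real classes (the exception type, `self.io`, and the index-loading callable are assumptions; borg's actual Repository/LoggedIO APIs differ):

```python
class SegmentEntryNotFound(Exception):
    """Stand-in for the 'file does not exist' error raised by the segment layer."""


class ReadCommittedRepository:
    """Illustration only: retry a failed chunk read once after reloading the index.

    `io` and `load_index` stand in for borg's LoggedIO and committed repo index;
    the method names used here are assumptions, not borg's real API.
    """

    def __init__(self, io, load_index):
        self.io = io                    # segment-level reader
        self._load_index = load_index   # callable returning a fresh {id: (segment, offset)} map
        self.index = load_index()

    def get(self, chunk_id):
        try:
            segment, offset = self.index[chunk_id]
            return self.io.read(segment, offset, chunk_id)
        except (KeyError, SegmentEntryNotFound):
            # A concurrent writer may have compacted or moved the chunk since we
            # loaded our index: reload the committed index and try exactly once more.
            self.index = self._load_index()
            # If the archive was deleted meanwhile, this raises KeyError,
            # which maps to the "reasonable trade-off" failure described above.
            segment, offset = self.index[chunk_id]
            return self.io.read(segment, offset, chunk_id)
```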
Yes, self-synchronizing read-committed would be simpler and less complex.
I wrote an implementation of what I described above, but it adds quite some complexity to the repo opening and also needed RPC changes (any approach likely does, though), so I don't think it's the correct choice for now.
@enkore do you have this code available somewhere?
I would like to make it possible to run 'borg create' and 'borg extract' at the same time. I've been looking into the borg code for that, but it's not easy to get acquainted with.
I've done an attempt to get (2) from your comment https://github.com/borgbackup/borg/issues/768#issuecomment-205387270 to work, but I didn't have a lot of success yet.
Any pointers or suggestions are welcome.
I'd be willing to put up a bounty on this task. From reading the comments above, this is how I see it:
Is it possible to have two lock types? (create / modify)
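If "two lock types" means a shared lock for readers (mount/extract/list) and an exclusive lock for writers (create/delete/prune), the concept looks roughly like the generic sketch below, using `fcntl.flock`. This is purely illustrative and not a proposed implementation: earlier comments note that borg deliberately moved away from POSIX locking because of compatibility issues, and the lock file path is just an example.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def repo_lock(lock_path, exclusive):
    """Shared lock for readers, exclusive lock for writers (POSIX flock)."""
    with open(lock_path, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


# Readers (mount/extract/list) can run concurrently:
#   with repo_lock("/path/to/repo/lock", exclusive=False): ...
# Writers (create/delete/prune) wait for all readers and other writers:
#   with repo_lock("/path/to/repo/lock", exclusive=True): ...
```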
Just FYI, I took a different approach for now, which may not be usable for others: I set up 2 backup directories. When uploading, I upload to one of these directories and then rsync to the other. During the upload, all backup requests go to the other backup directory.
That would require double the space. I thought about doing a weird trick using a utility called Linux Hot Copy (https://www.r1soft.com/free-tool-linux-hot-copy); it allows you to make a snapshot on the fly. I'd rather not rely on another piece of software, though.
I need this feature as well: to be able to do a borg extract while a borg create is running, and a borg create while a borg extract is running, since the extract is read-only and would be reading an archive that the create is not going to be writing to.
This function would be useful for GUIs as well. I first implemented the mount function so that the user can mount multiple archives before I ran into this problem. I'm not quite sure yet how I will implement a workaround so that it is easy for a user to understand.
In the Server GUI I am building this is roughly how I handle it right now (may change as I come up with better solutions):
Most of these changes are a result of feedback and testing I’ve received. I am open to improving the process.
Marc
My solution for now is that I show the user a dialog informing him that he needs to unmount all the archives before continuing with creating an archive. If he clicks yes, I unmount all archives. Probably not the most elegant solution, but it works for the moment.
Just to check: did the discussion here lead anywhere? I see the issue is open, but the conversation stopped close to 3 years ago...
It's still the case that borg does not allow multiple parallel operations in the same repo, because the first operation locks the repo.
Yeah, just stumbled upon this as I would like to invoke `list` while running `create`. This really should not be forbidden.
The way I handle it is to build a database of the files after each backup; then I can search the database independently of the backups. Not ideal, but it's the only way I've found to handle it.
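In case it helps others, here is a small sketch of that kind of workaround, assuming borg's `borg list --format` placeholders and an SQLite side database; the repo path, database path, and function names are just examples:

```python
import sqlite3
import subprocess

# Example paths -- adjust for your setup.
REPO = "/backups/repo"
DB = "/backups/file-index.sqlite"


def index_archive(archive_name):
    """Store the file list of one archive so it can be searched while the repo is busy/locked."""
    out = subprocess.run(
        ["borg", "list", f"{REPO}::{archive_name}", "--format", "{path}{NL}"],
        check=True, capture_output=True, text=True,
    ).stdout

    con = sqlite3.connect(DB)
    con.execute("CREATE TABLE IF NOT EXISTS files (archive TEXT, path TEXT)")
    con.executemany(
        "INSERT INTO files VALUES (?, ?)",
        ((archive_name, line) for line in out.splitlines() if line),
    )
    con.commit()
    con.close()


def search(pattern):
    """Search all indexed archives for paths containing `pattern`."""
    con = sqlite3.connect(DB)
    rows = con.execute(
        "SELECT archive, path FROM files WHERE path LIKE ?", (f"%{pattern}%",)
    ).fetchall()
    con.close()
    return rows
```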
Guess some people would simply keep `borg create --list` logs and then do `grep -i whatiwant borg*.log`.
Just saw `--bypass-lock` in the general options: https://borgbackup.readthedocs.io/en/stable/usage/general.html?highlight=bypass-lock#common-options
Using `borg mount` to access an archive or even the whole repository is incredibly powerful and a very nice feature. Unfortunately, this also locks the repository, so that no new archives can be created while an archive from the same repository (or even the repository itself) is mounted.

My understanding is that a mounted archive/repository should be read-only anyway, so I do not see a good reason why a simultaneous `borg create` should not be allowed. I could even live with new archives added after the mount not being visible.

Background: If you run `borg create` quite often (like every 15 minutes) and your backed-up sources are quite large, the "restore" window between the end of one `borg create` and the beginning of the next one shrinks to mere minutes. Inspection of archives and non-standard or long-running restores become unfeasible. Yes, you could store the repository on btrfs or zfs, create a snapshot of the filesystem containing the repository, mount that btrfs/zfs snapshot on /mnt/repository and then run `borg mount` off of that - but this seems a bit over the top, plus there are those of us who still use ext4 or similar, non-snapshotting file systems.

Changing the code so that `borg mount` does not lock the repository seems like a huge improvement in usability.