borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Please allow "borg create" while repo/archive is mounted via "borg mount" #768

Closed snhrdt closed 2 months ago

snhrdt commented 8 years ago

Using borg mount to access an archive or even the whole repository is incredibly powerful and a very nice feature.

Unfortunately, this also locks the repository so that no new archives can be created while an archive from the same repository (or even the repository itself) is mounted.

My understanding is that a mounted archive/repository should be read-only anyway, so I do not see a good reason why a simultaneous borg create should not be allowed. I could even live with new archives added after the mount not being visible.

Background: If you run borg create quite often (like every 15 minutes) and your backed-up sources are quite large, the "restore" window between the end of one borg create and the beginning of the next one shrinks to mere minutes. Inspection of archives and non-standard or long-running restores become infeasible.

Yes, you could store the repository on btrfs or zfs, create a snapshot of the filesystem containing the repository, mount that btrfs/zfs snapshot on /mnt/repository and then run borg mount off of that - but this seems a bit over the top, plus there are those of us who still use ext4 or similar, non-snapshotting file systems.
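For reference, the ZFS variant of that snapshot workaround looks roughly like the sketch below. Dataset and path names are hypothetical; adjust for your setup.

```shell
# 1. Snapshot the dataset that holds the repo (copy-on-write, instant):
zfs snapshot tank/borgrepo@inspect

# 2. ZFS exposes snapshots read-only under the hidden .zfs directory:
ls /tank/borgrepo/.zfs/snapshot/inspect

# 3. Run borg mount against the snapshot; the live repo stays unlocked,
#    so the regular 15-minute borg create jobs keep running:
borg mount /tank/borgrepo/.zfs/snapshot/inspect::my-archive /mnt/restore

# 4. Clean up when done:
borg umount /mnt/restore
zfs destroy tank/borgrepo@inspect
```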

Changing the code so that borg mount does not lock the repository seems like a huge improvement in usability.

ThomasWaldmann commented 8 years ago

It's not just that we have to avoid 2 writers at the same time. If a reader is active while another writer is active, it could be also that it reads inconsistent data. A lot of stuff is append-only, but e.g. the repo manifest is modified "in-place".

infectormp commented 8 years ago

If a reader is active while another writer is active, it could be also that it reads inconsistent data.

What if we mount an archive that is still in progress? If that is the problem, maybe we could simply disallow mounting an archive that is in progress?

ThomasWaldmann commented 8 years ago

See also #420.

snhrdt commented 8 years ago

I understand the complexities involved; nevertheless, would it not be a good idea to implement file system-like locking for archives? If an archive is being written to - do not allow it to be mounted. All other archives are fair game.

As an alternative: What do you consider as best practice for the use case I detailed in the original post (near-line backups, need to inspect/extract an archive which takes longer than the interval between backup runs)?

ThomasWaldmann commented 8 years ago

@nprncbl it's not like archives are separate, single files for borg. besides that, we just got rid of POSIX locking at other places because it caused too many compatibility issues.

If 15 minutes is too short an interval for what you'd like to do, why not just use a longer interval?

enkore commented 8 years ago

The main problem I see is that there are two very different approaches to this, and neither is really "neat". I do think this is a valid use case and should be on the list for 1.1 ... or later.

  1. Use transactions to isolate different processes. Problem: segment compaction destroys old transactions. So we'd need a way around that, e.g. putting "opened" transactions in the roster?
    • This would be similar to how (R)DBMS handle this.
  2. Use granular locking. Needs extra RPC calls on the Repository layer, and might add additional timeout issues. Should be tameable by minimizing accesses, e.g. a reader should lock the manifest exclusively-reading, fetch that, and unlock it immediately. An "appender" like create should lock the manifest during the entire operation for writing[1], and exclusively for doing a write.
    • Writers like prune/delete/check etc. would always be repository-exclusive as they are now.

[1] To lock out other "appenders" from adding the same archive concurrently. Or we might add a "under construction" entry to the manifest or something like that.
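The locking discipline in option (2) could be sketched roughly like this. `ManifestLock` and the usage pattern are hypothetical illustrations, not borg's actual Repository/RPC layer:

```python
import threading

class ManifestLock:
    """Many concurrent readers, one exclusive writer (no writer preference)."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = ManifestLock()

# Reader pattern: lock the manifest, fetch it, unlock immediately.
lock.acquire_read()
manifest = {"archives": ["daily-01"]}   # stand-in for fetching the manifest
lock.release_read()

# Appender pattern: take the write lock only while actually rewriting it.
lock.acquire_write()
manifest["archives"].append("daily-02")
lock.release_write()
```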

enkore commented 7 years ago

How about:

borg mount sets the append_only flag; umount unsets it. Perhaps add an append_only pin/lock? After the Repository transaction is opened, it doesn't care whether new segments are added or new indices are written. The cache is not used by mount, iirc.

The same approach could be used for extract as well.

This would pretty much be a very simple many-reader single-writer form of MVCC with snapshot isolation.

RonnyPfannschmidt commented 7 years ago

except for needing a semaphore-style counter for the recursive uses - each new user needs to increment it, else different users may kill each other with race conditions on state reset
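The counter described here could be sketched as a refcounted pin: each reader increments it on mount/extract and decrements on exit, and append_only is only cleared when the last reader leaves. The `AppendOnlyPin` class and the state dict are hypothetical, not borg's real config handling:

```python
class AppendOnlyPin:
    def __init__(self, repo_state):
        self.state = repo_state            # e.g. a small dict persisted on disk

    def acquire(self):
        self.state["pins"] = self.state.get("pins", 0) + 1
        self.state["append_only"] = True   # writers may append, but not compact

    def release(self):
        self.state["pins"] -= 1
        if self.state["pins"] == 0:
            # Last reader is gone: compaction is allowed again.
            self.state["append_only"] = False

state = {}
a, b = AppendOnlyPin(state), AppendOnlyPin(state)
a.acquire()                            # first mount pins the repo
b.acquire()                            # a second mount stacks on top
b.release()
assert state["append_only"] is True    # still pinned by the first mount
a.release()
assert state["append_only"] is False   # last reader cleared the flag
```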

enkore commented 7 years ago

Yes, there will be also some other finer details to consider w.r.t. compatibility and safety of intermingled versions etc. [1] -- but I think the approach is workable and would make for a tangible improvement, as they say.

[1] Shouldn't be too hard; the a_o lock mustn't be in the repo config though.

RonnyPfannschmidt commented 7 years ago

i fear this one will prove seriously problematic with the way borg is currently structured

Vayu commented 7 years ago

Mounting an append-only repo without locking should definitely be possible (not sure how manifest changes happen in append-only mode, but they could be made atomic).

For non-append repos, one could have an option to ensure consistency by locking as currently implemented (this could be the default setting if you like), but also offer a non-locking mode which does not guarantee consistency.

One could think of the non-locking mode in the same way as of a network mount (in some implementations): we could start reading a file whose generation then gets deleted; the client will return an I/O error and the mount will try to refresh its metadata.

textshell commented 7 years ago

I think, to speak in database terms, borg extract and mount would work mostly fine with a read-committed mode. Of course it would fail if the archive being accessed is deleted while in use, but I think that is a reasonable trade-off, maybe enabled with a `--no-lock` option. That would also avoid mysterious stale locks that are invisible apart from the repo not being compacted.

The basic idea of a read-committed mode would be to catch the "file does not exist" error from LoggedIO.get inside Repository.get and retry with a reloaded index.

The manifest is not going to be a problem, because borg only reads it once, as far as I remember. Assuming the mounted / extracted archive is not deleted, all chunks are still referenced and so are not going to be compacted away. So the only real things to consider are:

  • the archive is deleted
  • chunks are moved

I think a read-committed mode without protection against concurrent archive deletes should be a fairly local change (more eager index writing, plus changes in Repository.get and LoggedIO.get only)
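The retry outlined here might look something like the sketch below. The class and method names are simplified stand-ins; borg's real Repository/LoggedIO internals differ:

```python
class SegmentMissing(Exception):
    """Raised when a chunk's segment/offset no longer exists (compacted away)."""

class Repository:
    def __init__(self, index, segments):
        self.index = index        # chunk id -> (segment, offset)
        self.segments = segments  # segment -> {offset: data}

    def _reload_index(self):
        # In real life this would re-read the newest committed index from
        # disk; in this sketch the caller provides it via `fresh_index`.
        self.index = dict(self.fresh_index)

    def _io_get(self, chunk_id):
        seg, off = self.index[chunk_id]
        try:
            return self.segments[seg][off]
        except KeyError:
            raise SegmentMissing(chunk_id)

    def get(self, chunk_id):
        try:
            return self._io_get(chunk_id)
        except SegmentMissing:
            self._reload_index()            # the chunk may have moved, not vanished
            return self._io_get(chunk_id)   # still raises if it was truly deleted

# Usage: the stale index points at segment 0, but compaction moved the
# chunk to segment 1; the retry with a reloaded index finds it.
repo = Repository({"chunk-a": (0, 0)}, {1: {0: b"payload"}})
repo.fresh_index = {"chunk-a": (1, 0)}
assert repo.get("chunk-a") == b"payload"
```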

enkore commented 7 years ago

Yes, self-synchronizing read-committed would be simpler and less complex.

I wrote an implementation of what I described above, but it adds quite some complexity to the repo opening and also needed RPC changes (any approach likely does, though), so I don't think it's the correct choice for now.

Mathiasdm commented 7 years ago

@enkore do you have this code available somewhere?

I would like to make it possible to run 'borg create' and 'borg extract' at the same time. I've been looking into the borg code for that, but it's not easy to get acquainted with.

I've done an attempt to get (2) from your comment https://github.com/borgbackup/borg/issues/768#issuecomment-205387270 to work, but I didn't have a lot of success yet.

Any pointers or suggestions are welcome.

marcpope commented 6 years ago

I'd be willing to put up a bounty on this task. From reading the comments above, this is how I see it:

Is it possible to have two lock types? (create / modify)

  1. borg create command sets "create" lock
  2. borg prune, delete, upgrade, etc sets "modify" lock
  3. borg mount, extract, list don't work with a modify lock but do work with a create lock, and only on completed backups, not the partial backup in progress

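The proposal above amounts to a small lock-compatibility matrix, which could be sketched like this. The operation names are borg's, but the two-lock model itself is hypothetical; the "completed backups only" rule for readers is out of scope here and noted in a comment:

```python
# Hypothetical two-lock-type model: "create" (append-only writer) vs
# "modify" (prune/delete/upgrade). Readers tolerate a concurrent create
# but not a concurrent modify.
LOCK_OF = {
    "create": "create",
    "prune": "modify", "delete": "modify", "upgrade": "modify",
}
READERS = {"mount", "extract", "list"}

def may_run(op, held_locks):
    """held_locks: set of lock types currently held by other borg processes."""
    if op in READERS:
        # Readers may run alongside an appender (restricted to completed
        # archives in the proposal), but never alongside prune/delete/upgrade.
        return "modify" not in held_locks
    # Any writer ("create" or "modify") requires that no other writer is active.
    return not held_locks
```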
Mathiasdm commented 6 years ago

Just FYI, I took a different approach for now, which may not be usable for others: I set up 2 backup directories. When uploading, I upload to one of these directories and then rsync to the other. During the upload, all backup requests go to the other backup directory.

marcpope commented 6 years ago

That would require double the space. I thought about a weird trick: using a utility called Linux Hot Copy (https://www.r1soft.com/free-tool-linux-hot-copy), which allows you to make a snapshot on the fly. I'd rather not rely on another piece of software though.

aiso-net commented 6 years ago

I need this feature as well: the ability to do a borg extract while a borg create is running, and to do a borg create while a borg extract is running, since the extract is read-only and would be reading an archive that the create is not going to be writing to.

Nebucatnetzer commented 5 years ago

This function would be useful for GUIs as well. I first implemented the mount function so that the user can mount multiple archives before I ran into this problem. I'm not quite sure yet how I will implement a workaround so that it is easy for a user to understand.

marcpope commented 5 years ago

In the Server GUI I am building this is roughly how I handle it right now (may change as I come up with better solutions):

  1. Each user (client) has its own user account on the server, so there is separation between clients
  2. ssh key for that user is un-commented
  3. Backup command runs and is forked to background
  4. As soon as backup starts, the ssh user is re-commented out. Since the user is already logged in, it will still run.
  5. After backup, the client notifies the server to run the next steps
  6. During backup, the server monitors the client for disconnects or stalled processes
  7. The server indexes the latest backup’s file structure to a database for faster recovery/searching/selecting files.
  8. The server then runs any “prune after backup” commands locally, so the client is free of any more duties than necessary
  9. A list command then compares any deleted backups with the database and deletes the old indexes.

Most of these changes are a result of feedback and testing I’ve received. I am open to improving the process.

Marc


Nebucatnetzer commented 5 years ago

My solution for now is that I show the user a dialog informing him that he needs to unmount all the archives before continuing with creating an archive. If he clicks yes I unmount all archives. Probably not the most elegant solution but it works for the moment.

ilippert commented 2 years ago

Just to check - did the discussion here lead anywhere? I see the issue is open, but the conversation stopped close to 3 years ago...

ThomasWaldmann commented 2 years ago

It's still the case that borg does not allow multiple parallel operations in the same repo, because the first op locks the repo.

xeruf commented 2 years ago

Yeah just stumbled upon this as I would like to invoke list while running create. This really should not be forbidden.

marcpope commented 2 years ago

The way I handle it is to build a database of the files after each backup; then I can search the database independently of the backups. Not ideal, but it's the only way I've found to handle it.


ThomasWaldmann commented 2 years ago

Guess some people would simply keep `borg create --list` logs and then do `grep -i whatiwant borg*.log`.

stevenmunro commented 1 year ago

Just saw `--bypass-lock` in the common options: https://borgbackup.readthedocs.io/en/stable/usage/general.html?highlight=bypass-lock#common-options