borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Control Borg process from another program? #7531

Open Daniel-Nashed opened 1 year ago

Daniel-Nashed commented 1 year ago

Hi,

my use case is that I have an application which puts databases into backup mode one by one, but all the database files should be stored in a single borg create operation.

This wasn't possible earlier, but with --paths-from-stdin I almost have what I need. What is missing in my approach is a trigger/status telling me when a file has been backed up, so that I can put the database file back into normal operation.

One idea was to use the --list option. But my tests show that the file is listed as soon as Borg starts backing it up. I would need a trigger after the file's backup has completed.

My idea would be to hook the input and output of the borg command and control it that way.

If there is an easier way to cover the same use case, I am very open to ideas.

From my application's point of view, a kind of request model would be best -- like a small API to talk to a running borg operation.

But I think --paths-from-stdin is already a good way, if I can get any kind of feedback that a file is done.

Maybe there is an easier way? Maybe a log option after a file is backed up would also be useful for other requirements?

In some cases it might be useful to know what is being processed; in other cases (like mine) it would be helpful to see what has already been processed.

Thanks for any tip!

ThomasWaldmann commented 1 year ago

So you dump one database to a file, then you feed the filename to borg's stdin, then you'd need a "finished" signal from borg, then you'd delete the db dump file from the fs (to save space) and repeat for the remaining databases?

So basically, the feedback from borg is only needed to save space?

ThomasWaldmann commented 1 year ago

BTW, borg < 1.2 used to log the file name after backing it up.

But that had issues.

Daniel-Nashed commented 1 year ago

We are not dumping the database; we are putting it into backup mode. It is an HCL Notes/Domino database. I am the responsible developer at HCL working on the integration, and I have a borg integration I want to improve.

This might move to the official HCL community repo as soon as I find a solution.

HCL Domino 12 Backup is a new approach to backup that allows customers to build integrations. The Borg backup case was my first showcase of how this could look, and I wasn't happy with having a separate borg create for each database.

Domino is not one big database like MySQL. For example, each user has their own mail database, which is represented by a separate file on disk.

I see the point of the dumping use case, and there I would prefer separate files for each database. But here this is about one server backup with many databases.

It could be a separate log statement after the backup, or it could be a separate log file I can monitor. I would love to have a kind of simple API... but that's more of a daydream, I guess.

What we implemented for Windows in the release shipping this month is full VSS Writer integration with application-consistent snapshots. And I am a bit unhappy that we have no comparable solution on Linux.

So I am trying to improve the available Linux options. Our integration also works on Docker. We are adding Borg backup to the container and have full backup and restore functionality with Borg backup.

In the HCL Domino community image we have a Borg backup build option that gets all the scripts and borgbackup installed and configured.

ThomasWaldmann commented 1 year ago

Ah, ok, that made it clearer.

I have a slightly hackish solution that has the advantage of not needing a change in borg:

  1. put `<DB>` in backup mode
  2. ts = now()
  3. write ts into a file named `<DB>.timestamp`
  4. send the paths of `<DB>` and `<DB>.timestamp` to `borg create --list --paths-from-stdin`
  5. wait until you see `<DB>.timestamp` in the output
  6. put `<DB>` back into normal mode
  7. delete the file named `<DB>.timestamp`
  8. repeat with the next db

You could also use any other "signalling" filename or put some other useful metadata into that file.

A zero-byte signal file would also be nice considering it does not create any content-chunk data.

Daniel-Nashed commented 1 year ago

Thanks! This would be a first step towards seeing this functionality working today. Maybe this use case could be covered in a more elegant way in the future?

I will try the zero-byte file. The whole idea with hooking the input and output is more of a workaround, but a good first step. It might require me to write a new program communicating with the two components. I have done something like this before for a Veeam integration, but it isn't my preference.

It would be cool to have a mini REST interface on a loopback port or socket to talk to a running Borg backup ;-)

But thanks for this first idea, and for confirming that I did not miss any other option we might have.

My other idea was a two layer approach.

  1. Copy everything to ZFS and take snapshots there
  2. After the backup completed take this snapshot and do a remote backup via Borg

But I really want to offer an end-to-end, Borg-only integration!

Daniel-Nashed commented 10 months ago

Just revisited my request after I discovered a new feature in BorgBackup.

borg import-tar --ignore-zeros

allows piping data into a running backup operation.

I wrote a prototype in C which does the following:

With popen(), start this type of command and write to it:

borg import-tar --ignore-zeros 'backup-archiv' -

Then use a trigger file which contains the name of the file to back up, and read the data via popen() from a command like this:

tar -cPf - 'myfile' 

The data I am reading from one file handle is written directly into the borg write handle. The stream carries the tar format information for the files. I first thought I would need to write code that produces the tar format myself, but tar does that for me and just sends me the data stream.

Once done, I remove the trigger file and let my calling program continue with the next file.

I could also find a better way to communicate, by having the same binary use a two-way pipe to let it wait until the data has been streamed to BorgBackup in tar format.

Each file will be in a separate tar stream, and this works with the new option `--ignore-zeros`.

Does this sound like a plan? Or is there an easier way someone could think of?

The C program is already working. I might want to change the communication. The key here is to wait until the file is processed.

-- Daniel

ThomasWaldmann commented 10 months ago

@Daniel-Nashed sounds like a plan. Your code can then notice when tar finishes; at that point borg is also done with that file, and you can continue to the next one.

Daniel-Nashed commented 10 months ago

Thanks for your quick feedback! I have a first version. The same program also generates the request and waits until the trigger file containing the file name is removed.

I need to do some more testing, but the prototype works great so far. This will be part of one of my open source projects and will improve the performance of my backup.

I thought about not using the tar binary and writing the format on my own, but I think this approach is more flexible.