EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

Incremental backups #21

Closed · secwall closed this issue 3 years ago

secwall commented 8 years ago

Hello. I would like to discuss page-level incremental backups. I've created a proof-of-concept fork of barman here. There are no docs or unit tests right now, but this will be fixed in the near future.

Motivation: we have a large number of databases with a pgdata size of about 3 terabytes and changes to about 1% of the data per 24h. Unfortunately, barman backups with hardlinks give us only about a 45% deduplication ratio (there are small changes in many datafiles, so many datafiles change between backups, even though the ratio of changed pages is only about 2%).

The solution to this problem seems simple: take only the changed pages into the backup. I've created a simple script named barman-incr (it is in the bin dir of the source code) that handles both backup and restore operations. Barman runs it on the database host, passing an LSN, a timestamp, and the list of files from the previous backup. We then open each datafile and read every page in it (if the file we opened turns out not to be a datafile, we take all of it). If a page's LSN >= the provided LSN, we include that page in the backup.
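
For illustration, here is a minimal, hypothetical sketch of this page-scanning idea (not the actual barman-incr code); it assumes the default 8 KiB block size and a little-endian server, and treats the first 8 bytes of each page header as the pd_lsn field:

```python
import struct

BLCKSZ = 8192  # default PostgreSQL block size


def changed_pages(path, since_lsn):
    """Yield (block_number, page) for each page whose LSN >= since_lsn.

    pd_lsn is the first 8 bytes of the page header: two uint32s
    (high word first), read here assuming little-endian layout.
    """
    with open(path, 'rb') as datafile:
        blkno = 0
        while True:
            page = datafile.read(BLCKSZ)
            if not page:
                break
            if len(page) < BLCKSZ:
                # short trailing read: not a clean page, take it as-is
                yield blkno, page
                break
            hi, lo = struct.unpack_from('<II', page, 0)
            if (hi << 32) | lo >= since_lsn:
                yield blkno, page
            blkno += 1
```

A file that does not parse as pages (i.e., not a datafile) would simply be copied whole, as described above.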

Some tests: a database with a pgdata size of 2.7T and 120G of WALs per 24h. The full backup size is 537G (compressed with gzip -3), and it takes 7h. The incremental backup size is 14G (also compressed with gzip -3), and it takes 30m.

I've also tested restore consistency (I restored the database to some point in time and compared the pg_dump result with a paused replica).

Implementing block change tracking (Oracle DBAs should be familiar with this; there is a white paper about it here) will require some changes to the WAL archiving process. I'll present some thoughts and test results on this in Q1 2016.
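
Purely to illustrate the concept, here is a hypothetical in-memory change map of the kind block change tracking implies (all names are made up; a real implementation would hook into WAL archiving and persist this state):

```python
from collections import defaultdict


class BlockChangeTracker:
    """Hypothetical change map: which blocks of which relation file
    were touched since the last backup. A WAL-archiving hook would
    call record() for each block reference it extracts from a WAL
    segment, so a backup can read only those blocks instead of
    scanning every page of every datafile."""

    def __init__(self):
        self._changed = defaultdict(set)

    def record(self, relfile, blkno):
        self._changed[relfile].add(blkno)

    def blocks_to_backup(self, relfile):
        return sorted(self._changed.get(relfile, ()))

    def reset(self):
        # called after a successful backup
        self._changed.clear()
```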

man-brain commented 8 years ago

Any thoughts, guys?

secwall commented 8 years ago

Hmm, it seems there is no discussion. Let's move on to specific questions: 1) Is the issue of many datafiles changing while not so many pages change common (i.e., do we need page-level incremental backups in barman)? 2) Running a script over ssh on the PostgreSQL database host may not be such a good idea; are there other ways of making page-level incremental backups possible? 3) If the current approach is OK, what should be fixed in my fork before merging (code style in barman-incr, unit tests and docs, anything else)?

gbartolini commented 8 years ago

Hi,

First of all, thanks for your contribution. We are currently 100% focused on Barman 1.6.0 with streaming replication support. Hence we apologise for not responding any earlier.

As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup), rather than having it as part of Barman - you can see our previous attempts at this on the PostgreSQL hackers list.

However, having said this, we were discussing your patch over lunch just yesterday, and one idea that came up was to add a function to pgespresso that returns the content of a requested block in a file (or of a list of blocks). This would avoid installing an agent on the Postgres server.

Please bear with us; we will do our best to evaluate your code, but it won't be any time soon.

Thanks, Gabriele


man-brain commented 8 years ago

We are currently 100% focused on Barman 1.6.0 with streaming replication support. Hence we apologise for not responding any earlier.

No problem, guys. Although we are doing lots of rebasing :) you are doing the right work, thanks!

As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup), rather than having it as part of Barman - you can see our previous attempts at this on the PostgreSQL hackers list.

Yep, we've seen that, but it seems you gave up on it after you didn't have time to push it into 9.5. Having it in core PostgreSQL would be really great, but our change brings not only increments: it also brings parallelism and compression. These two are really important for quite big databases. Rsync and pg_basebackup support compression, but right now you hit either the network bandwidth (no compression) or the speed of one CPU core (with compression). We launch several processes so that we can utilize all resources with maximum efficiency and flexibility.
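
As a rough sketch of the kind of parallelism meant here (illustrative only, not our fork's actual code), one gzip stream per worker process avoids the single-core compression bottleneck:

```python
import gzip
import shutil
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def compress_one(src, dst_dir, level=3):
    """gzip one file into dst_dir; runs in a worker process."""
    dst = Path(dst_dir) / (Path(src).name + '.gz')
    with open(src, 'rb') as fin, \
            gzip.open(dst, 'wb', compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    return str(dst)


def parallel_compress(files, dst_dir, workers=8):
    # N independent gzip streams keep all CPU cores busy instead of
    # bottlenecking on a single core or on the raw network bandwidth
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_one, files, [dst_dir] * len(files)))
```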

... one idea that came up was to add a function to pgespresso that returns the content of a requested block in a file (or of a list of blocks). This would avoid installing an agent on the Postgres server.

Yes, we really do want to avoid installing anything else on the database servers, but implementing such a thing in pgespresso (or another extension using libpq) may not be a good decision. It would be quite difficult (though possible) to preserve parallelism, and it would make restores much more complicated. Most of the restore logic (decompression and merging of increments) would then be done on the backup server rather than on the database host, which seems a bit odd.

secwall commented 8 years ago

Hello, guys. I see the 1.6.0 release, so could we continue our discussion? As @dev1ant mentioned, moving logic into pgespresso would make recovery more complex. Also, in our environment the db hosts have more CPU power and faster disks, so it's better to perform heavy operations on them (in our tests, a recover operation with barman-incr on the db host is about 3 times faster than on the barman host). And this seems to be quite a common case.

man-brain commented 8 years ago

Any chance you will take a look at it, guys?

secwall commented 8 years ago

We started using the fork with incremental backups in production. Here are some numbers. Our typical database looks like this (so pgdata is about 5 TiB):

root@xdb2011g ~ # df -h | grep pgsql
/dev/md4         14T  5.0T  8.1T  39% /var/lib/pgsql/9.4/data
/dev/md3        189G   82G   98G  46% /var/lib/pgsql/9.4/data/pg_xlog

Its backups look like this (we use gzip -3 for backup compression and gzip -6 for WAL compression):

root@pg-backup05i ~ # barman list-backup xdb2011
xdb2011 20160330T020103 - Wed Mar 30 03:53:47 2016 - Size: 51.0 GiB - WAL Size: 60.8 GiB
xdb2011 20160329T020103 - Tue Mar 29 03:51:44 2016 - Size: 50.3 GiB - WAL Size: 114.8 GiB
xdb2011 20160328T020103 - Mon Mar 28 03:45:12 2016 - Size: 52.3 GiB - WAL Size: 112.8 GiB
xdb2011 20160327T020103 - Sun Mar 27 09:50:25 2016 - Size: 1.0 TiB - WAL Size: 88.7 GiB
xdb2011 20160326T020102 - Sat Mar 26 04:52:37 2016 - Size: 58.4 GiB - WAL Size: 122.1 GiB
xdb2011 20160325T020102 - Fri Mar 25 03:42:46 2016 - Size: 58.9 GiB - WAL Size: 122.6 GiB
xdb2011 20160324T020103 - Thu Mar 24 03:38:19 2016 - Size: 39.0 GiB - WAL Size: 126.5 GiB
xdb2011 20160323T020103 - Wed Mar 23 04:39:37 2016 - Size: 33.5 GiB - WAL Size: 82.2 GiB
xdb2011 20160322T020103 - Tue Mar 22 04:51:06 2016 - Size: 33.0 GiB - WAL Size: 76.1 GiB - OBSOLETE*
xdb2011 20160321T020103 - Mon Mar 21 04:20:11 2016 - Size: 28.2 GiB - WAL Size: 74.2 GiB - OBSOLETE*
xdb2011 20160320T020106 - Sun Mar 20 09:22:48 2016 - Size: 971.3 GiB - WAL Size: 48.4 GiB - OBSOLETE*

We start backups at 02:00, so a full backup takes about 7-8 hours and an incremental backup takes about 3 hours (we could speed this up with block change tracking, but it is not ready yet). Backups plus WALs for a recovery window of 1 week consume about 3.3 TB for this database.

gbartolini commented 8 years ago

Hi guys,

I have to apologise again but, as you might have noticed, adding streaming replication support has taken longer than just 1.6.0! We have just released 1.6.1 and are working on 1.6.2/1.7.0, which will hopefully bring full pg_basebackup support and streaming-only backup solutions (suitable for PostgreSQL on Docker and in Windows environments too).

Your patch is definitely very interesting, but until we have completed support for streaming-only backups we have to postpone the review and the integration (mainly for testing purposes).

However, I thank you again for your interest and your efforts.

Ciao, Gabriele

gbartolini commented 8 years ago

While looking at your patch, I have been thinking about two possible ideas:

  1. Do you think you can isolate the lzma patch so that we can include it separately in Barman's core?
  2. I'd suggest keeping the remote 'barman-incr' as a separate script - it could even be a more generic barman-agent script that is executed on the Postgres server via ssh

Thanks again, Gabriele

secwall commented 8 years ago

Hello.

  1. Maybe we could import lzma only if lzma compression was requested by the user (and return an error if the module is unavailable) - is this approach OK? (lzma is currently used only in barman-incr; I didn't change the WAL compression part.) A minimal sketch of what I mean follows this list.
  2. It seems that I don't understand this part: barman-incr is already in a separate package (https://github.com/secwall/barman/blob/master/rpm/barman.spec#L49-55) and it is indeed executed on the PostgreSQL server via ssh (https://github.com/secwall/barman/blob/master/barman/backup_executor.py#L774-783).
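
Something along these lines, where open_compressed and its parameters are illustrative names rather than the fork's actual API:

```python
import gzip

try:
    import lzma  # stdlib since Python 3.3; may be absent on older hosts
except ImportError:
    lzma = None


def open_compressed(path, mode='rb', method='gzip'):
    """Open a backup stream with the requested compression, failing
    with a clear error when lzma was requested but is unavailable."""
    if method == 'lzma':
        if lzma is None:
            raise RuntimeError('lzma compression requested, but the '
                               'lzma module is not available')
        return lzma.open(path, mode)
    return gzip.open(path, mode)
```
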
man-brain commented 7 years ago

Any success here, guys? Very soon this will be a one-year-old open PR...

gbartolini commented 7 years ago

In this period we have been extremely busy releasing version 2.0, with all the new features you are aware of. In order to include this patch in Barman, we drafted a plan with secwall that included several code contributions, some of which have already been implemented.

The difficulty with this patch is, as we have said in the past, integrating it with every existing use case of Barman without breaking backward compatibility. Also, our approach is to reach the goal through an incremental process.

The next step will be to add parallel backup (v3?), which should be quite straightforward now with the CopyController infrastructure, and then to integrate secwall's work on a remote PostgreSQL server with an agent (for this reason, too, we have created the barman-cli package).

I hope that with this message you can clearly see our commitment and our efforts towards this goal. Of course, having a stakeholder willing to fund the development of such a feature would raise its priority and allow us to develop it in a shorter timeframe.

kamikaze commented 7 years ago

This project is going to die with such speed and priorities. Don't waste your time, guys - fork it.

secwall commented 7 years ago

@kamikaze @soshnikov, could you kindly stop the blaming? @gbartolini explained why incremental backups have not been merged yet. We want this feature in mainline because we lack the resources to support our own fork.

RealLord commented 7 years ago

Hm... it's extremely strange that the one feature that can deliver more backup performance than all other Postgres backup software is still a work in progress and not in production.

FractalizeR commented 7 years ago

The Yandex guys said here that the Barman authors ask for money to merge this feature.

Уже почти год прошел, как мы их просим запилить эту киллер-фичу, а они просят с нас денег, чтобы замержить её.

Nearly a year has passed since we are asking [Barman team] to merge this killer-feature. And they are asking us for money to merge it.

Translation into English is mine.


Can someone elaborate on what the problem is? Where did this money question come from? I think this is just a misunderstanding, right?

man-brain commented 7 years ago

@FractalizeR, yes, this is just a misunderstanding. The article was written up from my spoken remarks at the conference and then edited by a copywriter. My point was what @gbartolini wrote here:

Of course, having a stakeholder willing to fund the development of such a feature would raise its priority and allow us to develop it in a shorter timeframe.

As you can see, these two statements are completely different.

FractalizeR commented 7 years ago

Yep, sure, I can see that now. Sorry for late reply.

s200999900 commented 7 years ago

Hi!

Sorry for my poor English :)

This would be a very helpful feature!

I suggest looking at the "borg backup" project as a storage backend: https://github.com/borgbackup/borg. It has a lot of good backup functionality: encryption, compression, deduplication, ssh as transport...

There is no Python API for now, but it is possible to drive it with a wrapper script for the create, restore, check, and list backup operations.

I can help with testing along those lines, but I would need some guidance on the right way to do it.

AntonBushmelev commented 7 years ago

Hello guys, any news about implementing this killer feature?

man-brain commented 6 years ago

I suppose this issue should be closed, since nothing has been done for 2.5 years. We have merged all these features into wal-g upstream and we will not support our barman fork any more.

kamikaze commented 6 years ago

I suppose this project should be closed since nothing has been done for 2.5 years

amenonsen commented 3 years ago

It's a pity this feature was not merged, especially because the patch (even today) looks really nicely done.

That said, with the benefit of several years of hindsight: scanning page headers to detect updates based on LSN is a lot faster than rsync, but it is still too expensive for very large data directories. We know there are extensions like ptrack that take a more proactive approach to recording changes, and that seems like the right direction going forward.

Meanwhile, this project is now under active maintenance again. But I'll close this issue now because there's no point leaving it open. I do hope to support incremental backups, but we (still!) hope that core Postgres will eventually provide a feature that Barman can use to do so.

harisai-m commented 2 days ago

Finally, we have the incremental backup feature in core Postgres 17.