EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

How to identify which one is a full backup and which one is an incremental backup? #934

Closed wasiualhasib closed 3 months ago

wasiualhasib commented 3 months ago

Using the list-backup command, how do you identify which backup is full and which one is incremental? Look at the list of backups below. Both backups appear to be 26.5 GB in size, but they are not. The actual size of backup ID 20240611T220902 is 26.5 GB, while backup ID 20240611T221117 is only 3 MB. I think this needs a correction.

[barman@DB-1 ~]$ barman list-backup pg
pg 20240611T221117 - Tue Jun 11 22:11:33 2024 - Size: 26.5 GiB - WAL Size: 1.4 GiB
pg 20240611T220902 - Tue Jun 11 22:09:56 2024 - Size: 26.5 GiB - WAL Size: 85.9 KiB

I only found this out by going into the directory where Barman stores this server's backups, /data/barman/pg/base/, and running du -sch on my Linux machine. Otherwise it is not possible to tell using the barman list-backup pg command.

[barman@DB-1 ~]$ cd /data/barman/pg/
base/           errors/         incoming/       streaming/    wals/

[barman@DB-1 ~]$ cd /data/barman/pg/base/
[barman@DB-1 base]$ ls -l 
total 0
drwxr-x--- 3 barman barman 37 Jun 11 22:09 20240611T220902
drwxr-x--- 3 barman barman 37 Jun 11 22:11 20240611T221117

[barman@DB-1 base]$ du -sch *
27G     20240611T220902
3.1M    20240611T221117

I have a few questions about this:

  1. How do I identify which backup is incremental and which one is full?
  2. Incremental backups have no option for compression. So if the database size is 1 TB, I have to provision storage of at least 2 times the actual size of the backup. This means compression and incremental backup cannot be used at the same time. To get compression we need to use backup_method=postgres.

barthisrael commented 3 months ago

Hello 👋

You are using the backup_method = rsync method with reuse_backup = link, right?

> How do I identify which backup is incremental and which one is full?

When using rsync method, Barman uses hard links to create file-level incremental backups.

So, you take your first backup, which is "full", then each new backup will copy only the files which have been modified since the previous backup. The files which were not modified have a simple hard-link to the corresponding file from the previous backup. That essentially means the backups of the server will share the files which were not modified between the backups.

The output of barman list-backup command shows the total size of each backup, i.e. considering all the files which are required by the backup to be restored.
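To make the difference between the two numbers concrete, here is a minimal Python sketch of the accounting, not Barman's actual implementation. The function names `backup_sizes` and `demo` are hypothetical, and the sketch sums apparent file sizes (`st_size`) rather than disk blocks. It reports, for each backup directory, the total size of all files it contains and the size of the files whose inode was not already seen in an earlier directory, which is roughly why du reports a small number for the second backup.

```python
import os
import tempfile


def backup_sizes(backup_dirs):
    """Return {dir_name: (total_bytes, new_bytes)} for directories in order.

    total_bytes counts every file in the directory.
    new_bytes counts only files whose inode was not seen in an earlier
    directory, so hard-linked (shared) files are counted once.
    """
    seen = set()
    result = {}
    for directory in backup_dirs:
        total = new = 0
        for root, _dirs, files in os.walk(directory):
            for name in files:
                st = os.lstat(os.path.join(root, name))
                total += st.st_size
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    new += st.st_size
        result[os.path.basename(directory)] = (total, new)
    return result


def demo():
    """Build two toy 'backups' that share one unchanged file via a hard link."""
    with tempfile.TemporaryDirectory() as base:
        first = os.path.join(base, "20240611T220902")
        second = os.path.join(base, "20240611T221117")
        os.mkdir(first)
        os.mkdir(second)
        with open(os.path.join(first, "unchanged"), "wb") as f:
            f.write(b"x" * 1000)
        with open(os.path.join(first, "changed"), "wb") as f:
            f.write(b"a" * 10)
        # The unchanged file is hard-linked; the changed file is copied anew.
        os.link(os.path.join(first, "unchanged"),
                os.path.join(second, "unchanged"))
        with open(os.path.join(second, "changed"), "wb") as f:
            f.write(b"b" * 20)
        return backup_sizes([first, second])


if __name__ == "__main__":
    for name, (total, new) in demo().items():
        print(name, "total:", total, "new:", new)
```

In this toy layout the second directory has nearly the same total size as the first, but only the rewritten file counts as new data.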

If you want to check how much data a given backup actually added (its incremental size), you can use the barman show-backup command.

For example, I have these 2 backups in my server:

$ barman list-backup pg17-rsync
pg17-rsync 20240611T180600 - Tue Jun 11 18:06:51 2024 - Size: 41.7 MiB - WAL Size: 0 B
pg17-rsync 20240610T182149 - Mon Jun 10 18:21:52 2024 - Size: 38.2 MiB - WAL Size: 32.0 MiB

I took an initial backup, 20240610T182149, then created a 3 MB table in Postgres, and finally took a new backup, 20240611T180600. As you can see, the size reported for my last backup, 20240611T180600, includes the original files from the first backup plus the files modified in the meantime.

I can now use barman show-backup command to check the incremental size of each of them:

$ barman show-backup pg17-rsync 20240610T182149
Backup 20240610T182149:
  Server Name            : pg17-rsync
  System Id              : 7377136404149220174
  Status                 : DONE
  PostgreSQL Version     : 170000
  PGDATA directory       : /var/lib/pgsql/17/data

  Base backup information:
    Disk usage           : 22.2 MiB (38.2 MiB with WALs)
    Incremental size     : 22.2 MiB (-0.00%)
    Timeline             : 1
    Begin WAL            : 00000001000000000000001B
    End WAL              : 00000001000000000000001B
    WAL number           : 1
    Begin time           : 2024-06-10 18:21:49.970096+00:00
    End time             : 2024-06-10 18:21:52.601712+00:00
    Copy time            : 1 second
    Estimated throughput : 13.8 MiB/s
    Begin Offset         : 96
    End Offset           : 344
    Begin LSN            : 0/1B000060
    End LSN              : 0/1B000158

  WAL information:
    No of files          : 2
    Disk usage           : 32.0 MiB
    WAL rate             : 0.13/hour
    Last available       : 00000001000000000000001D

  Catalog information:
    Retention Policy     : not enforced
    Previous Backup      : - (this is the oldest base backup)
    Next Backup          : 20240611T180600

$ barman show-backup pg17-rsync 20240611T180600
Backup 20240611T180600:
  Server Name            : pg17-rsync
  System Id              : 7377136404149220174
  Status                 : DONE
  PostgreSQL Version     : 170000
  PGDATA directory       : /var/lib/pgsql/17/data

  Base backup information:
    Disk usage           : 25.7 MiB (41.7 MiB with WALs)
    Incremental size     : 5.1 MiB (-80.33%)
    Timeline             : 1
    Begin WAL            : 00000001000000000000001D
    End WAL              : 00000001000000000000001D
    WAL number           : 1
    Begin time           : 2024-06-11 18:06:00.203819+00:00
    End time             : 2024-06-11 18:06:51.852541+00:00
    Copy time            : less than one second
    Estimated throughput : 5.4 MiB/s
    Begin Offset         : 40
    End Offset           : 400
    Begin LSN            : 0/1D000028
    End LSN              : 0/1D000190

  WAL information:
    No of files          : 0
    Disk usage           : 0 B
    Last available       : 00000001000000000000001D

  Catalog information:
    Retention Policy     : not enforced
    Previous Backup      : 20240610T182149
    Next Backup          : - (this is the latest base backup)

So, my second backup occupies 25.7 MiB of disk if I consider the whole backup, but it introduced 5.1 MiB worth of files.
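If you want to pull these two fields out of the text output programmatically, a minimal Python sketch could look like the following. It assumes the output layout shown in this thread, which may differ between Barman versions. The names `SAMPLE`, `FIELD` and `base_backup_sizes` are hypothetical.

```python
import re

# Excerpt copied from the show-backup output above.
SAMPLE = """\
  Base backup information:
    Disk usage           : 25.7 MiB (41.7 MiB with WALs)
    Incremental size     : 5.1 MiB (-80.33%)

  WAL information:
    No of files          : 0
    Disk usage           : 0 B
"""

FIELD = re.compile(
    r"^[ \t]*(Disk usage|Incremental size)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)


def base_backup_sizes(text):
    """Return the first 'Disk usage' and 'Incremental size' values found.

    Only the first occurrence of each label is kept, because the WAL
    section further down also has a 'Disk usage' line.
    """
    found = {}
    for label, value in FIELD.findall(text):
        found.setdefault(label, value)
    return found


if __name__ == "__main__":
    print(base_backup_sizes(SAMPLE))
```

A backup whose incremental size equals its disk usage copied every file itself, which is the case for the first backup above.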

> Incremental backups have no option for compression. So if the database size is 1 TB, I have to provision storage of at least 2 times the actual size of the backup. This means compression and incremental backup cannot be used at the same time. To get compression we need to use backup_method=postgres.

As you noted, backup compression is only supported with backup_method = postgres. So, with the current implementation, you need to choose between incremental backups through the rsync method and compressed backups through the postgres method.

wasiualhasib commented 3 months ago

Hi @barthisrael,

Yes, you are right; you explained it clearly. I have used that command to understand the incremental size. In your show-backup command: Incremental size: 5.1 MiB (-80.33%),

what is the meaning of -80.33% and how is it calculated?

As per my assumption, it is calculated like this: (5.1/25.7)-1=(-80.155%).

But that does not exactly match 80.33%. Also, what does the negative sign mean here?

Another issue is that if I use the streaming protocol instead of the Barman WAL archiving command in PostgreSQL's archive_command, incremental backup does not work for me. In that case, Barman takes a full backup instead of an incremental one, even though backup_method is rsync and reuse_backup=link.

Basically, for a full backup every night I run the command below:

barman backup --reuse-backup=off pg

And for an incremental backup every 6 hours I run the command below. Given my configuration, it should take an incremental backup by default.

barman backup pg

Could you please correct me if I am wrong?

barthisrael commented 3 months ago

> what is the meaning of -80.33% and how is it calculated?
>
> As per my assumption, it is calculated like this: (5.1/25.7)-1=(-80.155%).
>
> But that does not exactly match 80.33%. Also, what does the negative sign mean here?

The value is calculated as you noted above. You can see the actual code here.

Please note that the math is performed using bytes, but the size output is shown in "human-readable" mode, so it's rounded to the nearest unit, in my case MiB.

The negative sign means that this backup reduced disk space usage by "x%" compared to the size the backup would have had if it had copied all the files. In my case the incremental size was around 20% of the full size, so the backup used 80% less disk by using hard links.
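As a rough check of that arithmetic, here is a small Python sketch. It is not Barman's code: the function name `dedup_change_percent` is hypothetical, and the bounds assume that the displayed sizes are rounded to the nearest 0.1 MiB. It first applies the formula to the rounded values, then shows the range of percentages that the rounded values are compatible with.

```python
def dedup_change_percent(incremental_size, full_size):
    """Return (incremental / full - 1) as a percentage.

    Both sizes must be in the same unit; Barman works on bytes.
    """
    if full_size <= 0:
        raise ValueError("full_size must be positive")
    return (incremental_size / full_size - 1) * 100


# Using the rounded, human-readable values from the output above.
approx = dedup_change_percent(5.1, 25.7)

# If 5.1 and 25.7 are roundings to the nearest 0.1 MiB, the true values lie
# in [5.05, 5.15) and [25.65, 25.75), which bounds the real percentage.
low = dedup_change_percent(5.05, 25.75)
high = dedup_change_percent(5.15, 25.65)

if __name__ == "__main__":
    print(f"from rounded values: {approx:.2f}%")
    print(f"compatible range:    {low:.2f}% to {high:.2f}%")
```

Under that rounding assumption, the reported -80.33% falls inside the compatible range, so the small mismatch with the hand calculation is explained by the rounding of the displayed sizes.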

> Basically, for a full backup every night I run the command below:
>
> barman backup --reuse-backup=off pg
>
> And for an incremental backup every 6 hours I run the command below. Given my configuration, it should take an incremental backup by default.
>
> barman backup pg
>
> Could you please correct me if I am wrong?

Please note that you do not need to generate "full" backups with barman backup --reuse-backup=off.

As I mentioned earlier, the incremental backup in Barman is implemented at file-level by using hard links.

Assume you have no backups at all in your system, and you have backup_method = rsync and reuse_backup = link. The first backup that you take will find no base files, so it will copy all of them during the backup.

Later, when you run a new backup, it will identify all the files that were already copied by the previous backup and have not changed in the meantime, and create hard links to them. Files that changed between the first backup and the second backup are copied again in their new version.

Similarly, when you run barman delete to remove a backup, it removes that backup's references to the files it contains. Once the reference count of a given file reaches 0, the file is removed from the filesystem.

I suggest you take a look at how hard links work in Linux. That might help you understand how Barman leverages that mechanism to provide file-level incremental backups through rsync.
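For a quick feel of that mechanism, here is a minimal Python sketch using only the standard library on a filesystem that supports hard links. It does not call Barman; the file names and the function `link_count_demo` are hypothetical stand-ins for the same file in two backups.

```python
import os
import tempfile


def link_count_demo():
    """Show that a hard-linked file survives until its last link is removed."""
    with tempfile.TemporaryDirectory() as base:
        old = os.path.join(base, "backup1_file")
        new = os.path.join(base, "backup2_file")

        with open(old, "w") as f:
            f.write("page data")

        # Second "backup" reuses the unchanged file through a hard link.
        os.link(old, new)
        counts = [os.stat(new).st_nlink]
        same_inode = os.stat(old).st_ino == os.stat(new).st_ino

        # Removing the older "backup" drops one reference only.
        os.remove(old)
        counts.append(os.stat(new).st_nlink)

        with open(new) as f:
            survived = f.read()

        return counts, same_inode, survived


if __name__ == "__main__":
    print(link_count_demo())
```

The link count goes from 2 to 1 when the older name is removed, and the data is still readable through the remaining name. This is why deleting the first "full" backup does not break later backups that share its files.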

> Another issue is that if I use the streaming protocol instead of the Barman WAL archiving command in PostgreSQL's archive_command, incremental backup does not work for me. In that case, Barman takes a full backup instead of an incremental one, even though backup_method is rsync and reuse_backup=link.

Do you mean you are not able to use streaming_archiver = on in your configuration file together with backup_method = rsync?

In any case, please note that base backups and WAL archiving are different things in Barman. You should be able to have backups taken through rsync, with WALs being streamed from Postgres to Barman through pg_receivewal (streaming_archiver = on).

Could you please clarify why you think it's not working for you?

martinmarques commented 3 months ago

@wasiualhasib As mentioned in the other issue, I recommend sending these questions to the Barman Google group