borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Original size doubles with two symlinks? #4014

Closed toastie89 closed 5 years ago

toastie89 commented 6 years ago

Hi!

I'm trying to back up the folder structure below. The only data is a 75MB file of random data in the files folder. In the subfolder links there are two symlinks (one absolute, one relative) to the files folder.

/data
├── files
│   └── 75MB.txt
└── links
    ├── another1 -> ../files/
    └── another2 -> /data/fileserver/

From the documentation I understood that symlinks are not followed. So my expectation is to have an original file size of about 75MB. For some reason, borg shows about 150MB of file size:

borg -V
borg 1.1.6
borg info .
[...]
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:              157.28 MB            157.90 MB             78.95 MB

                       Unique chunks         Total chunks
Chunk index:                      44                   84

BTW, with only one symlink, a size of about 75MB is reported.

Any idea if this is a bug or just a misunderstanding on my end?

ThomasWaldmann commented 6 years ago

Did that backup take more than 30 minutes? If so, it might be the bug related to checkpoints not being accounted correctly.

You can check it with:

borg list repo::archive --consider-part-files

toastie89 commented 6 years ago

Thanks for your reply. Actually it took only some seconds.

I constructed the case described in the issue specifically to reproduce a situation I've seen on a larger scale in my real backup.

ThomasWaldmann commented 6 years ago

Give me a script containing all commands needed to reproduce your issue. Try to keep the number of commands / options / files as low as possible. Do not use absolute paths unless they are needed to demonstrate the issue.

toastie89 commented 6 years ago

Hi!

Below are the commands to reproduce the issue.

After some experimenting, I realized that I possibly misunderstood the information shown by borg info and that there is actually no issue with the symlinks.

Each time I run borg create /test/backup::{now} /test/data, the "Original size" shown by borg info increases by 75MB, even when the files don't change. Is this the correct behaviour?

docker run -it --rm -v `pwd`/tmp:/test b3vis/borgmatic sh
mkdir -p /test/data/files /test/data/links /test/backup
dd if=/dev/urandom of=/test/data/files/75MB bs=1 count=75MB
cd /test/data/links/
ln -s /test/data/files  another1
ln -s ../files          another2
ls -lah
du -h /test/data                 # we have about 71,5MB in /test/data
borg init -e none /test/backup
borg create /test/backup::{now} /test/data
borg create /test/backup::{now} /test/data
borg info /test/backup
# borg info shows now about 150 MB as "Original size"
rm -R ~/test
exit

Thanks in advance for your help!

ThomasWaldmann commented 6 years ago

Can it be reproduced without docker and without absolute paths?

toastie89 commented 6 years ago

Hi! Yes. I just tried it in a vanilla Ubuntu 18.04 live environment, and the numbers multiply with each backup, even without symlinks:

borg --version                   # borg 1.1.5
mkdir -p /test/data/files /test/backup
dd if=/dev/urandom of=/test/data/files/75MB bs=1 count=75MB
du -h /test/data                 # we have about 72MB in /test/data
borg init -e none /test/backup
borg create /test/backup::{now} /test/data
borg create /test/backup::{now} /test/data
borg info /test/backup
# borg info shows now about 150 MB as "Original size"

ThomasWaldmann commented 6 years ago

This does not look like a reproduction script for the initial issue.

Of course the original size multiplies (75, 150, 225, ... for the 1st, 2nd, 3rd backup), because each backup run is a full backup of 75MB. The dedup size column is the number you want to look at for this case.
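
To make the accounting concrete, here is a rough toy model of "original size" versus "deduplicated size". It is only a sketch: it uses fixed 4 KiB chunks, `sha256sum` and a temporary directory, all of which are assumptions for illustration and not how borg actually chunks or stores data.

```shell
#!/bin/sh
# Toy model only: fixed-size chunks and sha256sum, NOT borg's real chunker.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# 64 KiB of random data stands in for the 75MB file.
dd if=/dev/urandom of="$work/data" bs=4096 count=16 2>/dev/null

: > "$work/chunks.txt"
original_bytes=0
for run in 1 2; do
    # Each "backup run" processes the whole, unchanged file again.
    mkdir "$work/run$run"
    split -b 4096 "$work/data" "$work/run$run/chunk."
    for c in "$work/run$run"/chunk.*; do
        sha256sum < "$c" | cut -d' ' -f1 >> "$work/chunks.txt"
    done
    original_bytes=$((original_bytes + $(wc -c < "$work/data")))
done

total_chunks=$(wc -l < "$work/chunks.txt")
unique_chunks=$(sort -u "$work/chunks.txt" | wc -l)
# Only chunks with a hash not seen before would need to be stored.
dedup_bytes=$((unique_chunks * 4096))

echo "original size:     $original_bytes bytes"
echo "deduplicated size: $dedup_bytes bytes"
echo "total chunks:      $total_chunks"
echo "unique chunks:     $unique_chunks"
```

After two runs over the same file, the original size is twice the file size and the total chunk count is twice the unique chunk count, while the deduplicated size stays at one file's worth of data. This is the same pattern as in the borg info output earlier in the thread.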

But in the first issue you described something else.

You are still using absolute paths (/test/...) that do not exist on a normal system and that one cannot write to without root. That's a bad idea for a reproduction script: just stay in the current directory, e.g. $HOME/testing, and do everything with relative paths (including the repo).
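
A reproduction along those lines might look like the sketch below. The scratch directory template `borg-repro.XXXXXX` and the archive names `first` and `second` are made up for illustration, and the borg steps are skipped if borg is not installed.

```shell
#!/bin/sh
# Sketch of a reproduction that uses only relative paths.
set -eu

# Hypothetical scratch directory below the current directory.
workdir=$(mktemp -d ./borg-repro.XXXXXX)
cd "$workdir"
mkdir -p data/files

# 75 blocks of 1 MB of random data.
dd if=/dev/urandom of=data/files/75MB bs=1000000 count=75 2>/dev/null

if command -v borg >/dev/null 2>&1; then
    borg init -e none repo
    borg create repo::first data     # first full backup
    borg create repo::second data    # second run, data unchanged
    borg info repo                   # compare the original and dedup sizes
else
    echo "borg not found; skipping the backup steps" >&2
fi

# To clean up afterwards: cd .. && rm -r "$workdir"
```

Everything lives below the directory the script is started from, so no root permissions are needed and the script can be run as-is on another machine.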