borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

ACL code does a lot of user/group<->uid/gid lookups #5295

Open biji opened 4 years ago

biji commented 4 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Issue

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

client: borg 1.1.13

server: borg 1.1.13

Operating system (distribution) and version.

client: CentOS 6

server: CentOS 7

Hardware / network configuration, and filesystems used.

Backup via internet

How much data is handled by borg?

450GB

302366 files

Full borg command line that led to the problem (leave out excludes and passwords)

borg create -s -p -C lzma --list --filter=AME --exclude-from=/root/exclude-backup.txt --files-cache=ctime,size borg@server-ip:backup/pdc::pdc-{now} /home/data/

Describe the problem you're observing.

I experience slow 2nd+ backups with winbind as the NSS provider, after a successful 1st backup.

The first backup takes 23 hours; the next backups still take >20 hours.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I have tried removing /root/.cache, but it is still slow.

Thanks

Include any warnings/errors/backtraces from the system logs

none

ThomasWaldmann commented 4 years ago

The easiest explanation is that something frequently/regularly changes the ctime of many fs items, e.g. chmod/chown -R .... Or that the full path is changing (which does not look like the case here, since you wrote that you use /home/data/).
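
One way to check for that (a sketch; adjust the path and time window as needed):

find /home/data -ctime -1 | wc -l   # count fs items whose ctime changed in the last 24h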

Of course, after removing borg's cache, next backup is expected to be slow, but the one after that should then be faster again.

Another reason why you may not see a huge speedup could be that you have lots of small files, which are in general less efficient to deal with. In that case, 20h < 23h is maybe all the speedup you'll get.

biji commented 4 years ago

Yes, there are lots of small files: roaming storage for Windows users. Any suggestions for speeding up backups of lots of small files? I'll try again using mtime,size. OK, using a fresh cache (will do the backup twice).

Thanks

biji commented 4 years ago

Hi, I've tried 3 backups with borg's cache cleared, using files-cache=ctime,size again. After excluding some files, the file count is now around 130,000.

The first backup took 23 hours, the next backups took 17 hours, and there are not many new files.

Is there any way to make the next backups faster than 17 hours? Perhaps xattr handling causes a slowdown while enumerating files? I checked that the mount point uses the xattr option for Samba. Or maybe Python dictionary lookups are that slow?
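
One way to check whether the files actually carry xattrs/ACLs (a sketch; the file path is a placeholder):

getfattr -d -m - /home/data/somefile   # dump all extended attributes
getfacl /home/data/somefile            # show the POSIX ACL, if any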

Archive name: pdc-2020-08-13
Archive fingerprint: 9c72ae92713cf9313f85393f57d49edd7dd78156007666fca081c2b7161eee54
Comment:
Hostname: pdc
Username: root
Time (start): Thu, 2020-08-13 21:03:48
Time (end): Fri, 2020-08-14 20:20:07
Duration: 23 hours 16 minutes 18.47 seconds
Number of files: 131817
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              236.07 GB            173.87 GB             21.28 MB
All archives:                4.54 TB              3.83 TB            176.91 GB

                       Unique chunks         Total chunks
Chunk index:                  334972              3362062

Archive name: pdc-2020-08-14
Archive fingerprint: c04886c76b7f0468884eee5e014c65d914e16ae747a3103129ebcb97c45c6243
Comment:
Hostname: pdc
Username: root
Time (start): Fri, 2020-08-14 21:18:05
Time (end): Sat, 2020-08-15 15:07:00
Duration: 17 hours 48 minutes 54.85 seconds
Number of files: 132209
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              237.16 GB            174.94 GB             20.16 MB
All archives:                4.54 TB              3.83 TB            176.91 GB

                       Unique chunks         Total chunks
Chunk index:                  334972              3362062

Archive name: pdc-2020-08-15
Archive fingerprint: 6ca991b167276e2f76b17b2d1284172faae0cee9bf5de6c7b419b30add9ff2eb
Comment:
Hostname: pdc
Username: root
Time (start): Sat, 2020-08-15 23:24:51
Time (end): Sun, 2020-08-16 16:40:31
Duration: 17 hours 15 minutes 39.20 seconds
Number of files: 132369
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              237.31 GB            175.09 GB            128.19 MB
All archives:                4.54 TB              3.83 TB            176.91 GB

                       Unique chunks         Total chunks
Chunk index:                  334972              3362062

ThomasWaldmann commented 4 years ago

Something is "wrong" in your setup.

Here are some stats from one of my backups:

Archive name: server-2020-08-15-23:42:01
Archive fingerprint: ...
Time (start): Sat, 2020-08-15 23:42:11
Time (end):   Sat, 2020-08-15 23:45:23
Duration: 3 minutes 11.94 seconds
Number of files: 802035
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              669.61 GB            619.04 GB              4.72 MB
All archives:               86.02 TB             78.02 TB              1.14 TB

                       Unique chunks         Total chunks
Chunk index:                 7919874            241722469

So you see: for this backup, it took 3 minutes to process 800k files (most of them unchanged; only a few were new or modified, resulting in a deduplicated backup size of ~5 MB).

Setup:

borg client:

borg repo server:

ThomasWaldmann commented 4 years ago

Stuff to check:

ThomasWaldmann commented 4 years ago

You did not specify where you got borg 1.1.13 from. Is it the binary from github releases?

If yes, maybe also try an older binary on the CentOS 6 machine.

borg 1.1.11 was built on (old) Debian wheezy and might work better for old Linux distributions, see #5220. borg 1.1.13 was built on (more recent) Debian jessie.

This only applies to the "fat binaries" offered on github releases (made with pyinstaller).

If you installed borg from source (including via pip), there is no reason to use an older borg release.

biji commented 4 years ago

Wow, that's very fast, 3 minutes. Yes, I think something is wrong with my setup 😭

Borg is compiled via pip; Python is 3.6.10, also compiled from source.

The client is a virtual machine with 4 GB allocated. Load is below 1. Using iotop I can see no heavy read/write.

I will check again.. Thanks

biji commented 4 years ago

Hi, I think I found the problem.

When I stop Samba during the backup, borg skips many files very fast; when I start Samba again, the backup is slow again. My system uses winbind as the NSS provider to resolve file owners.

Using borg diff I found many lines like this:

[DOMAIN\myuser:users -> None:users]

If the ext4 ACLs are already backed up by borg, I think it is safe.

Then I tried the backup again with --numeric-owner and Samba started, but the backup was slow, not like when I stopped Samba. I had hoped the backup would be fast when using --numeric-owner, without stopping Samba.

Thanks

ThomasWaldmann commented 4 years ago

Interesting. So, which of --files-cache=ctime,size,inode does this trigger?

biji commented 4 years ago

I'm using --files-cache=ctime,size

ThomasWaldmann commented 4 years ago

Hmm, maybe winbind does not produce cache misses (at least not if you always have it active), but it is just winbind being very slow compared to a normal stat() against the local filesystem?
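
A quick way to measure that per-lookup latency (a minimal sketch; uid 65001 is just a placeholder for one of the winbind-mapped ids):

import pwd
import time

def time_lookup(uid, n=100):
    # Resolve the uid n times via NSS (possibly through winbind)
    # and print the mean latency per call.
    start = time.perf_counter()
    for _ in range(n):
        try:
            pwd.getpwuid(uid)
        except KeyError:
            pass
    print(f"uid {uid}: {(time.perf_counter() - start) / n * 1000:.2f} ms/lookup")

time_lookup(0)      # local uid, served from /etc/passwd
time_lookup(65001)  # placeholder for an AD-mapped uid, served via winbind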

ThomasWaldmann commented 3 years ago

TODO: check if borg create --numeric-ids ... does uid/gid -> name lookups.

detoxhby commented 3 years ago

Can confirm this behavior. It's a long-running issue which only appears on systems where samba+winbind is used as the full NSS provider. There is no file-change problem: only added/modified/deleted files are processed, and there is no load on the disk (no real read/write happening).

Even using --numeric-ids, the id resolution is triggered and is extremely slow (16-20 hours for around 800G of data from a RAID10/ext4 enterprise array). This should be around 20-30 minutes tops with an active cache.

For comparison:

Any workaround would be appreciated.

ThomasWaldmann commented 3 years ago

The normal stat processing at backup time does not do any uid/gid to user/group name lookups if --numeric-owner (the correct option name - sorry, I gave an incorrect one in my comment above) is specified.

But, no matter whether --numeric-owner is given or not, ACL processing might do user/group name to uid/gid lookups.

One thing is strange though: in borg 1.1.x, there is an LRU cache with unlimited capacity for the user/group/uid/gid lookups. Thus, after every id/name has been looked up once, this should be very fast and not cause significant slowdown.

ThomasWaldmann commented 3 years ago

So, guess we need an strace (with and without winbind nss) here to determine what's slow.

detoxhby commented 3 years ago

I'd provide strace output, but I cannot mess with the system as it is running in production. However, I added some short debug logging inside winbind and found that it's processing hundreds of getpwuid calls per second.

As this machine is an Active Directory member (server), the uid range is at 65000+, and what I found interesting is that the same uids get requested over and over again. The directory being backed up is a Samba share where the uid/gid is mapped from AD, so 99% of the files use non-standard local uids.

So far it seems like borg is not using its uid lookup cache, or some condition prevents it from operating properly.

Hope this info helps corner the issue; I'll try to gather more if possible.

ThomasWaldmann commented 3 years ago

@detoxhby thanks for the info, I'll do some related tests (but I don't have an AD member to test on).

One idea for you might be to run strace and attach it to the running borg process via its pid. You'd need root and it might slow things down a bit, but other than that there should be no impact on a prod system.
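
For example, something like this (the pid is a placeholder; since getpwuid is a libc call rather than a syscall, the winbind round-trips show up as socket traffic):

strace -f -tt -T -e trace=network -p <borg_pid> -o /tmp/borg-strace.log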

ThomasWaldmann commented 3 years ago

The lru cache works, as expected (tried with borg 1.1-maint branch, which is ~ 1.1.15 and python 3.7.3 - but the lru cache stuff did not change recently):

(borg-env) user@development:~$ borg create repo::arch .
getpwuid(0) -> root [default:None]
getgrgid(0) -> root [default:None]
getpwuid(1000) -> user [default:1000]
# ^ the above are always done, come from arg parser setup
#
# v the below ones are from input file processing:
# everything in my home is owned by 1000:1000, but due to caching
# there is only 1 real call for getting the username and 1 for getting the group name
getpwuid(1000) -> user [default:None]
getgrgid(1000) -> user [default:None]

Update: calling print(uid2user.cache_info()) at the end of the run method shows huge counts of cache hits and few misses.

@detoxhby are the getpwuid calls really coming from the borg process? Do you have posix ACLs in there?

ThomasWaldmann commented 3 years ago

To get this output, I added a print() to some functions in borg/helpers.py, like e.g. this:

@lru_cache(maxsize=None)
def uid2user(uid, default=None):
    try:
        user = pwd.getpwuid(uid).pw_name
        print(f"getpwuid({uid}) -> {user} [default:{default}]")
        return user
    except KeyError:
        return default

BTW, this is the one and only place in borg code that calls getpwuid.
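
For reference, the caching behavior is easy to verify standalone (a minimal sketch, independent of borg's code):

import pwd
from functools import lru_cache

@lru_cache(maxsize=None)  # unlimited capacity, as in borg 1.1.x
def uid2user(uid, default=None):
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return default

for _ in range(1000):
    uid2user(0)
print(uid2user.cache_info())  # expect hits=999, misses=1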

detoxhby commented 3 years ago

are the getpwuid calls really coming from the borg process?

It happens only while borg is running, so I think it's either coming from borg or from a companion process borg uses.

ℹ️ I was able to reproduce the exact same behavior using the getent or id commands, so it's the generic uid resolution that triggers the NSS --> winbind --> Samba chain. (No problem occurs, everything is resolved properly; the borg-side problem is only the tremendous call count.)
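
For example (the uid is a placeholder from our AD mapping range):

time getent passwd 65001
time id 65001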

The next weeks are going to be super busy for us, but I'll try to find a bit of time to trace the borg process. If you could find any other processing point where borg directly or indirectly resolves uid/gid, that could point out how borg is related to this.

Do you have posix ACLs in there?

Nope, no ACLs on that mount (plain ext4, with just noatime/nodiratime).


To be precise, this is the version installed: https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/borgbackup/+sourcepub/11995318/+listing-archive-extra

biji commented 3 years ago

Would calling acl_to_text_np be faster?

diff --git a/src/borg/platform/linux.pyx b/src/borg/platform/linux.pyx
index b4991c51..8b8fdaef 100644
--- a/src/borg/platform/linux.pyx
+++ b/src/borg/platform/linux.pyx
@@ -50,6 +50,7 @@ cdef extern from "sys/acl.h":
     int acl_set_fd(int fd, acl_t acl)
     acl_t acl_from_text(const char *buf)
     char *acl_to_text(acl_t acl, ssize_t *len)
+    char *acl_to_text_np(acl_t acl, ssize_t *len_p, int flags)

 cdef extern from "acl/libacl.h":
     int acl_extended_file(const char *path)
@@ -232,6 +233,7 @@ def acl_get(path, item, st, numeric_owner=False, fd=None):
     cdef acl_t access_acl = NULL
     cdef char *default_text = NULL
     cdef char *access_text = NULL
+    cdef int acl_to_text_np_flags = 0

     if stat.S_ISLNK(st.st_mode):
         # symlinks can not have ACLs
@@ -246,6 +248,8 @@ def acl_get(path, item, st, numeric_owner=False, fd=None):
         converter = acl_numeric_ids
     else:
         converter = acl_append_numeric_ids
+        # ACL_TEXT_NUMERIC_IDS
+        acl_to_text_np_flags = 2
     try:
         if fd is not None:
             access_acl = acl_get_fd(fd)
@@ -255,11 +259,11 @@ def acl_get(path, item, st, numeric_owner=False, fd=None):
             # only directories can have a default ACL. there is no fd-based api to get it.
             default_acl = acl_get_file(path, ACL_TYPE_DEFAULT)
         if access_acl:
-            access_text = acl_to_text(access_acl, NULL)
+            access_text = acl_to_text_np(access_acl, NULL, acl_to_text_np_flags)
             if access_text:
                 item['acl_access'] = converter(access_text)
         if default_acl:
-            default_text = acl_to_text(default_acl, NULL)
+            default_text = acl_to_text_np(default_acl, NULL, acl_to_text_np_flags)
             if default_text:
                 item['acl_default'] = converter(default_text)
     finally:
ThomasWaldmann commented 3 years ago

@biji good idea!

Guess the libacl code will internally do id -> name lookups if we do not explicitly request ACL_TEXT_NUMERIC_IDS (and, as it seems from this ticket, it does no caching, but does a lot of lookups). There is also another call, acl_to_any_text; not sure about the pros and cons.
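
For illustration, the resulting ACL text entries would differ roughly like this (a sketch; name and id are made up):

user:DOMAIN\myuser:rwx   # acl_to_text: uid resolved to a name via NSS
user:65001:rwx           # with ACL_TEXT_NUMERIC_IDS: raw uid, no lookup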

Guess that fixing this will also need a look at the converter code and at the unit tests (we need to make very sure this does what it should do). Also, this code might be platform-dependent, so we might need different code for Linux, BSD, macOS, etc.

ThomasWaldmann commented 3 years ago

Until there is a better solution with improved ACL code, I will offer a workaround via #3955.

So, if backup/restore of ACLs is not really required, one will be able to work around this speed issue by using borg create --noacls ....

ThomasWaldmann commented 3 years ago

Until a better solution is available, one can use --noacls (1.1-maint and master branch), PRs #5701 #5702.

ThomasWaldmann commented 3 years ago

I updated the issue topic as this is likely a more general issue.

It was discovered in a winbind/NSS context, where there is much more slowness involved due to network delays, but it is likely not very efficient for local lookups either.

Guess we could try to fix this in 2 ways:

biji commented 3 years ago

Using --numeric-owner might cause issues when restoring to another system which has a different id mapping, for example a Samba PDC and its BDC.

ThomasWaldmann commented 3 years ago

Sure, stuff like --numeric-owner or --noacls is only for "if you know what you are doing". :-)

detoxhby commented 3 years ago

@ThomasWaldmann thank you for your work! Is there a way to release this as a patch version (e.g. v1.15.1) to have an official compiled binary?

ThomasWaldmann commented 3 years ago

https://github.com/borgbackup/borg/pull/5718/files I will release 1.1.16 soon, including the --noacls workaround for people who do not need to back up ACLs.

Other than that, there is no real, finished fix for this issue yet.

The patch from @biji needs more work, review, and testing. If new mechanisms are introduced, they might also need more work and testing for platforms other than Linux (mainly FreeBSD and macOS, iirc), as that code is platform-specific.

ThomasWaldmann commented 3 years ago

https://github.com/borgbackup/borg/releases/tag/1.1.16 borg 1.1.16 released with the --noacls workaround (IF you do not need to back up ACLs)

detoxhby commented 3 years ago

@ThomasWaldmann did the production backup round on all affected systems ... and --noacls works like a charm: smaller datasets that took around 40-50 minutes before went down to 30-40 seconds!!! 💯
(A similar reduction ratio was observed for the huge (multi-terabyte) datastores.)

I realize the downside of this (not being able to directly restore data to its original place), so I'm looking forward to the real ACL cache implementation for future use.

Thanks again 🤝

ThomasWaldmann commented 3 years ago

Wow, that is a speedup! :-)

Not sure I understand why you can't restore to the original place? How is that related to --noacls?

BTW, I thought about the cache: that would be better placed somewhere in winbind/nss, so ALL apps can use that and not everybody needs to reimplement the wheel [cache]. Can you check with them?

detoxhby commented 3 years ago

Not sure I understand why you can't restore to the original place? How is that related to --noacls?

The emphasis was on directly, as we would need to rebuild the whole custom ACL on the replaced (sub-)trees, compared to just copying the data back in place. Not a big deal for a small number of files, but it's a huge problem for multi-million-entry fileshares, as enumerating them could take hours.

BTW, I thought about the cache: that would be better placed somewhere in winbind/nss, so ALL apps can use that and not everybody needs to reimplement the wheel [cache]. Can you check with them?

I don't think they will implement such aggressive caching, as winbind has to be aware of Directory (AD) changes. Even if they do provide some workaround, it would take years to get it onto production systems, as full version upgrades occur very rarely (business stability overrides new features, sadly), and backports are just a no-go for the Samba team (we've experienced their strict release policy in the past years).

To summarize: it is a very specific use case, not a generically solvable problem by any means. I think borg would benefit from its own cache on other ACL-aware systems as well, because it is an app-specific requirement that causes the huge number of resolutions (which occur only a fraction of the time in other programs).

Of course, it's your decision to weigh the resources (time/testing) needed for this change; nobody will force or blame you guys, but it would be beneficial long-term, as it widens the system configurations supported by borgbackup.

ThomasWaldmann commented 3 years ago

I guess we should rather fix this in borg 1.2; it quite likely needs a bigger / critical change to the code.

Aztorius commented 1 month ago

Just to confirm: on a production server with a samba/winbind share of about 7TB, mostly millions of small files, sitting on 6 SATA SSDs in a ZFS stripe/mirror (equivalent to RAID 10), without --noacls each backup takes 5 hours (even when no file has changed), but with --noacls I'm down to 45 minutes (which is what I expected). The server uses a 1 Gbit Ethernet link to the backup server.