Only one directory listing active, the rest queued; pnfsmanager.limits.list-threads broke

onnozweers commented 1 year ago

Dear dCache devs,

I posted this on the user forum as well, but there was no reply. However, this issue is affecting our users, so I'm reporting it here too.

We have configured this value so that 10 directory listings can be processed simultaneously instead of the default 2.

pnfsmanager.limits.list-threads = 10

This used to work. But now, when I look at the list activity, I see only one active request and the others queued:

[root@dcmain /etc/dcache]# dcache-admin-command PnfsManager 'show list activity'
QUEUED REQUESTS

             SOURCE                           USER         PATH
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudeb7                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudeb7                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudee4                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudee4                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/avans                                            
 webdav443-seacow14@webdav443-seacow14Domain  tropl1b-proc /pnfs/grid.sara.nl/data/knmi-tropomi/disk/nadc/archive/S5P_ICM_CA_UVN_TRIG/2023/07/07

ACTIVE REQUESTS

             SOURCE                           USER         PATH
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751
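To spot this symptom without reading the table by eye, the "show list activity" output can be counted per section. This is a minimal Python sketch that assumes the plain-text layout shown above (a QUEUED REQUESTS or ACTIVE REQUESTS title, a column header row starting with SOURCE, then one row per request). The function names count_list_requests and looks_throttled are hypothetical helpers, not part of dCache.

```python
def count_list_requests(output: str) -> dict:
    """Count request rows per section of 'show list activity' output.

    Assumes the layout shown above: a section title, a column header row
    starting with SOURCE, then one row per request; blank lines separate parts.
    """
    counts = {"queued": 0, "active": 0}
    section = None
    for line in output.splitlines():
        text = line.strip()
        if text == "QUEUED REQUESTS":
            section = "queued"
        elif text == "ACTIVE REQUESTS":
            section = "active"
        elif not text or text.startswith("SOURCE"):
            continue  # skip blank separators and column headers
        elif section is not None:
            counts[section] += 1  # any other line is one request row
    return counts


def looks_throttled(counts: dict, list_threads: int) -> bool:
    """True if requests are waiting even though fewer than list_threads are active.

    A single snapshot can catch this briefly; seeing it persistently is the symptom.
    """
    return counts["queued"] > 0 and counts["active"] < list_threads
```

Fed the output above with list_threads=10, count_list_requests returns 6 queued and 1 active, and looks_throttled flags it.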

I'm afraid this causes problems for our users: see the attached graph. We have some extremely large directories (3 million objects), so we absolutely need some parallelism.
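Why a single active listing hurts so much with directories of this size can be seen with a simple first-come-first-served model, in which every listing waits in one queue and is handed to the first free thread. This Python sketch is a generic queueing model with made-up durations, not dCache's actual scheduler; the name finish_times is hypothetical.

```python
import heapq


def finish_times(durations: list, threads: int) -> list:
    """Finish time of each listing when all arrive at t=0 (in list order) and
    are served first-come-first-served by a fixed pool of `threads` workers."""
    free_at = [0.0] * threads  # when each worker next becomes free (min-heap)
    finishes = []
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-free worker takes the next queued listing
        finishes.append(start + d)
        heapq.heappush(free_at, start + d)
    return finishes


# Hypothetical durations in seconds: two huge listings, then a tiny one.
jobs = [600, 600, 1]
print(finish_times(jobs, threads=1))   # [600.0, 1200.0, 1201.0]
print(finish_times(jobs, threads=10))  # [600.0, 600.0, 1.0]
```

With one thread, the 1-second listing queued behind two 600-second listings only finishes after 1201 s; with 10 threads it finishes after 1 s.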

Cheers, Onno

onnozweers commented 1 year ago

I upgraded our test server to the latest snapshot, which is 10.0, and ran the test there. It worked:

[root@hedgehog14 /etc/dcache]# grep pnfsmanager layouts/hedgehog14.conf 
[namespaceDomain/pnfsmanager]
pnfsmanager.enable.parallel-listing = true
pnfsmanager.limits.list-threads = 24

[root@hedgehog14 /etc/dcache]# dcache-admin-command PnfsManager 'show list activity -times'
ACTIVE REQUESTS

               SOURCE                             USER ARRIVED                              STARTED                              PATH
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.367 (892 ms ago) 2023-10-12 12:15:28.375 (884 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.374 (885 ms ago) 2023-10-12 12:15:28.376 (883 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.375 (884 ms ago) 2023-10-12 12:15:28.377 (882 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.376 (883 ms ago) 2023-10-12 12:15:28.380 (879 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.380 (880 ms ago) 2023-10-12 12:15:28.382 (878 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.381 (879 ms ago) 2023-10-12 12:15:28.383 (877 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.382 (878 ms ago) 2023-10-12 12:15:28.384 (876 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.384 (876 ms ago) 2023-10-12 12:15:28.387 (873 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.386 (874 ms ago) 2023-10-12 12:15:28.389 (871 ms ago) /users/onno/disk/largedir
webdav2884-hedgehog14@webdav2884-hedgehog14Domain onno 2023-10-12 12:15:28.399 (861 ms ago) 2023-10-12 12:15:28.401 (860 ms ago) /users/onno/disk/largedir

dCache 10.0 seems to "understand" the setting:

[root@hedgehog14 /etc/dcache]# dcache property pnfsmanager.enable.parallel-listing namespaceDomain PnfsManager
true

But our production instance, running 8.2.32, reports the same:

[root@db1 /etc/dcache/layouts]# dcache property pnfsmanager.enable.parallel-listing namespaceDomain PnfsManager
true

I'm very surprised that in 8.2.32 it doesn't seem to work, while in the master snapshot it does.

We have a single PnfsManager. dcache check-config does not give warnings or errors.

I could set up a test VM with 8.2.32 that we can tweak. It will take some time, but it looks like the most logical next step.

onnozweers commented 1 year ago

Mystery solved!

I noticed the layout file was modified after the domain had been started:

[root@db1 /etc/dcache/layouts]# ps -p 63188 -o lstart
                 STARTED
Fri Sep 22 14:52:36 2023

[root@db1 /etc/dcache/layouts]# ls -l db1.conf
-rw-r--r-- 1 root root 4284 Sep 22 15:27 db1.conf
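The comparison above (layout file modified after the domain started) can be scripted. A minimal sketch, assuming Linux, where a process's start time can be derived from field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat plus the btime line of /proc/stat. The function names are hypothetical; the check only says that the file changed, not which setting changed.

```python
import os


def process_start_time(pid: int) -> float:
    """Start time of process `pid` as a Unix timestamp (Linux /proc only)."""
    with open("/proc/stat") as f:
        # btime: boot time in whole seconds since the epoch
        btime = next(int(line.split()[1]) for line in f if line.startswith("btime "))
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only what follows the last ')'. That list starts at field 3.
    fields = stat[stat.rindex(")") + 1:].split()
    start_ticks = int(fields[22 - 3])  # field 22: starttime, clock ticks since boot
    return btime + start_ticks / os.sysconf("SC_CLK_TCK")


def modified_after_start(path: str, pid: int) -> bool:
    """True if `path` changed after `pid` started, so the running process
    may not have read the current contents."""
    return os.path.getmtime(path) > process_start_time(pid)
```

For the case above this would be modified_after_start("/etc/dcache/layouts/db1.conf", 63188). Because btime has one-second resolution, edits made within about a second of the start are not reliably detected.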

So I restarted the domain, and now it works as expected:

[root@dcmain ~]# dcache-admin-command PnfsManager 'show list activity -times'
ACTIVE REQUESTS

              SOURCE                            USER ARRIVED                               STARTED                               PATH
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.554 (1599 ms ago) 2023-10-12 14:01:18.554 (1599 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.554 (1599 ms ago) 2023-10-12 14:01:18.555 (1598 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.554 (1599 ms ago) 2023-10-12 14:01:18.556 (1597 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.554 (1600 ms ago) 2023-10-12 14:01:18.556 (1598 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.556 (1598 ms ago) 2023-10-12 14:01:18.557 (1597 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.556 (1598 ms ago) 2023-10-12 14:01:18.558 (1596 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.556 (1598 ms ago) 2023-10-12 14:01:18.560 (1594 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.558 (1596 ms ago) 2023-10-12 14:01:18.560 (1594 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.558 (1596 ms ago) 2023-10-12 14:01:18.561 (1593 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir
webdav2884-penguin12@webdav2884-penguin12Domain onno 2023-10-12 14:01:18.561 (1593 ms ago) 2023-10-12 14:01:18.563 (1591 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir

My apologies, I should have checked this earlier! 😅😅😅

Cheers, Onno

DmitryLitvintsev commented 1 year ago

Oh great. This issue can be resolved then? I think "info -a" needs to print all PnfsManager states to remove guesswork.

onnozweers commented 1 year ago

> This issue can be resolved then?

I'm not sure. For me, it can. But this issue also contains a discussion about optimizing the WebDAV listings; has that discussion finished?

> I think "info -a" needs to print all PnfsManager states to remove guesswork.

That would help!

DmitryLitvintsev commented 1 year ago

The discussion about optimizing the WebDAV listings remains open while a couple of patches are still in review and not yet merged, and even then some further work needs to be done on top of them. In essence, though, the discussion seems finished, since we appear to have reached a common agreement:

- A PROPFIND call should return the properties the user requests or, if no specific list of properties is provided, a minimal set that excludes file locality, checksums and any other information not contained in t_inodes.
- gfal-ls asks for so-called "quota" information, which we "mimicked" by querying SpaceManager based on the directory tag. Not all sites use SpaceManager, and users of gfal-ls generally do not need this information, so there seems to be consensus to skip that query based on some internal dCache variable. This query is especially "painful" for a directory with many subdirectories.
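For the first point, a client that only needs names, sizes, types and modification times can say so explicitly instead of sending an allprop request. A minimal Python sketch that builds such a PROPFIND body using the standard RFC 4918 DAV: elements; the choice of properties is an assumption, and this shows only the client side, not what dCache does with the request. The name propfind_body is hypothetical.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"


def propfind_body(props: list) -> bytes:
    """Build a PROPFIND body that asks only for the given DAV: properties
    (an RFC 4918 <D:prop> request) instead of <D:allprop/>."""
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in props:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")  # e.g. <D:getcontentlength />
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Names, sizes, types and mtimes only: nothing that needs checksums,
# file locality or quota lookups.
body = propfind_body(["resourcetype", "getcontentlength", "getlastmodified"])
```

The body would be sent as a PROPFIND request with a "Depth: 1" header to the directory URL, so the server returns one entry per child with only those properties.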

OK, let's keep this one open until there is progress on listing and on "info -a".

onnozweers commented 1 year ago

We've just found that Rclone can DoS dCache by writing in parallel to large destination directories. This is somewhat related to this ticket, so I thought it would be good to mention it here.

The problem is that Rclone by default lists the destination directory, even if that directory is very large and you only want to add a single file. If many jobs write to dCache this way, the listings can cause congestion for all users.
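A back-of-the-envelope count shows why this adds up. This is a rough model, not a measurement: it assumes that with the default behaviour every job lists the whole destination once, while with --no-traverse each job only looks up the files it copies. The job count is made up, the 3 million entries match the directory size mentioned earlier in this issue, and entries_returned is a hypothetical name.

```python
def entries_returned(jobs: int, dest_entries: int, files_per_job: int, traverse: bool) -> int:
    """Rough number of namespace entries the server must produce for `jobs`
    parallel copies into one destination directory (hypothetical model)."""
    if traverse:
        # default: each job lists the whole destination before copying
        return jobs * dest_entries
    # --no-traverse: each job looks up only the files it copies
    return jobs * files_per_job


print(entries_returned(20, 3_000_000, 1, traverse=True))   # 60000000
print(entries_returned(20, 3_000_000, 1, traverse=False))  # 20
```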

Fortunately, you can switch the destination listings off!

See the --no-traverse option for controlling whether rclone lists the destination directory or not. Supplying this option when copying a small number of files into a large destination can speed transfers up greatly.

https://rclone.org/commands/rclone_copy/

DmitryLitvintsev commented 1 year ago

What you saw is that you used up all available threads, after which PnfsManager was effectively stuck. This is precisely what the "new algo" avoids (not 100%, but still).

dCache / dcache #7252