dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.
https://dcache.org

Only one directory listing active, the rest queued; pnfsmanager.limits.list-threads broke #7252

Open onnozweers opened 1 year ago

onnozweers commented 1 year ago

Dear dCache devs,

I have also posted this on the user forum, but there was no reply. Meanwhile, this issue is affecting our users.

We have configured this value so that 10 directory listings can be processed simultaneously instead of the default 2.

pnfsmanager.limits.list-threads = 10

And this used to work. But now when I look at the list activity, I see only one request active, and the others queued.

[root@dcmain /etc/dcache]# dcache-admin-command PnfsManager 'show list activity'
QUEUED REQUESTS

             SOURCE                           USER         PATH
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudeb7                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudeb7                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudee4                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudee4                           
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/avans                                            
 webdav443-seacow14@webdav443-seacow14Domain  tropl1b-proc /pnfs/grid.sara.nl/data/knmi-tropomi/disk/nadc/archive/S5P_ICM_CA_UVN_TRIG/2023/07/07

ACTIVE REQUESTS

             SOURCE                           USER         PATH
webdav2880-badger14@webdav2880-badger14Domain buswift      /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751

I'm afraid this causes problems for our users: see the attached graph. We have some extremely large directories (3 million objects), so we absolutely need some parallelism.

[attached graph]

Cheers, Onno

onnozweers commented 1 year ago

We're running 8.2.24.

onnozweers commented 1 year ago

Looking at the graphs, the queueing seems to have started when we upgraded from 8.2.16 to 8.2.24, at the end of June. Java remained at version 11.0.18.

[attached graph]

lemora commented 1 year ago

This looks like it could be related to a8a6912fdb, @DmitryLitvintsev. The commit was added in 8.2.23.

DmitryLitvintsev commented 1 year ago

Hi Onno,

Does it actually cause problems for the users, or do you just think it is causing problems?

Here is the description of the patch:

    pnfsmanager: process listings of the same path sequentially, implement list 

    Motivation:

    Multiple batch jobs executing listings of the same directories, especially
    if the directories contain many entries, quickly exhaust the active threads, each thread
    executing the same listing, denying listing to all other customers and
    creating significant load on the namespace server (and DB). Clients time out
    and retry, creating even more load on dCache.

    Modification:

    The problem is solved by implementing PnfsListDirectoryMessage folding
    and dispatching list requests to multiple queues (the number of queues is
    configurable) based on the directory path hashCode. Each queue is served by a single
    thread. This guarantees that listings of the same path are queued to
    the same queue and only one listing of a unique path can be active.
    Once a listing is complete, the queue is scanned for all other requests for
    the same path and these messages are folded (that is, their payload is
    populated from the just-completed listing) and routed back to the sender without
    ever hitting the database.

    Ancillary changes include: instead of a hardcoded time in the future when
    the quota and FS stat scans start, use the already available settings for the durations
    between runs. This is to avoid triggering these scans when PnfsManager is
    restarted often (when testing, for instance). And a few "final" qualifiers were added
    where suggested by the IDE.

    Result:

    Lists of large directories no longer hog PnfsManager, denying service to all
    other list calls. Additionally, a significant reduction of CPU and
    IO load on the chimera DB host was observed. The patch has been running in production at Fermilab
    and has been shown to solve the issue of listings timing out.

    patch: https://rb.dcache.org/r/13992/

The way it works is: when you have an active listing and another listing for the same directory queued, the queued ones get the result of the active one once it has completed. So there is parallelism.

It works very well at Fermilab and in fact fixed a lot of our listing hangs (when multiple large directories are listed at the same time).

I use:

pnfsmanager.limits.list-threads = 33

Try that :).

DmitryLitvintsev commented 1 year ago

This is how this typically looks on our system:

QUEUED REQUESTS

                    SOURCE                          USER           PATH
 GFTP3-fndca4b-AAYBUPdNoBg@gridftp3-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP9-fndca4b-AAYBUPddRAA@gridftp9-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
GFTP10-fndca4b-AAYBUPdf58g@gridftp10-fndca4bDomain  novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP4-fndca4b-AAYBUPdxlyA@gridftp4-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox/0/1/1                           
          Xrootd01-fndca4b@xrootd01-fndca4bDomain   <unknown>      /pnfs/fnal.gov/usr/uboone/persistent/stash/uboonebeam/beam_data/bnb_data/00/02/44/83          
 GFTP1-fndca4b-AAYBUPeJhCA@gridftp1-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP4-fndca4b-AAYBUPeOv_g@gridftp4-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP9-fndca4b-AAYBUPeY9UA@gridftp9-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP8-fndca4b-AAYBUPe9HxA@gridftp8-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP4-fndca4b-AAYBUPgBhsg@gridftp4-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP8-fndca4b-AAYBUPgmmvg@gridftp8-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP0-fndca4b-AAYBUPgwP7g@gridftp0-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP3-fndca4b-AAYBUPhFMwA@gridftp3-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP6-fndca4b-AAYBUPhFrBg@gridftp6-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP5-fndca4b-AAYBUPhK4CA@gridftp5-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP6-fndca4b-AAYBUPhPHhA@gridftp6-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
 GFTP8-fndca4b-AAYBUPhkFUA@gridftp8-fndca4bDomain   novapro        /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox                                 
          Xrootd01-fndca4b@xrootd01-fndca4bDomain   <unknown>      /pnfs/fnal.gov/usr/minerva/persistent/stash/pidDB/data/00016730                               
      WebDAV00-fndcadoor02@webdav-fndcadoor02Domain uboonepro      /pnfs/fnal.gov/usr/uboone/scratch/uboonepro/dropbox/merge/data/uboone/root-tuple/38           
      WebDAV00-fndcadoor03@webdav-fndcadoor03Domain uboonepro      /pnfs/fnal.gov/usr/uboone/scratch/uboonepro/dropbox/merge/overlay/uboone/root-tuple/2         

ACTIVE REQUESTS

                    SOURCE                          USER           PATH
          WebDAV-fndca4b-2@webdavDomain1            131.225.240.62 /pnfs/fnal.gov/usr/mu2e/resilient/jobsub_stage                                                
      WebDAV00-fndcadoor02@webdav-fndcadoor02Domain mnieslon       /pnfs/fnal.gov/usr/annie/persistent/users/moflaher/g4dirt_vincentsgenie/BNB_Water_10k_22-05-17
      WebDAV00-fndcadoor02@webdav-fndcadoor02Domain uboonepro      /pnfs/fnal.gov/usr/uboone/scratch/uboonepro/dropbox/merge/data/uboone/root-tuple/68

In the above:

the listing of /pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/nova/scratch/fts/dropbox is queued, but once the first of them completes, all of them complete immediately.

Another comment: the patch did not make it so that only one listing is running. The algorithm is as follows: there is a list of threads, and the listing of a path is dispatched to the thread number defined by:

N = hashcode(path) % pnfsmanager.limits.list-threads

So it is not the case that there is always just one listing thread running. Yes, there could be situations where a large directory has the same hashcode (modulo the thread count) as a small directory, and in that case the small directory would have to wait for the large one to complete. But as I said, we have seen a very tangible improvement at Fermilab.
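
As an illustration, here is a minimal, hypothetical sketch of the dispatch-plus-folding scheme described above (all names are made up; this is not the actual PnfsManager code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: dispatch list requests by path hash, fold queued duplicates.
public class ListDispatcherSketch {

    record ListRequest(String path) {}

    private final List<BlockingQueue<ListRequest>> queues = new ArrayList<>();

    public ListDispatcherSketch(int listThreads) {
        for (int i = 0; i < listThreads; i++) {
            BlockingQueue<ListRequest> queue = new LinkedBlockingQueue<>();
            queues.add(queue);
            Thread worker = new Thread(() -> serve(queue), "list-" + i);
            worker.setDaemon(true);
            worker.start();
        }
    }

    // N = hashcode(path) % list-threads: all requests for the same path land
    // on the same queue, so only one listing per path can ever be active.
    public void submit(ListRequest request) {
        int n = Math.floorMod(request.path().hashCode(), queues.size());
        queues.get(n).add(request);
    }

    private void serve(BlockingQueue<ListRequest> queue) {
        try {
            while (true) {
                ListRequest head = queue.take();
                List<String> entries = runListing(head.path()); // one DB hit
                reply(head, entries);
                // "Folding": answer queued duplicates of the same path from the
                // result just computed, without touching the database again.
                queue.removeIf(r -> {
                    boolean samePath = r.path().equals(head.path());
                    if (samePath) {
                        reply(r, entries);
                    }
                    return samePath;
                });
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Placeholders standing in for the real namespace query and cell messaging.
    private List<String> runListing(String path) { return List.of(); }

    private void reply(ListRequest request, List<String> entries) { }
}

Note that in this scheme a long listing still blocks everything else hashed to the same queue, which is exactly the trade-off discussed below.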

If this continues to give you trouble, we may make this functionality optional (yet another variable).

Also... one thing to note. I have been doing the same kind of watching of "show list activity", and I must say that it is a little deceptive, particularly the ACTIVE listing. I have noticed that you only "catch" listings in ACTIVE if they last at least a few seconds.

E.g., if you think there is only one listing running, try timing a listing while you see one ACTIVE. In my experience you get the result right away and never see it being active. You need to use a protocol other than NFS, since NFS queries the database directly, bypassing PnfsManager (I casually use uberftp ls for this sort of thing).

P.S.: We generally see more queueing than before, but we no longer see one user causing a DoS by running multiple ls commands (via batch jobs) on large directories (those used to starve all available threads). The behavior now is: if such a user hits the system, his ls requests queue up until the first completes, meanwhile giving other users' ls requests on other directories a chance to run simultaneously (on paths whose hashcode != the hashcode of that user's directory). Once his first request completes, all his queued ls requests complete with the result of the just-completed request, without hitting the database. And so on. Anyhow, I would like to hear what your users experience. The ever-present ls queue should not be a problem; at least it is not a problem for us. We do not have millions of entries per directory, max about 200K. So I am eager to hear if there are side effects.

onnozweers commented 1 year ago

[I removed this comment because of the next comment]

onnozweers commented 1 year ago

I missed this part:

Another comment: the patch did not make it so that only one listing is running. The algorithm is as follows: there is a list of threads, and the listing of a path is dispatched to the thread number defined by:

N = hashcode(path) % pnfsmanager.limits.list-threads

So, it is possible that when there is a listing that lasts an hour, all listings assigned to the same thread have to wait, while other threads are doing nothing? Actually this does not make sense to me.

DmitryLitvintsev commented 1 year ago

This is much better than all threads being busy processing N one-hour listings and not letting anybody in at all (this is what we had to deal with at Fermilab, and hence the patch). With this patch, others have a good chance, and only 1 out of N identical long list queries will actually be run on the DB. "Things are better than they look."

Besides the queueing you observe in your monitoring, do you have customers complaining? That is the important bit.

Other things we use:

pnfsmanager.limits.list-chunk-size = 1000

(I think the default is 100)

And raise the number of list threads.

P.S.: A listing query lasting longer than the PnfsManager timeout would trigger a timeout error in the directory list handler (so a 1-hour listing is not really possible).

onnozweers commented 1 year ago

Users haven't complained yet, but I'm sure they will if this continues. I added this metric to our monitoring because users complained before (when we had 2 threads, the default).

I would be very happy if this were configurable, so that we can return to the old algorithm.

DmitryLitvintsev commented 1 year ago

Well.... give it a try with the increased list thread count and chunk size. My bet - they won't complain. Based on our experience.

If they don't complain - we won't make it optional :)

onnozweers commented 1 year ago

Sorry, whatever tuning may bring, I still think it's ugly. You have turned directory listing performance into a game of Russian roulette.

DmitryLitvintsev commented 1 year ago

Hi Onno,

I respect your opinion, but it works much better for us. We used to have a PnfsManager DoS weekly; since we deployed the patch, we have not had any issues. So give it a try first.

BTW, just to add to it: GetFileAttributes messages have been processed exactly like that for many years. I just extended the same approach to listing.

Dmitry

onnozweers commented 1 year ago

We increased the threads from 10 to 100 and the chunk size from 100 to 10,000. We've also tried chunk size 1,000, and 10,000 does not seem to perform differently from 1,000, so we'll probably revert to 1,000 during the next maintenance.

We see that the maximum waiting time during peak hour is reduced from ~50 minutes to ~15 minutes. The srmls timeout is 2 minutes though, so we're not out of trouble yet.

Now waiting to hear from users.

DmitryLitvintsev commented 1 year ago

I think a 10K chunk size is a little excessive. How is the memory footprint of the PnfsManager? The number you quoted, ~50 minutes, seems staggering to me.

Could you post:

"\sn show list activity -times"

and

"\sn info -a"

output. And, make sure this variable is not set to false:

pnfsmanager.enable.folding

(default is true). Folding is absolutely crucial here.
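
For reference, here are the pnfsmanager settings touched on in this thread, gathered in one place (the values are the ones mentioned above; treat them as a starting point, not a recommendation):

# Number of list worker threads, i.e. queues (default is 2)
pnfsmanager.limits.list-threads = 33

# Entries per listing chunk (default is 100)
pnfsmanager.limits.list-chunk-size = 1000

# Message folding; defaults to true and is crucial for this algorithm
pnfsmanager.enable.folding = true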

Also, about the 50-minute wait: do I understand correctly that you have 1M entries in a directory, and lists for that directory (and all the unfortunate ones that have the same hashcode % number_of_threads value) queue up?

And in the past, having just 2 list threads, you had no issues? Of course, if your experience is so drastically worse than before, I can make the functionality optional, or consider improvements to the algorithm.

The "info -a" output @ Fermilab public dCache system:

Statistics:
PnfsManagerV3                             average±stderr(ms)      min(ms)      max(ms)      STD(ms)      Samples       Period
  PnfsListDirectoryMessage                 275.90±      0.32            1      127,943     1,314.32   17,254,002      12 days
...

As you can see, the max is about 2 minutes. And I think anything longer would just fail.

and the folding stats:

PnfsManagerV3.Folded                  requests    failed
  PnfsGetFileAttributes                6392185         0
  PnfsListDirectoryMessage             2779888         0
  Total                                9172073         0

Whereas the CMS T1 instance:

Statistics:
PnfsManagerV3                             average±stderr(ms)      min(ms)      max(ms)      STD(ms)      Samples       Period
  PnfsListDirectoryMessage                   4.71±      0.01            1       10,395        24.92   15,701,235      35 days

PnfsManagerV3.Folded                  requests    failed
  PnfsGetFileAttributes                 140852         0
  PnfsListDirectoryMessage                   3         0
  Total                                 140855         0

Typically I see no activity:

ssh -p 22224 admin@cmsdcadiskhead "\sn show list activity -times"
Currently no list activity

onnozweers commented 1 year ago

The 3-million-object dir listing usually happens only a few times per day. It's a backup from an object store. That explains why there are so many objects in a single dir: object stores usually don't care about directory structure.

Indeed, all dir listings that were unlucky enough to be assigned to the same thread will have to wait for that listing to complete. Even with 100 threads, I can see around ~10 listings being queued.

It's been a long time since we had the default of 2 threads. And yes, we had problems back then; that's why we increased it to 10. With 10, we had no problems, until the algorithm change in 8.2.23.

Srmls has a timeout of 2 minutes, but WebDAV operations don't. The 3-million-object listing is done through WebDAV. We've had to set up a special WebDAV door with a lot of memory, because otherwise it would OOM while producing the dir listing.

dcache.java.memory.heap=250G
dcache.java.memory.direct=100G

I'm aware we're pushing the boundaries here. 😉

The PnfsManager up to now had 8GB of heap, we've just increased it to 50 GB. (No OOM here though; just trying to avoid garbage collection)

The show list activity -times output is currently empty; I'll try again during "rush hour".

[root@dcmain ~]# dcache-admin-command PnfsManager 'info -a'
--- ha-service-leadership-manager (Coordinates which pnfsmanager updates the filesystem stat cache) ---
HA role: LEADER

--- name-space-provider (Name space provider) ---
Acl Enabled: true
DB        : HikariDataSource (HikariPool-1)
DB Engine : PostgreSQL 11.20
rootID    : 000000000000000000000000000000000000
FsId      : 0
Statistics:
ChimeraNameSpaceProvider                  average±stderr(ms)      min(ms)      max(ms)      STD(ms)      Samples       Period
  addInodeLocation(FsInode|int|Strin         0.10±      0.00            0           60         0.31    2,648,476     28 hours
  clearInodeLocation(FsInode|int|Str         0.07±      0.00            0          108         0.29    1,986,657     28 hours
  createFile(FsInode|String|int|int|         0.59±      0.00            0          616         0.88    4,394,414     28 hours
  find(FsInode)                              0.03±      0.00            0          148         0.19   85,238,126     28 hours
  getACL(FsInode)                            0.03±      0.00            0          136         0.18  221,302,823     28 hours
  getAllTags(FsInode)                        0.05±      0.00            0          339         0.23   72,274,940     28 hours
  getInfo()                                  0.01±      0.00            0           10         0.15       11,946     28 hours
  getInodeChecksums(FsInode)                 0.04±      0.00            0          185         0.19   72,292,628     28 hours
  getInodeLocations(FsInode)                 0.04±      0.00            0          171         0.21   69,568,667     28 hours
  getInodeLocations(FsInode|int)             0.29±      0.13            0            1         0.47           14     26 hours
  getLabels(FsInode)                         0.16±      0.01            0            1         0.37        1,638     28 hours
  getStorageInfo(FsInode)                    0.04±      0.00            0           15         0.19    1,959,977     28 hours
  getXattr(FsInode|String)                   0.05±      0.00            0          219         0.23   51,344,696     28 hours
  id2inode(String|StatCacheOption)           0.04±      0.00            0          123         0.21   57,420,721     28 hours
  inode2id(FsInode)                          0.00±      0.00            0           13         0.04   18,723,880     28 hours
  inode2path(FsInode)                        0.13±      0.00            0           15         0.34    5,047,577     28 hours
  inodeOf(FsInode|String|StatCacheOp         0.07±      0.00            0          185         0.28    2,240,736     28 hours
  listXattrs(FsInode)                        0.05±      0.00            0          186         0.23   81,679,273     28 hours
  mkdir(FsInode|String|int|int|int)          0.49±      0.00            0            9         0.60       18,719     28 hours
  mkdir(FsInode|String|int|int|int|L         2.53±      0.09            0           30         2.69        1,000     28 hours
  newDirectoryStream(FsInode)               14.06±      0.54            0        7,718        97.65       32,246     28 hours
  path2inode(String)                         0.21±      0.00            0           63         0.42    4,918,729     28 hours
  path2inodes(String)                        0.30±      0.00            0          188         0.50   30,173,679     28 hours
  read(FsInode|int|long|byte[]|int|i         0.05±      0.00            0            1         0.21        5,711     28 hours
  remove(FsInode|String|FsInode)             0.69±      0.00            0          409         1.61    1,982,024     28 hours
  rename(FsInode|FsInode|String|FsIn         0.21±      0.00            0            7         0.42       38,622     28 hours
  resolvePath(String)                        0.29±      0.00            0          190         0.51   12,032,066     28 hours
  setInodeAttributes(FsInode|int|Sta         0.12±      0.00            0           54         0.35    2,635,645     28 hours
  setInodeChecksum(FsInode|int|Strin         0.06±      0.00            0           38         0.25    2,636,635     28 hours
  setStorageInfo(FsInode|InodeStorag         0.36±      0.02            0            1         0.48          480     28 hours
  setXattr(FsInode|String|byte[]|Set         0.13±      0.00            0           93         0.35    2,553,683     28 hours
  stat(FsInode|int)                          0.05±      0.00            0           75         0.23    8,276,882     28 hours
  updateFsStat()                        37,629.11±    718.01       35,251       51,974     3,799.34           28     27 hours
ChimeraNameSpaceProvider              requests    failed
  addInodeLocation(FsInode|int|Strin   2648476        34
  clearInodeLocation(FsInode|int|Str   1986657         0
  createFile(FsInode|String|int|int|   4394414   1758952
  find(FsInode)                       85238126         0
  getACL(FsInode)                    221302823         0
  getAllTags(FsInode)                 72274940         0
  getInfo()                              11946         0
  getInodeChecksums(FsInode)          72292628         0
  getInodeLocations(FsInode)          69568667         0
  getInodeLocations(FsInode|int)            14         0
  getLabels(FsInode)                      1638         0
  getStorageInfo(FsInode)              1959977         0
  getXattr(FsInode|String)            51344696         0
  id2inode(String|StatCacheOption)    57420721   1923593
  inode2id(FsInode)                   18723880         0
  inode2path(FsInode)                  5047577         0
  inodeOf(FsInode|String|StatCacheOp   2240736    257555
  listXattrs(FsInode)                 81679273         0
  mkdir(FsInode|String|int|int|int)      18719     10792
  mkdir(FsInode|String|int|int|int|L      1000         0
  newDirectoryStream(FsInode)            32246         0
  path2inode(String)                   4918729    217914
  path2inodes(String)                 30173679   2144632
  read(FsInode|int|long|byte[]|int|i      5711         0
  remove(FsInode|String|FsInode)       1982024         1
  rename(FsInode|FsInode|String|FsIn     38622         0
  resolvePath(String)                 12032066         0
  setInodeAttributes(FsInode|int|Sta   2635645       116
  setInodeChecksum(FsInode|int|Strin   2636635         0
  setStorageInfo(FsInode|InodeStorag       480         0
  setXattr(FsInode|String|byte[]|Set   2553683         0
  stat(FsInode|int)                    8276882   5618667
  updateFsStat()                            28         0
  Total                              813443338  11932256

--- pnfs-manager-ionotify ---
atime precision: Disabled

List queue: 0

Threads (24) Queue
    [0] 0
    [1] 0
    [2] 0
    [3] 0
    [4] 0
    [5] 0
    [6] 0
    [7] 0
    [8] 0
    [9] 0
    [10] 0
    [11] 0
    [12] 0
    [13] 0
    [14] 0
    [15] 0
    [16] 0
    [17] 0
    [18] 0
    [19] 0
    [20] 0
    [21] 0
    [22] 0
    [23] 0

Threads: 0

Statistics:
PnfsManagerV3                             average±stderr(ms)      min(ms)      max(ms)      STD(ms)      Samples       Period
  PnfsAddCacheLocationMessage                0.38±      0.02            0            4         0.52          915     28 hours
  PnfsCancelUpload                           3.05±      0.50            1           28         3.73           55     28 hours
  PnfsClearCacheLocationMessage              0.21±      0.00            0          109         0.50    1,986,987     28 hours
  PnfsCommitUpload                           2.87±      0.05            2           23         1.46          945     28 hours
  PnfsCreateEntryMessage                     1.65±      0.00            0          641         1.41    4,421,978     28 hours
  PnfsCreateSymLinkMessage                    NaN±       NaN            0            0          NaN            0     28 hours
  PnfsCreateUploadPath                       3.86±      0.09            1           32         3.00        1,000     28 hours
  PnfsDeleteEntryMessage                     1.71±      0.00            0          432         2.07    2,200,809     28 hours
  PnfsFlagMessage                             NaN±       NaN            0            0          NaN            0     28 hours
  PnfsGetCacheLocationsMessage                NaN±       NaN            0            0          NaN            0     28 hours
  PnfsGetFileAttributes                      0.71±      0.00            0          189         0.81   30,810,075     28 hours
  PnfsGetParentMessage                        NaN±       NaN            0            0          NaN            0     28 hours
  PnfsListDirectoryMessage                 591.89±     64.05            1      967,950    11,537.54       32,448     28 hours
  PnfsListExtendedAttributesMessage           NaN±       NaN            0            0          NaN            0     28 hours
  PnfsMapPathMessage                         0.25±      0.00            0           18         0.43    2,432,662     28 hours
  PnfsReadExtendedAttributesMessage           NaN±       NaN            0            0          NaN            0     28 hours
  PnfsRemoveChecksumMessage                   NaN±       NaN            0            0          NaN            0     28 hours
  PnfsRemoveExtendedAttributesMessag          NaN±       NaN            0            0          NaN            0     28 hours
  PnfsRemoveLabelsMessage                     NaN±       NaN            0            0          NaN            0     28 hours
  PnfsRenameMessage                          1.72±      0.00            0           60         0.97       37,677     28 hours
  PnfsSetFileAttributes                      0.58±      0.00            0          363         0.92    2,647,360     28 hours
  PnfsWriteExtendedAttributesMessage          NaN±       NaN            0            0          NaN            0     28 hours
  PoolFileFlushedMessage                     0.83±      0.04            0           16         0.86          480     28 hours
PnfsManagerV3.Folded                  requests    failed
  PnfsGetFileAttributes                 176923         0
  PnfsListDirectoryMessage                  38         0
  Total                                 176961         0
DmitryLitvintsev commented 1 year ago

Max 15 minutes for a directory listing, from the above. Very little folding, meaning the same directory is not being hit often enough to trigger folding. OK, let's wait for the rush-hour data.

P.S.: About the srmls 2-minute timeout: what are those calls? Are they "gfal-ls" executed by some SAM tests? And is the 2 minutes "theirs", that is, a client timeout? Or is it our internal timeout:

srmmanager.service.pnfsmanager.timeout = 120

I believe that when this one is hit, srmls fails the request, but of course PnfsManager still keeps processing it.

So I suggest raising the above timeout to 15 minutes (based on your "info -a" output).

similarly for webdav:

webdav.service.pnfsmanager.timeout = 120000

(mind the units). It is the same 2-minute timeout and probably needs to be raised to 15 minutes as well.
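
Concretely, the suggested change would look like this (a sketch based on the values above, keeping the unit conventions shown in this thread: seconds for srmmanager, milliseconds for webdav):

# 15 minutes, in seconds
srmmanager.service.pnfsmanager.timeout = 900

# 15 minutes, in milliseconds
webdav.service.pnfsmanager.timeout = 900000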

onnozweers commented 1 year ago

I missed the rush hour; I will try again tomorrow.

But I saw a large spike in our listing queue graphs. It appears we had a listing that lasted 4,028 seconds; I didn't expect that. In the end, 18 other listings were waiting for this one to finish.

[root@install-fes ~]# dcache-admin-command PnfsManager 'info -a' | grep PnfsListDirectoryMessage
  PnfsListDirectoryMessage                 708.14±     81.45            1    4,028,875    20,249.28       61,801       2 days
  PnfsListDirectoryMessage                  74         0
onnozweers commented 1 year ago

[attached graph]

DmitryLitvintsev commented 1 year ago

Question: can you identify the large directory that takes 4K seconds to process, see which protocol issued the listing (srm, webdav?), and check that this listing indeed succeeded and did not fail with a timeout? Also, has the queue ever contained the same large directory again (which could be an indication of a retry by the door)?

By now I realize that your usage pattern is different: occasionally or periodically your system runs one long listing, and with the new algorithm it backs a few unfortunate others up while it is running. This is opposed to our use case, where many users run listings on the same, sometimes largish, directories at the same time. We typically ask users to keep entries under 10K. There are a few cases of up to ~200K, but never millions.

I think this calls for making the algorithm optional for now.

At the same time I can see if I can make it more sophisticated.

NB: the previous algorithm would fail here too: if you had 10 such queries running, all 10 threads would be locked for 4K seconds, not allowing any other listings to go through, and increasing the number of listing threads would kill the DB. This is basically what we had run into.

onnozweers commented 1 year ago

I think we have a very informative listing here. I ran the command in a loop every minute for some hours. This is the last output containing the very long listing operation, right before it completed and the queue was drained. When the very large dir finally finished, all 22 waiting list operations completed within one minute (the next listing a minute later was completely empty).

Thu Aug  3 11:40:19 CEST 2023
QUEUED REQUESTS

             SOURCE                           USER    ARRIVED                              STARTED                              PATH
         SrmManager@srmmanagerDomain          lofops  2023-08-03 10:35:57.209 (64 min ago)                                      /pnfs/grid.sara.nl/data/lofar/ops/projects/lc0_003/151812   
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 10:57:58.960 (42 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/rivm/owncloudc02        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 10:59:59.006 (40 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/rivm/owncloudc02        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:00:00.537 (40 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/uva/owncloudf70         
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:02:00.568 (38 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/uva/owncloudf70         
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:03:34.951 (36 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/amsterdamumc/owncloud18c
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:05:34.977 (34 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/amsterdamumc/owncloud18c
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:07:08.401 (33 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloud9ad  
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:08:04.911 (32 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/amsterdamumc/owncloud3f3
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:09:08.467 (31 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloud9ad  
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:10:05.194 (30 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/amsterdamumc/owncloud3f3
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:11:17.583 (29 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudacb  
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:13:17.676 (27 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/nhlstenden/owncloudacb  
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:18:59.372 (21 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/buas/owncloud21c        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:20:55.467 (19 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/zuyd/owncloude13        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:20:59.534 (19 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/buas/owncloud21c        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:22:40.392 (17 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/che/owncloud894         
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:22:55.520 (17 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/zuyd/owncloude13        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:24:40.459 (15 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/che/owncloud894         
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:26:54.862 (13 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/hvhl/owncloud593        
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 11:28:54.906 (11 min ago)                                      /pnfs/grid.sara.nl/data/swiftbackup/hvhl/owncloud593        
         SrmManager@srmmanagerDomain          lofops  2023-08-03 11:30:59.602 (9 min ago)                                       /pnfs/grid.sara.nl/data/lofar/ops/projects/lc0_003/183367   

ACTIVE REQUESTS

             SOURCE                           USER    ARRIVED                              STARTED                              PATH
webdav2880-badger14@webdav2880-badger14Domain buswift 2023-08-03 10:33:17.862 (67 min ago) 2023-08-03 10:33:17.862 (67 min ago) /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751
onnozweers commented 1 year ago

Here's the associated graph:

[attached graph]

DmitryLitvintsev commented 1 year ago

Does /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751 have millions of entries? And do you have a way of finding out whether the listing actually succeeded? (See my previous message.) Before I move on to touch the code, I want to make sure that we are not just seeing PnfsManager spinning its wheels. I think the WebDAV door access log would have the answer (whether the listing completed or timed out).

(The more I think about a listing that lasts > 1 hour, the less I am convinced that it is actually needed; it could be an artifact of some other activity, like an inadvertent "ls" whose only purpose is establishing that the destination directory exists. We have seen something like that at Fermilab.)

onnozweers commented 1 year ago

I can see that the dir listing gets an OOM error. It's already running with these settings (note the heap & direct memory; we've had to increase them several times in the past):

root     3152466       1 28 05:43 ?        01:23:22 /usr/bin/java -server -Xmx250G -XX:MaxDirectMemorySize=100G -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=20000,25000 -Dorg.dcache.dcap.port=0 -Dorg.dcache.ftp.log-aborted-transfers=true -Dorg.dcache.net.tcp.portrange=33000:34500 -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/etc/dcache/jgss.conf -Dcontent.types.user.table=/etc/dcache/content-types.properties -Dzookeeper.sasl.client=false -Dcurator-dont-log-connection-problems=true -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/webdav2880-badger14Domain-oom.hprof -XX:+ExitOnOutOfMemoryError -XX:+StartAttachListener -XX:+UseCompressedOops -javaagent:/usr/share/dcache/classes/aspectjweaver-1.9.2.jar -Djava.net.preferIPv6Addresses=system -Djava.awt.headless=true -DwantLog4jSetup=n -Dorg.bouncycastle.dh.allow_unsafe_p_value=true -Djdk.tls.ephemeralDHKeySize=2048 -Djava.net.preferIPv6Addresses=false -Ddcache.home=/usr/share/dcache -Ddcache.paths.defaults=/usr/share/dcache/defaults org.dcache.boot.BootLoader start webdav2880-badger14Domain

Here's the error:

07 Aug 2023 10:32:52 (webdav2880-badger14) [] /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751
java.lang.OutOfMemoryError: null
    at java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
    at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
    at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
    at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
    at java.base/java.io.OutputStream.write(OutputStream.java:122)
    at io.milton.http.XmlWriter.append(XmlWriter.java:73)
    at io.milton.http.XmlWriter.writeElement(XmlWriter.java:421)
    at io.milton.http.XmlWriter.writeElement(XmlWriter.java:127)
    at io.milton.http.XmlWriter.writeProperty(XmlWriter.java:114)
    at io.milton.http.values.ToStringValueWriter.writeValue(ToStringValueWriter.java:52)
    at io.milton.http.values.ValueWriters.writeValue(ValueWriters.java:98)
    at io.milton.http.webdav.PropFindXmlGeneratorHelper.sendProperties(PropFindXmlGeneratorHelper.java:124)
    at io.milton.http.webdav.PropFindXmlGeneratorHelper.sendKnownProperties(PropFindXmlGeneratorHelper.java:114)
    at io.milton.http.webdav.PropFindXmlGeneratorHelper.appendResponse(PropFindXmlGeneratorHelper.java:102)
    at io.milton.http.webdav.PropFindXmlGeneratorHelper.appendResponses(PropFindXmlGeneratorHelper.java:91)
    at io.milton.http.webdav.PropFindXmlGenerator.generate(PropFindXmlGenerator.java:60)
    at io.milton.http.webdav.PropFindXmlGenerator.generate(PropFindXmlGenerator.java:69)
    at io.milton.http.webdav.DefaultWebDavResponseHandler.respondPropFind(DefaultWebDavResponseHandler.java:202)
    at io.milton.http.AbstractWrappingResponseHandler.respondPropFind(AbstractWrappingResponseHandler.java:200)
    at org.dcache.webdav.AcceptAwareResponseHandler.respondPropFind(AcceptAwareResponseHandler.java:116)
    at io.milton.http.AbstractWrappingResponseHandler.respondPropFind(AbstractWrappingResponseHandler.java:200)
    at org.dcache.webdav.WorkaroundsResponseHandler.respondPropFind(WorkaroundsResponseHandler.java:138)
    at io.milton.http.AbstractWrappingResponseHandler.respondPropFind(AbstractWrappingResponseHandler.java:200)
    at io.milton.http.AbstractWrappingResponseHandler.respondPropFind(AbstractWrappingResponseHandler.java:200)
    at io.milton.http.webdav.PropFindHandler.processExistingResource(PropFindHandler.java:156)
    at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:196)
    at io.milton.http.webdav.PropFindHandler.processResource(PropFindHandler.java:112)
    at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
    at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
    at io.milton.http.webdav.PropFindHandler.process(PropFindHandler.java:106)
    at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
    at io.milton.http.FilterChain.process(FilterChain.java:46)
    at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
    at io.milton.http.FilterChain.process(FilterChain.java:46)
    at io.milton.http.HttpManager.process(HttpManager.java:158)
    at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
    at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:156)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
    at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:153)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:645)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:392)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:829)

When I did an ls in an NFS-mounted /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751, it was like a DoS of our PnfsManager. The transfer success rate (for everyone, not only this user) dropped to 20%. After I restarted the PnfsManager things got back to normal. I don't know how NFS interacts with the PnfsManager, but I bet it's not through your algorithm 😉

I have asked the user to tell me how many files are supposed to be in this dir. Perhaps on the client side he might be able to see this more easily.

The client is Rclone. I think Rclone wants to get a dir listing because it only uploads files that don't exist yet in the destination dir, or have changed since last upload.

DmitryLitvintsev commented 1 year ago

NFS bypasses PnfsManager altogether when running ls. You can check the number of files by doing:

select inlink from t_inodes where ipnfsid = '<pnfsid of that directory>';

So, I think, it is fair to conclude that you have a problem with one very large directory.

P.S.: I was surprised that the listing is not just timing out on the door end. Maybe it hits OOM sooner!?

onnozweers commented 1 year ago
[root@db1 ~]# psql -U postgres chimera -c "select inlink from t_inodes where ipnfsid = '00005D8D033B902A42249F3A7510D1146E28'"
  inlink  
----------
 13961036
(1 row)

Wow, almost 14 million... Even more than I thought.

Thanks for looking into this. In the short term it would be wonderful if the algorithm were configurable. For the future, I foresee more problems with such a huge and growing directory. We might need to set up a separate instance for them so they don't DoS other users, but that's a big operation that we can't do on short notice.

holties-astron commented 1 year ago

Returning from holiday, I became aware of this issue and ticket. To reinforce Onno's point, I'd like to share that our case (the LOFAR Long-Term Archive) is certainly affected by the new behaviour. LOFAR is significantly different from the FERMI case as I understand it.

For LOFAR, this looks like spurious behaviour, so it was not identified until now, although we have seen a slight rise in unexplained operations at SURF that take much longer than we are used to, or fail. Looking into the logs for one of our services, it is nevertheless clear that, starting July 1st, the following error has been reported multiple times per day for listings at SURF: "List failed [rc=10006,msg=Timeout during directory listing.]"

From the thread I conclude that the issue is planned to be addressed by making the algorithm for directory listings configurable. For LOFAR this fix will be very important and much appreciated.

DmitryLitvintsev commented 1 year ago

The problem will hit you when you use up all available threads with the "old algorithm"; increasing the number of list threads will lead to database thrashing, leading to timeouts. But yes, we will make the listing behavior optional ASAP.

DmitryLitvintsev commented 1 year ago

A pull request has been made.

DmitryLitvintsev commented 1 year ago

@onnozweers, are there subdirectories under /pnfs/grid.sara.nl/data/swiftbackup/pbl/owncloud751?

Not directly related to this issue, but we seem to be seeing long-running listings by WebDAV clients. We suspect that PROPFIND calls without a Depth header result in a recursive listing.

This is because even a directory with 15M entries, containing just files, should have been listed faster than 70 minutes.

P.S.: Never mind; it is not a recursive listing. The door receives your 15M entries and starts to loop over them, sending a PROPFIND request for each of them. This is where all the time is spent. I just got to the bottom of it.

onnozweers commented 1 year ago

We saw the patch in 8.2.28. Thanks a lot! I really appreciate the effort you put into this. We'll update soon.

About the PROPFIND, I'm not sure what you mean. Do you mean that the door, while getting entries from the PnfsManager, also starts sending PROPFIND results? And that these slow down the listing operation in the PnfsManager? Anyway, let me know if you need my help. I'll be out of the office for a few weeks though.

DmitryLitvintsev commented 1 year ago

PROPFIND calls done by WebDAV collect too much info (including file locality). As a result, "ls -l" slows down very significantly. Compare:

time  gfal-ls -l https://foo.fnal.gov:2880/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20 | wc -l 
11937

real    3m0.176s
user    0m2.904s
sys 0m0.728s 

vs

time gfal-ls gsiftp://foo.fnal.gov/pnfs/fnal.gov/usr//minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20  |wc
  11937   11937  186268

real    0m1.607s
user    0m0.860s
sys 0m0.334s
paulmillar commented 1 year ago

I think it would be worth clarifying exactly what HTTP/WebDAV requests gfal-ls is making. For example, is the client making a single request for the directory?

Another important factor is how such a request is handled within the Milton WebDAV library. Does Milton trigger a single request to PnfsManager, or are there multiple requests?

Just to mention it: file locality is a dCache-specific WebDAV property. It shouldn't be returned unless the client explicitly asks for it. It could be that the WebDAV door is asking for the necessary information (pool and tape locality) from the PnfsManager even if the data isn't going to be used. If so, then that is a bug and should be fixed.

alrossi commented 1 year ago

Hi @paulmillar. My first pass over the code suggests that these are always returned. I don't see options controlling whether to include locality. But maybe I'm not looking in the right place.

DmitryLitvintsev commented 1 year ago

@paulmillar To me it looks like, when a listing is requested for a directory, the WebDAV door loops over the initial result and calls PROPFIND on its constituents, with all file attributes gathered:

private static final Set<FileAttribute> PROPFIND_ATTRIBUTES = Sets.union(
          EnumSet.of(CHECKSUM, ACCESS_LATENCY, RETENTION_POLICY),
          PoolMonitorV5.getRequiredAttributesForFileLocality());

....

    private Set<FileAttribute> buildRequestedAttributes() {
        Set<FileAttribute> attributes = EnumSet.copyOf(REQUIRED_ATTRIBUTES);

        if (isDigestRequested()) {
            attributes.add(CHECKSUM);
        }

        if (isPropfindRequest()) {
            // FIXME: Unfortunately, Milton parses the request body after
            // requesting the Resource, so we cannot know which additional
            // attributes are being requested; therefore, we must request all
            // of them.
            attributes.addAll(PROPFIND_ATTRIBUTES);
        }

        return attributes;
    }
DmitryLitvintsev commented 1 year ago

(TBH, I already tried just hacking the above and dropping PROPFIND_ATTRIBUTES completely. I re-ran ls -l and got about the same result.) But now it looks like Al is on it.

NB: gfal-ls https:// and gfal-ls -l https:// take about the same time.

DmitryLitvintsev commented 1 year ago

@paulmillar

grep 'webdavDomain:AAYCqqTulTg' /root/one_listing.log | grep -c "class org.dcache.vehicles.PnfsGetFileAttributes processed"
10002

The log above contains debug level output of this listing:

 gfal-ls -l https://foo.fnal.gov:2880/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20 | wc -l 
11937

real    3m0.176s
user    0m2.904s
sys 0m0.728s 
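
(That is about 180 seconds for 10,002 PnfsGetFileAttributes round trips, i.e. roughly 18 ms per message; the per-entry round trips, not the directory listing itself, account for essentially all of the 3 minutes.)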
DmitryLitvintsev commented 1 year ago

@paulmillar Just to add a bit, the pinboard of PnfsManager shows:

....
28 Aug 2023 13:02:09 [PnfsManager-0] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] Using thread [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20/9885] 56
28 Aug 2023 13:02:09 [proc-56] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] class org.dcache.vehicles.PnfsGetFileAttributes processed in 3 ms
28 Aug 2023 13:02:09 [PnfsManager-0] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] Using thread [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20/9886] 55
28 Aug 2023 13:02:09 [proc-55] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] class org.dcache.vehicles.PnfsGetFileAttributes processed in 4 ms
28 Aug 2023 13:02:09 [PnfsManager-0] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] Using thread [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20/9887] 54
28 Aug 2023 13:02:09 [proc-54] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] class org.dcache.vehicles.PnfsGetFileAttributes processed in 3 ms
28 Aug 2023 13:02:09 [PnfsManager-0] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] Using thread [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20/9888] 53
28 Aug 2023 13:02:09 [proc-53] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] class org.dcache.vehicles.PnfsGetFileAttributes processed in 4 ms
28 Aug 2023 13:02:09 [PnfsManager-0] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] Using thread [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20/9889] 52
28 Aug 2023 13:02:09 [proc-52] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] class org.dcache.vehicles.PnfsGetFileAttributes processed in 3 ms
28 Aug 2023 13:02:09 [PnfsManager-0] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] Using thread [/pnfs/fnal.gov/usr/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20/989] 50
28 Aug 2023 13:02:09 [proc-50] [door:WebDAV-fndcaitb4-1@webdavDomain:AAYD/3pVHLA WebDAV-fndcaitb4-1 PnfsGetFileAttributes] class org.dcache.vehicles.PnfsGetFileAttributes processed in 3 ms

It looks like a PnfsGetFileAttributes message is sent by the door to PnfsManager, one message per entry in the directory (the pinboard snippet above corresponds to:

 gfal-ls -l https://foo.fnal.gov:2880/pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20 | wc -l 
11937

real    3m0.176s
user    0m2.904s
sys 0m0.728s 

invocation)

paulmillar commented 1 year ago

So, DcacheDirectoryResource implements MakeCollectionableResource, which extends CollectionResource. The CollectionResource interface contains the method:

List<? extends Resource> getChildren()

For DcacheDirectoryResource, this does "the right thing": a single directory listing that creates Resource objects (either DcacheFileResource or DcacheDirectoryResource). However, something is triggering the subsequent PnfsManager requests.

My guess is that the problem is due to the attribute cache not being populated when the DcacheResource (DcacheFileResource or DcacheDirectoryResource) is created during the directory listing. (see DcacheResource#getAllPropertyNames for example)

alrossi commented 1 year ago

After placing a dumpTrace() in the thread that calls the PnfsHandler, I saw that the extra call is made in order to determine whether the directories have a WriteToken or not (a fetch of the StorageInfo attribute). Yes, this is because the relevant cache does not contain those entries (so the cache loader fetches them).

What I have done is fork DcacheDirectoryResource into the standard one and one used specifically in list(), and for the latter skip the check for those space properties. That sped things up greatly (as did paring down the attributes for list to simply the POSIX ones). The listing times (as posted by Dmitry) for webdav and xrdfs are now comparable:

[litvinse@host ~]$ time gfal-ls   root://example.org/big-directory   | wc -l 
11937

real    0m2.994s
user    0m0.953s
sys 0m0.382s

[litvinse@host ~]$ time gfal-ls   https://example.org:2880/big-directory   | wc -l 
11937

real    0m3.916s
user    0m1.651s
sys 0m0.356s
paulmillar commented 1 year ago

I'm not sure I follow.

If the problem (an extra PnfsManager round-trip per directory item) is triggered by a lack of information, why not update the list code to fetch that additional information as part of the directory listing?

I don't (yet) see why one would need to create an additional class (a fork of DcacheDirectoryResource) with different semantics.

alrossi commented 1 year ago

Why do we need that information for listing?

alrossi commented 1 year ago

There are two problems that we are dealing with. Even after eliminating the extra fetch, with the listing of files and directories now taking roughly the same time, it still takes nearly two minutes to list only 1000 entries. This is because of all the extra properties (locality, storageinfo, etc.) which don't really show up in the POSIX listing output.

Dmitry and I have been trying to get the timing down to something reasonable (like what GridFTP does). I fixed it (basically) for xroot and am trying to do something similar for webdav.

But if you have another idea, perhaps you could describe it? Otherwise, we are still going to be burdened here at FNAL with listings that end up being a DoS.

paulmillar commented 1 year ago

I'm guessing what's happening here is that the PropFind request triggers querying of the DAV:quota-available-bytes and/or DAV:quota-used-bytes properties. Either the client is specifying (in the PropFind request) that it wants these quota properties, or Milton/dCache is deciding to include either (or both) properties by default.

For directories, these properties trigger a PnfsManager lookup to see if the directory is in a token (i.e., whether the WriteToken tag is set) and uses the space-manager information to return quota information.
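
(For reference, these quota properties come from RFC 4331. A client explicitly asking for them would send a PROPFIND body such as:

<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop>
    <D:quota-available-bytes/>
    <D:quota-used-bytes/>
  </D:prop>
</D:propfind>

The RFC also says that an allprop request SHOULD NOT return these properties, so if the client isn't naming them explicitly, it is the server side that is advertising them.)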

Two possible solutions:

  1. If the client isn't explicitly requesting these properties then update Milton / dCache so they are not included by default.
  2. Update the initial directory listing so it also includes the WriteToken information. The code then populates the WriteToken cache appropriately (see the sketch after this list).
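
(A self-contained sketch of option 2, with invented names -- the real message types and caches differ: the listing reply primes the WriteToken cache so later property lookups are hits rather than extra PnfsManager round-trips.)

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of option 2: prime the WriteToken cache while listing, so PROPFIND
// handling afterwards never needs another namespace round-trip.
class WriteTokenPriming {

    // path -> WriteToken tag value (empty = directory has no token)
    static final Map<String, Optional<String>> writeTokenCache =
            new ConcurrentHashMap<>();

    // Pretend the listing reply carries the WriteToken tag for each child;
    // store it as the entries stream in.
    static void onListingEntry(String path, String writeTokenOrNull) {
        writeTokenCache.put(path, Optional.ofNullable(writeTokenOrNull));
    }

    // Later PROPFIND handling: pure cache hit, no extra lookup.
    static Optional<String> writeTokenFor(String path) {
        return writeTokenCache.getOrDefault(path, Optional.empty());
    }

    public static void main(String[] args) {
        onListingEntry("/pnfs/example/tokened-dir", "12345");
        onListingEntry("/pnfs/example/plain-dir", null);
        System.out.println(writeTokenFor("/pnfs/example/tokened-dir")); // Optional[12345]
        System.out.println(writeTokenFor("/pnfs/example/plain-dir"));   // Optional.empty
    }
}
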
alrossi commented 1 year ago

Yes, that is what is happening (I just stated it differently above).

  1. is the superior solution, but I could not seem to get Milton to stop asking for those properties. I will look again.
    In any event, I think we need to distinguish between a list request and a write request for a specific directory, where that info may be necessary. So some sort of fork in the code is needed. But it may be that I can indeed do this simply in the DcacheResourceFactory.
alrossi commented 1 year ago

Let me report the code which prompted me to subclass the resource:

    @Override
    public Object getProperty(QName name) {
        Object value = super.getProperty(name);

        if (value != null) {
            return value;
        }

        try {
            if (name.equals(QUOTA_AVAILABLE)) {
                return _factory.spaceForPath(_path).getAvailableSpaceInBytes();
            }

            if (name.equals(QUOTA_USED)) {
                Space space = _factory.spaceForPath(_path);
                return space.getUsedSizeInBytes() + space.getAllocatedSpaceInBytes();
            }
        } catch (SpaceException e) {
            // this path has no space, treat as if property does not exist.
        }

        return null;
    }

    @Override
    public PropertyMetaData getPropertyMetaData(QName name) {
        PropertyMetaData metadata = super.getPropertyMetaData(name);

        if (!_factory.isSpaceManaged(_path)) {
            return metadata;
        }

        // Milton accepts null and PropertyMetaData.UNKNOWN to mean the
        // property is unknown.
        if ((metadata == null || metadata.isUnknown()) && QUOTA_PROPERTIES.contains(name)) {
            metadata = READONLY_LONG;
        }

        return metadata;
    }

    @Override
    public List<QName> getAllPropertyNames() {
        List<QName> genericNames = super.getAllPropertyNames();

        if (!_factory.isSpaceManaged(_path)) {
            return genericNames;
        }

        List<QName> names = new ArrayList<>(QUOTA_PROPERTIES.size() + genericNames.size());
        names.addAll(QUOTA_PROPERTIES);
        names.addAll(genericNames);
        return names;
    }

Even if we eliminate those properties so that getProperty() is never called with them, the other two methods still perform a forced check of the cache (isSpaceManaged).
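
(To illustrate the shape of that fork -- a self-contained sketch with invented names, not the actual dCache classes: the listing-only variant skips the isSpaceManaged check entirely.)

import java.util.ArrayList;
import java.util.List;

// Invented illustration of the fork: the standard resource pays a cache
// check per metadata call; the listing-only variant returns immediately.
interface SpaceLookup {
    boolean isSpaceManaged(String path);   // may trigger a cache load
}

class StandardDirProperties {
    private final SpaceLookup lookup;
    private final String path;

    StandardDirProperties(SpaceLookup lookup, String path) {
        this.lookup = lookup;
        this.path = path;
    }

    List<String> allPropertyNames(List<String> generic) {
        if (!lookup.isSpaceManaged(path)) {      // forced check, even for listings
            return generic;
        }
        List<String> names = new ArrayList<>(generic);
        names.add("quota-available-bytes");
        names.add("quota-used-bytes");
        return names;
    }
}

class ListingDirProperties {
    // Used only while rendering a directory listing: no space check at all.
    List<String> allPropertyNames(List<String> generic) {
        return generic;
    }
}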

onnozweers commented 1 year ago

Hi Dmitry,

I upgraded to 8.2.32 last Friday to revert to the old algorithm but it does not appear to work. Or did I do something wrong? 🤔

[namespaceDomain]
dcache.java.memory.direct=50g
dcache.java.memory.heap=50g
[namespaceDomain/pnfsmanager]
# pnfsmanager.limits.threads default is 12
pnfsmanager.limits.threads=24
#
# list-threads default 2
pnfsmanager.limits.list-threads = 30
#
# Fix dir listing queueing: https://github.com/dCache/dcache/issues/7252
pnfsmanager.enable.parallel-listing = true
#
# Chunk size default is 100, but that's way too small
# for dirs with 3 million files
pnfsmanager.limits.list-chunk-size = 10000
[root@db1 /etc/dcache]# dcache version
8.2.32
[attached image]
onnozweers commented 1 year ago

I understand this has become a long ticket and it's difficult to keep track of it. But could one of the dCache devs please have a look at my previous comment?

Cheers, Onno

DmitryLitvintsev commented 1 year ago

Hi Onno,

Sorry, I had not seen your activity on this ticket. I am frankly stumped. The switch:

pnfsmanager.enable.parallel-listing = true

reverts to the previous code path without any changes.

During the queuing, can you provide output of

 \sn show list activity -times 

and

\sn info -a

That will make it easy to see what the situation is: is it still one ls being executed with the rest waiting, or are all 30 threads executing? (In my testing I observed the "old behavior": I ran many simultaneous listings of the same large directory and observed N threads processing the same directory.)

Or can you find a sufficiently large directory (say 5K entries) and do

for i in {1..10}
do
    gfal-ls https://..... > $i.log 2>&1 &
done

and run

ssh -p .... "\sn  show list activity -times "

(you may run it several times)

dCache as you configured it should show 10 running listings.

Conversely, the setting

pnfsmanager.enable.parallel-listing = false

would show 1 running, 9 queued.

(so no need to wait for "rush hour").

I have installed 8.2.32, and set

pnfsmanager.enable.parallel-listing = true
pnfsmanager.limits.list-threads = 24

I run script:

$ cat ls_test.sh
#!/bin/bash

for i in {1..100}; 
do 
    gfal-ls  https://fndcaitb4:2880/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20 > webdav_${i}.log 2>&1&  
done

This is what I see:

$  ssh -p 24223 enstore@fndcaitb3 "\sn show list activity -times" 
QUEUED REQUESTS

            SOURCE              USER     ARRIVED                               STARTED                           PATH
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.115 (3 s ago)                                       /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
...
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:12.240 (842 ms ago)                                    /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20

ACTIVE REQUESTS

            SOURCE              USER     ARRIVED                               STARTED                           PATH
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:07.771 (5 s ago)     2023-10-10 15:03:07.785 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:07.829 (5 s ago)     2023-10-10 15:03:07.831 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.030 (5 s ago)     2023-10-10 15:03:08.031 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.030 (5 s ago)     2023-10-10 15:03:08.032 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.032 (5 s ago)     2023-10-10 15:03:08.034 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.033 (5 s ago)     2023-10-10 15:03:08.034 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.034 (5 s ago)     2023-10-10 15:03:08.036 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.036 (5 s ago)     2023-10-10 15:03:08.037 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:08.037 (5 s ago)     2023-10-10 15:03:08.038 (5 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.057 (4 s ago)     2023-10-10 15:03:09.062 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.058 (4 s ago)     2023-10-10 15:03:09.062 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.058 (4 s ago)     2023-10-10 15:03:09.063 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.059 (4 s ago)     2023-10-10 15:03:09.063 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.060 (4 s ago)     2023-10-10 15:03:09.063 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.078 (4 s ago)     2023-10-10 15:03:09.080 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.079 (4 s ago)     2023-10-10 15:03:09.120 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.109 (4 s ago)     2023-10-10 15:03:09.124 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.113 (4 s ago)     2023-10-10 15:03:09.135 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.109 (4 s ago)     2023-10-10 15:03:09.125 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.113 (4 s ago)     2023-10-10 15:03:09.135 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.114 (4 s ago)     2023-10-10 15:03:09.136 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.114 (4 s ago)     2023-10-10 15:03:09.136 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.115 (4 s ago)     2023-10-10 15:03:09.136 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:03:09.115 (4 s ago)     2023-10-10 15:03:09.136 (4 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20

24 threads busy listing the same directory, while anything in excess of 24 is QUEUED.

I have:

[litvinse@fnisd1 ~]$ wc -l  webdav_70.log
11937 webdav_70.log

11937 entries in that directory.

Now, I set:

pnfsmanager.enable.parallel-listing = false
pnfsmanager.limits.list-threads = 24

And re-ran the same script. I see:

[litvinse@orbis ~]$  ssh -p 24223 enstore@fndcaitb3 "\sn show list activity -times" 
QUEUED REQUESTS

            SOURCE              USER     ARRIVED                           STARTED                           PATH
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:10:34.047 (5 s ago)                                   /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:10:34.097 (5 s ago)                                   /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:10:34.360 (5 s ago)                                   /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:10:34.360 (5 s ago)                                   /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20
...

ACTIVE REQUESTS

            SOURCE              USER     ARRIVED                           STARTED                           PATH
WebDAV-fndcaitb4-1@webdavDomain litvinse 2023-10-10 15:10:33.907 (6 s ago) 2023-10-10 15:10:33.918 (6 s ago) /pnfs/fnal.gov/usr/minerva/mc_generation/mc-cal-hepevent/mc/v10r8p9/00/01/12/20

So PnfsManager behaves as I would expect ("old style" vs "new style").

(The difference with "new style" is that all these QUEUED requests completed immediately after the first one was done.)

So could you do something like that, so we can see what is going on? There is no need to go to the extreme of 100 listings; maybe twice the number of listing threads.

Also: if you can add the number of QUEUED and the number of ACTIVE requests to that gratia plot, that would also answer the question of whether all threads are busy or not. If all threads are busy, you will get queuing (this is what we used to run into at Fermilab all the time, and that's why this patch was made: parallel -> sequential for a given listing).

"new style" (pnfsmanager.enable.parallel-listing = false) all 100 listings completed under 1.5 minutes (and I could listings of other directories meanwhile).
"old style" (pnfsmanager.enable.parallel-listing = true) all 100 listings completed in 2 minutes, but no other listings could be made while all these were being processed using all available threads.

onnozweers commented 1 year ago

Hi Dmitry,

I'm as stumped as you are.

I created a dir in my dCache home dir with more than 10,000 small files, then ran gfal-ls as you described:

onno$ for session in {1..10} ; do { gfal-ls https://webdav.grid.surfsara.nl:2884/pnfs/grid.sara.nl/data/users/onno/disk/largedir | wc -l ; } & done

This takes ~6 seconds.

Meanwhile in the PnfsManager:

[root@install-fes ~]# dcache-admin-command PnfsManager 'show list activity -times'
QUEUED REQUESTS

              SOURCE                            USER               ARRIVED                              STARTED                              PATH
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.153 (621 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.162 (612 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.176 (598 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.176 (598 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.176 (598 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.179 (595 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.181 (593 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.185 (589 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.191 (583 ms ago)                                      /pnfs/grid.sara.nl/data/users/onno/disk/largedir            

ACTIVE REQUESTS

              SOURCE                            USER               ARRIVED                              STARTED                              PATH
webdav2884-penguin14@webdav2884-penguin14Domain onno               2023-10-11 17:00:25.151 (623 ms ago) 2023-10-11 17:00:25.151 (623 ms ago) /pnfs/grid.sara.nl/data/users/onno/disk/largedir            
 webdav2880-seacow14@webdav2880-seacow14Domain  return-mmilenkovic 2023-10-11 17:00:25.724 (50 ms ago)  2023-10-11 17:00:25.724 (50 ms ago)  /pnfs/grid.sara.nl/data/return/disk/S1_SA_UPSCALE/E069N075T3
 webdav2880-seacow12@webdav2880-seacow12Domain  return-mmilenkovic 2023-10-11 17:00:25.696 (78 ms ago)  2023-10-11 17:00:25.696 (78 ms ago)  /pnfs/grid.sara.nl/data/return/disk/S1_SA_UPSCALE/E069N072T3

This looks like the new algorithm to me... No idea why.

DmitryLitvintsev commented 1 year ago

Hi Onno,

Yes, it does look exactly like the new algorithm. I hate when unexplained phenomena happen. Do you have a test system where we could at least turn on some debugging?

(Meanwhile I will investigate whether logging can be set up so that we can see what is passed to the Spring cell.)

By all appearances, pnfsmanager.enable.parallel-listing = true is somehow being ignored in your setup.

Does running dcache check-config on your pnfsmanager host say anything? Do you have a single PnfsManager or multiple?

How flexible are you about deploying, on a test system, a slightly modified 8.2 in which I could add an INFO message reporting the value of this variable after initialization (in case all other ways of inferring what is being passed have failed)?