dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

Bulk Rest-API does not stage files with broken disk locations #7607

Open: christianvoss opened this issue 1 week ago

christianvoss commented 1 week ago

Hi,

we've been observing some curious behaviour with the bulk staging service. It appears the bulk service does not trigger stages if dCache knows of disk locations for a file, even when the pools holding those locations are offline. We observed this recently when a storage node had to be taken out of production for a week and we wanted to stage back some files our users needed.

I've also reproduced this with the latest dCache 9.2 release, 9.2.21. When we try to stage a NEARLINE file with the bulk REST API, the request status we get back is:

{
  "nextId": -1,
  "uid": "f9b987ee-02b5-4ba8-a334-df4b24ed4b6a",
  "arrivedAt": 1719238168621,
  "startedAt": 1719238168720,
  "lastModified": 1719238168753,
  "status": "COMPLETED",
  "targetPrefix": "/",
  "targets": [
    {
      "target": "/pnfs/desy.de/exfel/archive/XFEL/raw/FXE/201802/p002271/r0081/RAW-R0081-LPD09-S00003.h5",
      "state": "SKIPPED",
      "submittedAt": 1719238168635,
      "startedAt": 1719238168635,
      "finishedAt": 1719238168750,
      "id": 242049
    }
  ]
}

The target is always SKIPPED, even though dCache correctly reports the file as NEARLINE ("fileLocality": "NEARLINE").
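Because the request itself ends as "COMPLETED", a skipped stage is easy to miss when only the top-level status is checked. As a minimal sketch, the Python below scans a bulk request status document for targets in state SKIPPED. It uses only the field names visible in the response above ("status", "targets", "target", "state"); the helper name `skipped_targets` is hypothetical and not part of any dCache client.

```python
import json

# Bulk request status document from this issue, copied verbatim.
REQUEST_JSON = """{ "nextId": -1, "uid": "f9b987ee-02b5-4ba8-a334-df4b24ed4b6a",
  "arrivedAt": 1719238168621, "startedAt": 1719238168720, "lastModified": 1719238168753,
  "status": "COMPLETED", "targetPrefix": "/",
  "targets": [ { "target": "/pnfs/desy.de/exfel/archive/XFEL/raw/FXE/201802/p002271/r0081/RAW-R0081-LPD09-S00003.h5",
                 "state": "SKIPPED", "submittedAt": 1719238168635, "startedAt": 1719238168635,
                 "finishedAt": 1719238168750, "id": 242049 } ] }"""


def skipped_targets(request: dict) -> list:
    """Return the paths of all targets the bulk service marked as SKIPPED."""
    return [t["target"] for t in request.get("targets", []) if t.get("state") == "SKIPPED"]


request = json.loads(REQUEST_JSON)
skipped = skipped_targets(request)

# The overall request reports COMPLETED even though nothing was staged.
print(request["status"], len(skipped))
for path in skipped:
    print("not staged:", path)
```

Checking per-target states this way at least makes the silent skip visible. It does not explain why the target was skipped.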

In contrast, staging via SRM triggers a restore from tape immediately:

[vossc@naf-it01] [dev/vossc/no-macaroon-voms-directly] pnfs_qos_api $ srm-bring-online -lifetime=864000 srm://dcache-door-xfel01.desy.de:8443/pnfs/desy.de/exfel/archive/XFEL/raw/FXE/201802/p002271/r0081/RAW-R0081-LPD09-S00003.h5

[dcache-head-xfel02] (local) vossc > \sn pnfsidof /pnfs/desy.de/exfel/archive/XFEL/raw/FXE/201802/p002271/r0081/RAW-R0081-LPD09-S00003.h5
00005283EB13A8A943E9938C32E0BFFF47FC

[dcache-head-xfel02] (local) vossc > \sp rc ls
00005283EB13A8A943E9938C32E0BFFF47FC@world-net-/ m=1 r=0 [dcache-xfel499-01] [Waiting for stage: dcache-xfel499-01 06.24 16:10:40] {0,}

[dcache-head-xfel02] (local) vossc > \s dcache-xfel499-01 rh ls
a928e3c3-6454-4151-b186-0f3ab7b93757 ACTIVE Mon Jun 24 16:10:40 CEST 2024 Mon Jun 24 16:10:40 CEST 2024 00005283EB13A8A943E9938C32E0BFFF47FC xfel:FXE-2018
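The `rc ls` entry above is what confirms that the SRM request actually queued a restore on dcache-xfel499-01. If you want to check for that condition in a script, a minimal Python sketch could parse such a line. The field layout is an assumption inferred from this single sample, not a documented format, and `parse_rc_line` is a hypothetical helper.

```python
import re

# One line of `rc ls` output from this issue, copied verbatim.
RC_LS_LINE = ("00005283EB13A8A943E9938C32E0BFFF47FC@world-net-/ m=1 r=0 "
              "[dcache-xfel499-01] [Waiting for stage: dcache-xfel499-01 06.24 16:10:40] {0,}")

# Assumed layout (inferred from the sample only):
#   <pnfsid>@<unit> <other fields> [<pool>] [<status>] ...
RC_LINE = re.compile(
    r"^(?P<pnfsid>[0-9A-F]+)@(?P<unit>\S+)\s+.*?\[(?P<pool>[^\]]+)\]\s+\[(?P<status>[^\]]+)\]"
)


def parse_rc_line(line: str) -> dict:
    """Split an `rc ls` line into pnfsid, unit, pool and status text."""
    match = RC_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised rc ls line: {line!r}")
    return match.groupdict()


entry = parse_rc_line(RC_LS_LINE)
# A status starting with "Waiting for stage" indicates a restore was requested.
print(entry["pnfsid"], entry["pool"], entry["status"].startswith("Waiting for stage"))
```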

Is it possible for bulk to behave the way SRM did in the past, or is the intended procedure to 'disable' the location in Chimera before triggering the stage?

Thanks a lot, Christian

DmitryLitvintsev commented 1 week ago

Yes. Bulk relies purely on the location information in Chimera.
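The difference between the two code paths can be illustrated with a deliberately simplified Python model. This is not dCache code. The bulk rule follows the reply above: any location recorded in Chimera means "no stage needed". The SRM rule is an assumption inferred only from the behaviour reported in this issue, where SRM staged even though a disk location was recorded on an offline pool. Real pool selection involves much more. All names here, including the pool name "offline-pool", are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    # Pools recorded in the namespace (Chimera) as holding a disk replica.
    chimera_locations: frozenset
    # Whether a tape copy exists that could be restored.
    on_tape: bool


def bulk_would_stage(f: FileState) -> bool:
    """Simplified model of bulk: any recorded disk location means the target is skipped."""
    return f.on_tape and not f.chimera_locations


def srm_would_stage(f: FileState, online_pools: set) -> bool:
    """Assumed model of SRM: stage when no *online* pool holds a replica."""
    return f.on_tape and not (f.chimera_locations & online_pools)


# The scenario from this issue: the only recorded replica sits on a pool that is offline.
f = FileState(chimera_locations=frozenset({"offline-pool"}), on_tape=True)
online = {"dcache-xfel499-01"}
print(bulk_would_stage(f), srm_would_stage(f, online))  # False True
```

Under this model, clearing the stale location from Chimera (the workaround asked about above) would make the bulk rule return True again.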