dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

TAPE REST API and non-default dCache webdav.root #7506

Open ageorget opened 5 months ago

ageorget commented 5 months ago

Hi,

Following the discussion with Petr, I'm opening this issue about using the Tape REST API with a WebDAV door configured with a non-default (relative) webdav.root, which fails with 9.2.6.

gfal-stat with the relative path:

gfal-stat davs://ccdavatlas.in2p3.fr:2880/atlasdatatape/SAM/1M                                                                    
  File: 'davs://ccdavatlas.in2p3.fr:2880/atlasdatatape/SAM/1M'
  Size: 1048576 regular file
Access: (0777/-rwxrwxrwx)   Uid: 0  Gid: 0  
Access: 1970-01-01 01:00:00.000000
Modify: 2024-02-01 15:31:33.000000
Change: 2024-02-01 15:31:32.000000

The API response using the relative path:

gfal-xattr davs://ccdavatlas.in2p3.fr:2880/atlasdatatape/SAM/1M user.status                                                                                                                       
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /atlasdatatape/SAM/1M

and using the full path:

gfal-xattr davs://ccdavatlas.in2p3.fr:2880/pnfs/in2p3.fr/data/atlas/atlasdatatape/SAM/1M user.status                                                                                              
ONLINE_AND_NEARLINE

The WebDAV configuration has webdav.root set: webdav.root=/pnfs/in2p3.fr/data/atlas/

/var/lib/dcache/httpd/wlcg-tape-rest-api.json

{
  "sitename": "IN2P3-CC",
  "description": "This is the dCache WLCG TAPE REST API endpoint for IN2P3-CC",
  "endpoints":[
      {
        "uri":"https://ccdcamcli06.in2p3.fr:3880/api/v1/tape",
        "version":"v1",
        "metadata": {
        }
      } ]
}

Only the webdav door is configured to use a relative path (VO specific). The rest of the dCache configuration uses the default / root.

/etc/grid-security/storage-authzdb

version 2.1
authorize atlagrid read-write 3327 124 / / /
authorize cmsgrid read-write 3033 119 / / /
authorize lhcbgrid read-write 3437 155 / / /

Frontend logs: Feb 05 10:02:27 ccdcamcli06 dcache@FrontendDomain[138833]: 05 Feb 2024 10:02:27 (frontend) [] getInfo failed for /atlasdatatape/SAM/1M: No such file or directory /atlasdatatape/SAM/1M.

Adrien

onnozweers commented 5 months ago

Hi Adrien,

Did you set frontend.root to the same value as webdav.root?

Kind regards, Onno

ageorget commented 5 months ago

Hi Onno, no, the frontend is generalist and not dedicated to a VO like the WebDAV doors. But I tried setting frontend.root to the same value as webdav.root (and restarting) for testing, and it does not change anything for this test (although dCache View then displays the correct relative path).

Cheers, Adrien

DmitryLitvintsev commented 5 months ago

Hi Adrien,

I think I can conclude that WebDAV supports relative paths, but the REST API does not. Correct? While we are at it, can you check whether the namespace resource of the frontend suffers from the same issue?

Dmitry

DmitryLitvintsev commented 5 months ago

One thing I suggest trying: change storage-authzdb so it looks like:

authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas /
authorize cmsgrid read-write 3033 119 / /pnfs/in2p3.fr/data/cms /
authorize lhcbgrid read-write 3437 155 / /pnfs/in2p3.fr/data/lhcb /

Then set webdav.root=/ and frontend.root=/. Restart the doors and retry.

ageorget commented 5 months ago

Hi Dmitry, well, it cannot be done so easily on our side because some protocols like WebDAV use a base path and some, like SRM and XRootD, don't. So it may need changes in CRIC at the same time as the dCache configuration change, and I guess this would require a site downtime.

The WebDAV door should be able to contact the REST API with path resolution, no? BTW, the 9.2.0 release notes contain "Path resolution (for relative paths) has been integrated into bulk request processing." Is that for a different use case?

DmitryLitvintsev commented 5 months ago

Yes, we were under the impression that this had been properly handled. I am just suggesting a mitigation.

I believe that with the path defined in storage-authzdb, XRootD and WebDAV will work fine (with both full and relative paths). FTP will take only the relative path. SRM will take both.

Meanwhile I will be getting to the bottom of your issue. I need to reproduce it on a test system.

DmitryLitvintsev commented 5 months ago

But yes, you are right. Since this issue concerns the REST API only, and this is a sort of "experimental" feature, it is safer to wait for a proper resolution.

ageorget commented 5 months ago

I agree with that. We just started to test the REST API so we can wait.

And PIC also looks affected by the same issue, according to Petr:

gfal-ls https://webdav-at1.pic.es:8466/atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt
https://webdav-at1.pic.es:8466/atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt

gfal-xattr https://webdav-at1.pic.es:8466/atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt user.status
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt
DmitryLitvintsev commented 5 months ago

I think this is a bug that affects everybody who uses the xxx.root property. NB: at Fermilab we don't; we prefer to have it defined in /etc/grid-security/storage-authzdb. We will investigate. This is a bit unfortunate because Al definitely tried to address it back in April.

kofemann commented 5 months ago

Actually I am completely confused. Why does gfal-xattr use the TAPE API endpoint?! And, yeah, I can confirm the behaviour:


# layout file
webdav.root=/public
frontend.root=${webdav.root}

# 🤷🏼‍♂️ 
$ gfal-xattr http://192.168.178.40:2880/
taperestapi.version = v1
taperestapi.uri = http://192.168.178.40:3880/api/v1/tape
taperestapi.sitename = dcache-systest

# list by prefix ✅ 
$ gfal-ls http://192.168.178.40:2880/
file.txt

# xattr by prefix ⛔ 
$ gfal-xattr http://192.168.178.40:2880/file.txt user.status
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /file.txt

# xattr by full path ✅ 🤔 
$ gfal-xattr http://192.168.178.40:2880/public/file.txt user.status
NEARLINE
DmitryLitvintsev commented 5 months ago

I think gfal-xattr calls the archiveinfo resource of the REST API.
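
For reference, that archiveinfo call is just an HTTP POST carrying the path exactly as the client knows it (door-relative here), as the curl examples later in this thread show. Below is a minimal sketch in plain Java of what such a request looks like; the frontend host is hypothetical and the X.509 proxy / TLS client authentication a real deployment requires is omitted for brevity.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ArchiveInfoRequestSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical frontend endpoint; a real call also needs the X.509 proxy for TLS client auth.
        String endpoint = "https://frontend.example.org:3880/api/v1/tape/archiveinfo";

        // The path is sent exactly as the client sees it, i.e. relative to webdav.root,
        // which is why the frontend has to apply the same prefix before the namespace lookup.
        String body = "{\"paths\": [\"/atlasdatatape/SAM/1M\"]}";

        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Expected shape on success: [{"path":"...","locality":"ONLINE_AND_NEARLINE"}];
        // when the prefix is not resolved, the entry carries a "No such file or directory" error instead.
        System.out.println(response.body());
    }
}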

While you are at it, can you check whether any operations in "our API", say namespace, fail with a relative path?

If this is the case, the issue is limited to frontend/REST and hopefully not deeper in bulk.

Conversely, if the namespace API works, this narrows the parameter space.

I think the problem can be recast as: the frontend does not support the "frontend.root" variable, which apparently needs to match the "webdav.root" variable.

I can try to look at it tomorrow as well.

kofemann commented 5 months ago

It looks like that in REST requests (tape, bulk, qos?) where the path is provided in the payload, dCache doesn't resolve it relative to the configured prefix. We use the user-provided path as-is. The conversion should happen in both directions: in the request and in the reply:

      // Request direction: resolve each client-supplied (door-relative) path
      // against the configured root before querying the namespace.
      for (int i = 0; i < len; ++i) {
          String requestedPath = jsonArray.getString(i);
          String dcachePath = rootPath.chroot(requestedPath).toString();
          paths.add(dcachePath);
      }

...

  // Reply direction: strip the root prefix again so the client gets back
  // the same (relative) path it asked about.
  return out.stream().map(ai -> {
      var a = new ArchiveInfo();
      a.setError(ai.getError());
      a.setLocality(ai.getLocality());
      a.setPath(FsPath.create(ai.getPath()).stripPrefix(rootPath));
      return a;
  }).toList();

The proposed changes address the issue:

$ gfal-xattr http://192.168.178.40:2880/file.txt user.status
NEARLINE

I will check whether it is possible to make it more generic and update bulk/qos as well.
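
To make the intended mapping concrete, here is a minimal, self-contained sketch in plain Java (not dCache's FsPath API; the helper names are hypothetical) of the two-way conversion the snippet above performs, using webdav.root=/public from the earlier reproduction:

// A minimal illustration of the prefix mapping in both directions.
public class PrefixMappingSketch {

    // Request direction: prepend the configured root to the client-relative path.
    static String chroot(String root, String requested) {
        return root.equals("/") ? requested : root + requested;   // "/public" + "/file.txt"
    }

    // Reply direction: strip the root again so the client gets back the path it used.
    static String stripPrefix(String root, String internal) {
        return internal.startsWith(root + "/") ? internal.substring(root.length()) : internal;
    }

    public static void main(String[] args) {
        String root = "/public";                  // plays the role of webdav.root / frontend.root
        String requested = "/file.txt";           // path as supplied in the REST payload

        String internal = chroot(root, requested);
        System.out.println(internal);                       // /public/file.txt -> what the namespace knows
        System.out.println(stripPrefix(root, internal));    // /file.txt       -> what the client sees
    }
}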

kofemann commented 4 months ago

see: https://rb.dcache.org/r/14222/

ageorget commented 3 months ago

Hi @kofemann, do we need to upgrade the bulk service and the webdav doors, or only the bulk service, to get this fix in 9.2.14?

kofemann commented 3 months ago

Only the frontend service.

ageorget commented 3 months ago

Hi @kofemann, I upgraded our dCache core service to 9.2.14 yesterday, but the gfal-xattr command still returns [Tape REST API] No such file or directory with the prefix:

webdav.root=/pnfs/in2p3.fr/data/atlas/

$ gfal-xattr davs://ccdavatlas.in2p3.fr:2880/pnfs/in2p3.fr/data/atlas/atlasdatatape/SAM/1M user.status
NEARLINE
$ gfal-xattr davs://ccdavatlas.in2p3.fr:2880/atlasdatatape/SAM/1M user.status                         
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /atlasdatatape/SAM/1M

Should I also set frontend.root=${webdav.root} in the webdav door layout to make it work, like in your test?

Adrien

kofemann commented 3 months ago

Hi @ageorget, yes, the frontend and webdav configurations should match if you want consistent behaviour. Thus frontend.root=${webdav.root} should be set.

ageorget commented 3 months ago

Hi @kofemann, for now I only have one global frontend and dedicated WebDAV doors per VO with the proper prefix (/atlas, /cms, /lhcb). Does that mean I should now configure one frontend for each VO?

kofemann commented 3 months ago

Do you have a webdav door per experiment?

ageorget commented 3 months ago

Yes, a WebDAV door per experiment (with webdav.root configured) and one generalist door for wlcg/dteam.

kofemann commented 3 months ago

Yes, you will need something similar for the frontend, or we need to understand how you can do it with a single frontend, as then you would have to tie webdav doors to frontends.

ageorget commented 2 months ago

Hi, any news about this issue?

ageorget commented 2 months ago

Hi @kofemann @DmitryLitvintsev

I set up a dedicated frontend service for CMS with frontend.root=${webdav.root}, but I'm facing the same issue. All requests are failing with No such file or directory because the prefix is not used by the bulk service:

(ERROR: diskCacheV111.util.FileNotFoundCacheException : CacheException(rc=10001;msg=No such file or directory /data/store/test/rucio/store/test/loadtest/source/T1_FR_CCIN2P3_Tape_Test/urandom.270MB.file0000))

The webdav conf:

[webdav-ccdcacli537Domain]
[webdav-ccdcacli537Domain/webdav]
webdav.root=/pnfs/in2p3.fr/data/cms/
webdav.authn.protocol=https

wlcg-tape-rest-api.json targets the dedicated frontend with:

[Frontend2Domain]
[Frontend2Domain/frontend]
frontend.authn.basic=true
frontend.authn.protocol=https
frontend.root=/pnfs/in2p3.fr/data/cms/

gfal-xattr is OK:

gfal-xattr -vv davs://ccdavcms.in2p3.fr:2880/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root user.status 

INFO     Davix: > GET /.well-known/wlcg-tape-rest-api HTTP/1.1
> User-Agent: gfal2-util/1.8.1 gfal2/2.22.2 neon/0.0.29
> Keep-Alive: 
> Connection: Keep-Alive
> TE: trailers
> Host: ccdavcms.in2p3.fr:2880
> 

INFO     Davix: < HTTP/1.1 200 OK
INFO     Davix: < Date: Mon, 29 Apr 2024 08:48:31 GMT
INFO     Davix: < Server: dCache/9.2.14
INFO     Davix: < Content-Type: application/json;charset=utf-8
INFO     Davix: < Transfer-Encoding: chunked
INFO     Davix: < 
INFO     Davix: < 
INFO     Davix: > POST /api/v1/tape/archiveinfo HTTP/1.1
> User-Agent: gfal2-util/1.8.1 gfal2/2.22.2 neon/0.0.29
> Keep-Alive: 
> Connection: Keep-Alive
> TE: trailers
> Host: ccdcamcli07.in2p3.fr:3880
> Content-Type: application/json
> Content-Length: 163
> 

INFO     Davix: < HTTP/1.1 200 OK
INFO     Davix: < Date: Mon, 29 Apr 2024 08:48:31 GMT
INFO     Davix: < Server: dCache/9.2.14
INFO     Davix: < Content-Type: application/json
INFO     Davix: < Content-Length: 179
INFO     Davix: < 
NEARLINE

But if I try to stage the file, it fails:

cat stage2.json 
{
"files": [
{"path": "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root","diskLifetime":"PT1H"}
]}

curl -v --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage" -H  "accept: application/json" -H  "content-type: application/json" -d @stage2.json       
* Connected to ccdcamcli07.in2p3.fr (2001:660:5009:1:134:158:109:247) port 3880 (#0)
> POST /api/v1/tape/stage HTTP/1.1
> User-Agent: curl/7.29.0
> Host: ccdcamcli07.in2p3.fr:3880
> accept: application/json
> content-type: application/json
> Content-Length: 195
> 
* upload completely sent off: 195 out of 195 bytes
< HTTP/1.1 201 Created
< Date: Mon, 29 Apr 2024 08:50:40 GMT
< Server: dCache/9.2.14
< Location: https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage/9c4abd96-087b-477c-b67a-62c8fb467cd5
< Content-Type: application/json
< Content-Length: 58
< 
{
  "requestId" : "9c4abd96-087b-477c-b67a-62c8fb467cd5"
* Connection #0 to host ccdcamcli07.in2p3.fr left intact
}

curl --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X GET "https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage/9c4abd96-087b-477c-b67a-62c8fb467cd5" -H  "accept: application/json" 
{
  "id" : "9c4abd96-087b-477c-b67a-62c8fb467cd5",
  "createdAt" : 1714380640973,
  "startedAt" : 1714380641016,
  "completedAt" : 1714380641071,
  "files" : [ {
    "path" : "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root",
    "finishedAt" : 1714380640979,
    "startedAt" : 1714380640979,
    "error" : "CacheException(rc=10001;msg=No such file or directory /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root)",
    "state" : "FAILED"
  } ]
}%

In the bulk log: Apr 29 10:50:41 ccdcamcli06 dcache@bulkDomain[49218]: 29 Apr 2024 10:50:41 (bulk) [] 9c4abd96-087b-477c-b67a-62c8fb467cd5 - fetchAttributes, callback failure for TARGET [241299, INITIAL, null][null][CREATED: (C 2024-04-29 10:50:40.979)(S null)(U 2024-04-29 10:50:40.979)(ret 0)][null] null : /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root (err null null).

(bulk@bulkDomain) ageorget > request info 9c4abd96-087b-477c-b67a-62c8fb467cd5
9c4abd96-087b-477c-b67a-62c8fb467cd5:
status:           COMPLETED
arrived at:       2024-04-29 10:50:40.973
started at:       2024-04-29 10:50:41.013
last modified at: 2024-04-29 10:50:41.07
target prefix:    /
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-04-29 10:50:40.979   |   2024-04-29 10:50:40.979 |   2024-04-29 10:50:40.979 |       FAILED | /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root -- (ERROR: diskCacheV111.util.FileNotFoundCacheException : CacheException(rc=10001;msg=No such file or directory /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root))
kofemann commented 2 months ago

Hi @ageorget, could you please add the full path name to the issue, so that we can see which path it should be resolved to?

ageorget commented 2 months ago

The full path for the last example is:

-rw-r--r-- 1 cmsgrid cmsf 4.0G Apr 26 15:11 /pnfs/in2p3.fr/data/cms/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root

So prefix + path: /pnfs/in2p3.fr/data/cms + /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root

stlammel commented 1 month ago

Can someone from the dCache team provide an update on where things stand? (CMS needs this fixed to complete the SRMv2 phase-out.) Thanks,

DmitryLitvintsev commented 1 month ago

@ageorget @kofemann @stlammel

It works for us on the CMS T1 tape system:

frontend.root = /pnfs/fs/usr/cms
webdav.root = /pnfs/fs/usr/cms

storage-authzdb:

authorize cmsprod read-write 9811 5063,9114,9247 / /pnfs/fs/usr/cms /pnfs/fs/usr/cms

check archiveinfo:

$ curl  --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key  /tmp/x509up_u`id -u` -X POST https://cmsdcatape.fnal.gov:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths"  : ["/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root"]}'

reply :

[{"path":"/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root","locality":"TAPE"}]

bring it online:

$ gfal-bringonline https://cmsdcatape.fnal.gov:3880/WAX/11/store/test/rucio/store/express/Run2023C/Strea\
mALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root

reply:

https://cmsdcatape.fnal.gov:3880/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root QUEUED

in bulk I see:

[cmsdcatape02new] (bulk@bulkDomain) admin > request ls 
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
6240         | 2024/05/24-15:39:13 | 2024/05/24-15:39:14 |    9811:5063 |    STARTED | 31c38979-d1f2-4ac2-883c-0b00458f206d
[cmsdcatape02new] (bulk@bulkDomain) admin > request info 31c38979-d1f2-4ac2-883c-0b00458f206d
31c38979-d1f2-4ac2-883c-0b00458f206d:
status:           STARTED
arrived at:       2024-05-24 15:39:13.697
started at:       2024-05-24 15:39:14.336
last modified at: 2024-05-24 15:39:14.336
target prefix:    /pnfs/fs/usr/cms
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-05-24 15:39:13.92    |    2024-05-24 15:39:13.92 |                         ? |      RUNNING | /WAX/11/store/test/rucio\
/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-\
72ec4c6da394.root

and after while

[cmsdcatape02new] (bulk@bulkDomain) admin > request info 31c38979-d1f2-4ac2-883c-0b00458f206d
31c38979-d1f2-4ac2-883c-0b00458f206d:
status:           COMPLETED
arrived at:       2024-05-24 15:39:13.697
started at:       2024-05-24 15:39:14.336
last modified at: 2024-05-24 15:42:18.336
target prefix:    /pnfs/fs/usr/cms
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-05-24 15:39:13.92    |    2024-05-24 15:39:13.92 |   2024-05-24 15:42:18.324 |    COMPLETED | /WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root

And checking archiveinfo again:

$ curl  --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key  /tmp/x509up_u`id -u` -X POST https://cmsdcatape.fnal.gov:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths"  : ["/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root"]}'

reply:

[{"path":"/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root","locality":"DISK_AND_TAPE"}]

As you can see, it works as designed for us. The real full path of the file is:

# ls -al /pnfs/fs/usr/cms/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root
-rw-r--r-- 1 9811 5063 43073478 Feb 27 04:12 /pnfs/fs/usr/cms/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root

@ageorget is the frontend.root in your case a real file path or a symlink? (Just fishing for possible differences.)

Dmitry

stlammel commented 1 month ago

Thanks Dmitry! @DmitryLitvintsev Let's try to track down the Fermilab/In2P3 difference and see if we can overcome this.

ageorget commented 1 month ago

Hi @DmitryLitvintsev

The frontend.root is a real file path, not a symlink. The only difference I see in our dCache conf is the storage-authzdb, which is set to the root path:

authorize atlagrid read-write 3327 124 / / /
authorize cmsgrid read-write 3033 119 / / /

If we need to set the base path in storage-authzdb, we will have to coordinate the changes with CRIC because, as I said, currently some protocols like WebDAV use a base path and some, like SRM and XRootD (redirector, local access), don't.

Same tests you mentioned, using the ATLAS frontend (9.2.14 with frontend.root=/pnfs/in2p3.fr/data/atlas):

check archiveinfo, OK:

curl  --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key  /tmp/x509up_u`id -u` -X POST https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths"  : ["/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1"]}' 
[{"path":"/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1","locality":"TAPE"}]

bring it online, OK:

gfal-bringonline https://ccdcamcli08.in2p3.fr:3880/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1
https://ccdcamcli08.in2p3.fr:3880/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1 QUEUED

in bulk, the request failed:

request info 776cec02-be3b-4222-98e9-1cb7d000ea15
776cec02-be3b-4222-98e9-1cb7d000ea15:
status:           COMPLETED
arrived at:       2024-05-27 13:59:32.691
started at:       2024-05-27 13:59:32.705
last modified at: 2024-05-27 13:59:32.724
target prefix:    /
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-05-27 13:59:32.695   |   2024-05-27 13:59:32.695 |   2024-05-27 13:59:32.695 |       FAILED | /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1 -- (ERROR: diskCacheV111.util.FileNotFoundCacheException : CacheException(rc=10001;msg=No such file or directory /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1))
ageorget commented 1 month ago

Well, I tried to change storage-authzdb as @DmitryLitvintsev suggested, adding the base path:

authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas /
authorize cmsgrid read-write 3033 119 / /pnfs/in2p3.fr/data/cms /

And this was enough to make staging work using the prefix:

request info 6986f840-2cb9-4bfa-87c8-b416643de233
6986f840-2cb9-4bfa-87c8-b416643de233:
status:           COMPLETED
arrived at:       2024-05-30 10:26:53.244
started at:       2024-05-30 10:26:53.261
last modified at: 2024-05-30 10:32:16.596
target prefix:    /pnfs/in2p3.fr/data/cms
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-05-30 10:26:53.249   |   2024-05-30 10:26:53.249 |   2024-05-30 10:32:16.586 |    COMPLETED | /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root
request info 417a5959-c3ee-43e8-aafc-d7b5a3f3e5af
417a5959-c3ee-43e8-aafc-d7b5a3f3e5af:
status:           COMPLETED
arrived at:       2024-05-30 11:42:14.699
started at:       2024-05-30 11:42:14.712
last modified at: 2024-05-30 11:48:39.733
target prefix:    /pnfs/in2p3.fr/data/atlas
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-05-30 11:42:14.703   |   2024-05-30 11:42:14.703 |   2024-05-30 11:48:39.729 |    COMPLETED | /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1

But now I see transfer errors with /upload, which moved from / to /pnfs/in2p3.fr/data/atlas. I think this change should be coordinated with the generalist SRM+HTTPS WebDAV doors, which use webdav.root=/.

So setting the prefix in storage-authzdb allowed the bulk service to get target prefix: /pnfs/in2p3.fr/data/atlas configured. Is there a way to make it work with the frontend.root parameter instead of being forced to change the main dCache configuration?

elenamplanas commented 3 weeks ago

Hi, we're facing a similar problem using relative paths. This week we upgraded our dCache instance to 9.2.20 with the idea of solving the problem, but it's still failing.

At PIC we have an alias for webdav-at1-tape.pic.es that points to door01.pic.es and door02.pic.es

$ host [webdav-at1-tape.pic.es](http://webdav-at1-tape.pic.es/)
[webdav-at1-tape.pic.es](http://webdav-at1-tape.pic.es/) has address 193.109.172.132
[webdav-at1-tape.pic.es](http://webdav-at1-tape.pic.es/) has address 193.109.172.130
[webdav-at1-tape.pic.es](http://webdav-at1-tape.pic.es/) has IPv6 address 2001:67c:1148:201::12
[webdav-at1-tape.pic.es](http://webdav-at1-tape.pic.es/) has IPv6 address 2001:67c:1148:201::11

On both doors we have this configuration:

############################################
# Domain: webdav-at1-tape-https-door01Domain
[webdav-at1-tape-https-${[host.name](http://host.name/)}Domain]
dcache.java.memory.heap=512m
dcache.java.options.extra=-Dcom.sun.management.jmxremote.port=7052 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
dcache.wellknown!wlcg-tape-rest-api.path=${dcache.paths.httpd}/wlcg-tape-rest-api-ATLAS-TAPE.json
webdav.mover.queue=webdav
webdav.net.port=8466
# Cell: frontend
[webdav-at1-tape-https-${[host.name](http://host.name/)}Domain/frontend]
frontend.authn.accept-client-cert=true
frontend.authn.protocol=https
[frontend.cell.name](http://frontend.cell.name/)=frontend
frontend.net.port=8486
frontend.root=/pnfs/[pic.es/data/atlas/tape](http://pic.es/data/atlas/tape)
frontend.static!dcache-view.endpoints.webdav=https://webdav-at1.pic.es:8466/
frontend.static!dcache-view.oidc-authz-endpoint-extra=-
frontend.static!dcache-view.oidc-authz-endpoint-list=https://idp.pic.es/realms/PIC/protocol/openid-connect/auth
frontend.static!dcache-view.oidc-client-id-list=dcache-view
frontend.static!dcache-view.oidc-provider-name-list=PIC
frontend.static!dcache-view.org-name=[pic.es](http://pic.es/)
# Cell: webdav
[webdav-at1-tape-https-${[host.name](http://host.name/)}Domain/webdav]
webdav.allowed.client.origins=https://webdav-at1.pic.es:8486/
webdav.authn.protocol=https
webdav.authz.allowed-paths=/pnfs/[pic.es/data/atlas/tape](http://pic.es/data/atlas/tape)
[webdav.cell.name](http://webdav.cell.name/)=WebDAV-ATLAST1-TAPE-${[host.name](http://host.name/)}
webdav.enable.overwrite=true
webdav.loginbroker.tags=cdmi,dcache-view,glue,storage-descriptor,srmatlas
webdav.redirect.allow-https=true
webdav.redirect.on-read=true
webdav.redirect.on-write=true
webdav.root=/pnfs/[pic.es/data/atlas/tape](http://pic.es/data/atlas/tape)
## End webdav-at1-tape-https-door01Domain
############################################

As you can see, the root path defined is the same for the webdav and the frontend. On the other hand, the dcache.wellknown!wlcg-tape-rest-api.path file on each host contains the hostname, not the alias.

[root@door01 layouts]# cat /var/lib/dcache/httpd/wlcg-tape-rest-api-ATLAS-TAPE.json
{
  "sitename": "PIC",
  "description": "This is the dCache WLCG TAPE REST API endpoint for ATLAS",
  "endpoints":[
      {
        "uri":"https://door01.pic.es:8486/api/v1/tape",
        "version":"v1",
        "metadata": {
        }
      } ]
}

I don't know how gfal-bringonline works internally, and whether it redirects the request to the URI defined in the well-known file. I could try to put the alias instead of the hostname, but Petr has done tests accessing the hostname directly, with the same results.

In our storage-authzdb file the entries for ATLAS users are:

authorize atlas001   read-write 31051 1307  / / /
authorize atprd001   read-write 42001 1307  / / /

See below the tests done by Petr.

gfal-bringonline correctly calls the TAPE REST API; a manually issued individual HTTP request (with the right short path) looks like:

$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -X POST -H 'Content-TYpe: aplication/json' -d '{ "files":[{ "path": "/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1" }] }' https://door01.pic.es:8486/api/v1/tape/stage
{
  "requestId" : "e50d5cc6-8c66-4200-94fe-d001fea8ece7"
}

$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -XGET -H 'Content-TYpe: application/json' https://door01.pic.es:8486/api/v1/tape/stage/e50d5cc6-8c66-4200-94fe-d001fea8ece7
{
  "id" : "e50d5cc6-8c66-4200-94fe-d001fea8ece7",
  "createdAt" : 1718316934373,
  "startedAt" : 1718316934378,
  "completedAt" : 1718316934387,
  "files" : [ {
    "path" : "/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1",
    "finishedAt" : 1718316934375,
    "startedAt" : 1718316934375,
    "error" : "CacheException(rc=10001;msg=No such file or directory /atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1)",
    "state" : "FAILED"
  } ]
}

When it is called with the "wrong" full path, everything works fine:

$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -X POST -H 'Content-TYpe: application/json' -d '{ "files":[{ "path": "/pnfs/[pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1](http://pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1)" }] }' https://door01.pic.es:8486/api/v1/tape/stage
{
  "requestId" : "813258a6-2498-4cca-a9f5-a0b6ca163811"
}

$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -XGET -H 'Content-TYpe: application/json' https://door01.pic.es:8486/api/v1/tape/stage/813258a6-2498-4cca-a9f5-a0b6ca163811
{
  "id" : "813258a6-2498-4cca-a9f5-a0b6ca163811",
  "createdAt" : 1718316984917,
  "startedAt" : 1718316984922,
  "completedAt" : 1718316985056,
  "files" : [ {
    "path" : "/pnfs/[pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1](http://pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1)",
    "finishedAt" : 1718316985054,
    "startedAt" : 1718316984919,
    "state" : "COMPLETED"
  } ]
}

This is weird, because your configuration - frontend.root for door01.pic.es:8486 - looks OK...

Have we missed any configuration?

DmitryLitvintsev commented 2 weeks ago

Elena,

you have a typo:

webdav.root=/pnfs/[pic.es/data/atlas/tape](http://pic.es/data/atlas/tape)

Dmitry

DmitryLitvintsev commented 2 weeks ago

After you correct that, if it still does not work, change your storage-authzdb to look like:

authorize atlas001   read-write 31051 1307  / /pic.es/data/atlas/tape /
authorize atprd001   read-write 42001 1307  / /pic.es/data/atlas/tape /

Alternatively, you likely want to do:

authorize atlas001   read-write 31051 1307  / /pic.es/data/atlas /
authorize atprd001   read-write 42001 1307  / /pic.es/data/atlas /

And

frontend.root=/pnfs/pic.es/data/atlas
webdav.root=/pnfs/pic.es/data/atlas
ageorget commented 2 weeks ago

This "workaround" did not work for us at IN2P3. Adding authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas / (instead of / / /) to storage-authzdb moved /upload to /pnfs/in2p3.fr/data/atlas/upload and most of SRM+HTTPS transfers started to fail with errors like HTTP/1.1 400 No such directory: /pnfs/in2p3.fr/data/atlas/upload/16/d7aa2df4-3faa-497b-a631-a8e96eb1776b Even after creating /pnfs/in2p3.fr/data/atlas/upload directory. Our generalist webdav doors used for SRM+HTTPS don't use prefix so it's not compatible with VO dedicated webdav doors with prefix.

Is it such big development work to make the bulk service prefix-aware when frontend.root and webdav.root are set?

Adrien

elenamplanas commented 2 weeks ago

Hi,

I did something wrong on cut/paste; the definition in the layout file has no "[". It was introduced because in the original mail it was marked as a link.

This is the definition: webdav.root=/pnfs/pic.es/data/atlas

I'm not sure about making the changes to the storage-authzdb file.

I've seen a global parameter in dcache.properties that maybe could be added in the door definition:

dcache.root = /

I don't know if this could help.

Thanks! Elena

DmitryLitvintsev commented 2 weeks ago

This "workaround" did not work for us at IN2P3. Adding authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas / (instead of / / /) to storage-authzdb moved /upload to /pnfs/in2p3.fr/data/atlas/upload and most of SRM+HTTPS transfers started to fail with errors like HTTP/1.1 400 No such directory: /pnfs/in2p3.fr/data/atlas/upload/16/d7aa2df4-3faa-497b-a631-a8e96eb1776b Even after creating /pnfs/in2p3.fr/data/atlas/upload directory. Our generalist webdav doors used for SRM+HTTPS don't use prefix so it's not compatible with VO dedicated webdav doors with prefix.

Is it such a big development work to make bulk service compatible with prefix if frontend.root and webdav.root are set?

Adrien

Adrien,

Fair enough:

Here is a possible fix: https://rb.dcache.org/r/14267/

If you like, I can build you an RPM to install. Meanwhile I will do more testing. This is trickier than it looks.

ageorget commented 2 weeks ago

Thanks @DmitryLitvintsev for working on it. If you can provide us an RPM, we will certainly try to test it.

DmitryLitvintsev commented 2 weeks ago

This is a link to the frontend jar. You can copy it over your existing /usr/share/dcache/classes/dcache-frontend*.jar and restart only the frontend on the host(s) running frontend(s):

https://drive.google.com/file/d/1FtLPZf3BkDXuQUp-nqCkI74zVAAs6mt_/view?usp=sharing

or just install the RPM below:

https://drive.google.com/file/d/1tnryH-VrRxjAMizfpTS4SeHmtN1Flw2I/view?usp=sharing

@elenamplanas you can try as well.

To be cautious, deploy it on your test system first.

ageorget commented 2 weeks ago

Thanks @DmitryLitvintsev

I deployed the patch this morning and ran some tests. Manual tests and CMS SAM load tests look good for now:

stage2.json:
{
"files": [
{"path": "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root","diskLifetime":"PT1H"}
]
}

curl -v --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage" -H  "accept: application/json" -H  "content-type: application/json" -d @stage2.json

* upload completely sent off: 195 out of 195 bytes
< HTTP/1.1 201 Created
< Date: Mon, 24 Jun 2024 08:20:33 GMT
< Server: dCache/9.2.14
< Location: https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage/ef2aa686-755e-49df-a668-28d827e08ba0
< Content-Type: application/json
< Content-Length: 58
request info ef2aa686-755e-49df-a668-28d827e08ba0
ef2aa686-755e-49df-a668-28d827e08ba0:
status:           COMPLETED
arrived at:       2024-06-24 10:20:33.856
started at:       2024-06-24 10:20:33.894
last modified at: 2024-06-24 10:20:46.037
target prefix:    /pnfs/in2p3.fr/data/cms
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-06-24 10:20:33.861   |   2024-06-24 10:20:33.861 |    2024-06-24 10:20:46.03 |    COMPLETED | /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root

request info 669a4d6a-844a-4b36-b9ee-0635f7dd89e1
669a4d6a-844a-4b36-b9ee-0635f7dd89e1:
status:           COMPLETED
arrived at:       2024-06-24 10:21:20.086
started at:       2024-06-24 10:21:20.104
last modified at: 2024-06-24 10:21:20.161
target prefix:    /pnfs/in2p3.fr/data/cms
targets:
CREATED                   |                   STARTED |                 COMPLETED |        STATE | TARGET
2024-06-24 10:21:20.091   |   2024-06-24 10:21:20.091 |   2024-06-24 10:21:20.155 |    COMPLETED | /data/store/test/rucio/store/test/loadtest/source/T1_FR_CCIN2P3_Tape_Test/urandom.270MB.file0001

curl  --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key  /tmp/x509up_u`id -u` -X POST https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths"  : ["/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root"]}' | jq

[
  {
    "path": "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root",
    "locality": "DISK_AND_TAPE"
  }
]
DmitryLitvintsev commented 2 weeks ago

OK, I will merge the patch.

elenamplanas commented 1 week ago

Hi Dmitry

Petr has confirmed that it works properly with the change.

Thanks a lot! Elena