ageorget opened this issue 5 months ago
Hi Adrien,
Did you set frontend.root to the same value as webdav.root?
Kind regards, Onno
Hi Onno,
No, the frontend is generalist and not dedicated to a VO like the Webdav doors.
But I tried setting frontend.root to the same value as webdav.root (and restarted) for testing, and it does not change anything for this test (dCache View does display the correct relative path, though).
Cheers, Adrien
Hi Adrien,
I think I can conclude that WebDAV supports relative paths, but the REST API does not. Correct? While we are at it, can you check whether the namespace resource of the frontend suffers from the same issue?
Dmitry
One thing I suggest to try.
Change storage-authzdb so it looks like:
authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas /
authorize cmsgrid read-write 3033 119 / /pnfs/in2p3.fr/data/cms /
authorize lhcbgrid read-write 3437 155 / /pnfs/in2p3.fr/data/lhcb /
Then set webdav.root=/ and frontend.root=/. Restart doors and retry.
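For reference, in a storage-authzdb entry the fields after authorize are user, access mode, uid, gid(s), home, root, and fsroot, and it is the root field that supplies the door's path prefix. A minimal split-based reader of such a line (a hypothetical helper for illustration, not dCache code):

```java
// Hypothetical illustration of reading the root prefix out of a
// storage-authzdb "authorize" line; field order assumed to be:
// authorize <user> <access> <uid> <gid[,gid]> <home> <root> <fsroot>
public class AuthzLine {
    public static String rootOf(String line) {
        String[] f = line.trim().split("\\s+");
        return f[6]; // the <root> field, e.g. /pnfs/in2p3.fr/data/cms
    }
}
```

With the entries above, rootOf returns /pnfs/in2p3.fr/data/cms for the cmsgrid line; with the default "/ / /" entries it returns /.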
Hi Dmitry, Well, it cannot be done so easily on our side because some protocols like Webdav use a basepath and some, like SRM and XRootD, don't. So it may need some changes in CRIC at the same time as changing the dCache conf, and this will require a site downtime I guess.
Shouldn't the Webdav door be able to contact the REST API with the path already resolved? BTW, the 9.2.0 release notes contain "Path resolution (for relative paths) has been integrated into bulk request processing.". Is that for a different use case?
Yes, we were under the impression that this was being handled properly. I am just suggesting a mitigation.
I believe that with the path defined in storage-authzdb, XRootD and WebDAV will work fine (with both full and relative paths). FTP will take only relative paths. SRM will take both.
Meanwhile I will be getting to the bottom of your issue. I need to reproduce it on a test system.
But yes, you are right. Since this issue concerns the REST API only, and this is a sort of "experimental" feature, it is safer to wait for a proper resolution.
I agree with that. We just started to test the REST API so we can wait.
And PIC also seems affected by the same issue, according to Petr:
gfal-ls https://webdav-at1.pic.es:8466/atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt
https://webdav-at1.pic.es:8466/atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt
gfal-xattr https://webdav-at1.pic.es:8466/atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt user.status
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /atlasdatatape/SAM/testfile-put-ATLASDATATAPE-1307712801-484ec9b9cdd6.txt
I think this is a bug that affects everybody who uses the xxx.root property.
NB at Fermilab we don't; we prefer to have it defined in /etc/grid-security/storage-authzdb. We will investigate.
This is a bit unfortunate because Al definitely tried to address it back in April.
Actually I am completely confused. Why does gfal-xattr use the TAPE-API endpoint?!
And, yeah, I can confirm the behaviour:
# layout file
webdav.root=/public
frontend.root=${webdav.root}
# 🤷🏼♂️
$ gfal-xattr http://192.168.178.40:2880/
taperestapi.version = v1
taperestapi.uri = http://192.168.178.40:3880/api/v1/tape
taperestapi.sitename = dcache-systest
# list by prefix ✅
$ gfal-ls http://192.168.178.40:2880/
file.txt
# xattr by prefix ⛔
$ gfal-xattr http://192.168.178.40:2880/file.txt user.status
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /file.txt
# xattr by full path ✅ 🤔
$ gfal-xattr http://192.168.178.40:2880/public/file.txt user.status
NEARLINE
I think gfal-xattr calls the archiveinfo endpoint of the REST API.
While you are at it, can you check if any operations in "our API", say namespace, fail with relative paths.
If this is the case, the issue is limited to frontend/REST and hopefully not deeper in bulk.
Conversely if namespace API works this narrows the parameter space.
I think the problem can be recast as: the FrontEnd does not support the "frontend.root" variable, which looks like it needs to match the "webdav.root" variable.
I can try to look at it tomorrow as well.
It looks like in REST requests (tape, bulk, qos?) where the path is provided as payload, dCache doesn't resolve it relative to the configured prefix. We use the user-provided path as-is. The conversion should happen in both directions: in the request and in the reply:
for (int i = 0; i < len; ++i) {
    String requestedPath = jsonArray.getString(i);
    String dcachePath = rootPath.chroot(requestedPath).toString();
    paths.add(dcachePath);
}
...
return out.stream().map(ai -> {
    var a = new ArchiveInfo();
    a.setError(ai.getError());
    a.setLocality(ai.getLocality());
    a.setPath(FsPath.create(ai.getPath()).stripPrefix(rootPath));
    return a;
}).toList();
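The intended round trip can be sketched with plain strings; PathPrefix below is a hypothetical stand-in for dCache's FsPath, shown only to illustrate the two directions (chroot on the way in, stripPrefix on the way out), not actual dCache code:

```java
// Hypothetical stand-in for the FsPath chroot/stripPrefix round trip;
// not dCache code, just the string semantics the fix relies on.
public class PathPrefix {

    // Request direction: resolve a user-supplied path against the door
    // root, as rootPath.chroot(requestedPath) does.
    public static String chroot(String root, String requested) {
        String r = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        String p = requested.startsWith("/") ? requested : "/" + requested;
        return r + p;
    }

    // Reply direction: strip the root again so the client gets back the
    // path it asked for, as FsPath.stripPrefix(rootPath) does.
    public static String stripPrefix(String root, String internal) {
        String r = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        return internal.startsWith(r + "/") ? internal.substring(r.length()) : internal;
    }
}
```

With root /pnfs/in2p3.fr/data/atlas, chroot maps /atlasdatatape/SAM/1M to /pnfs/in2p3.fr/data/atlas/atlasdatatape/SAM/1M, and stripPrefix reverses it; with root / both functions are the identity, which is why the default configuration never showed the problem.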
The proposed change addresses the issue:
$ gfal-xattr http://192.168.178.40:2880/file.txt user.status
NEARLINE
I will check whether it is possible to make it more generic and update bulk/qos as well.
Hi @kofemann Do we need to upgrade bulk service and the webdav doors or only the bulk service to get this fix in 9.2.14?
Only frontend service
Hi @kofemann
I upgraded our dCache core service to 9.2.14 yesterday, but the gfal-xattr command still returns [Tape REST API] No such file or directory with the prefix:
webdav.root=/pnfs/in2p3.fr/data/atlas/
$ gfal-xattr davs://ccdavatlas.in2p3.fr:2880/pnfs/in2p3.fr/data/atlas/atlasdatatape/SAM/1M user.status
NEARLINE
$ gfal-xattr davs://ccdavatlas.in2p3.fr:2880/atlasdatatape/SAM/1M user.status
gfal-xattr error: 42 (No message of desired type) - [Tape REST API] No such file or directory /atlasdatatape/SAM/1M
Should I also set frontend.root=${webdav.root} in the webdav door layout to make it work, like in your test?
Adrien
Hi @ageorget ,
yes, the frontend and webdav config should match if you want consistent behaviour. Thus frontend.root=${webdav.root} should be set.
Hi @kofemann For now I only have one global frontend and dedicated Webdav doors for VOs with proper prefix (/atlas, /cms, /lhcb). So that means I should now configure one frontend for each VO?
Do you have a webdav door per experiment?
Yes a webdav door per experiment (with webdav.root configured) and one generalist for wlcg/dteam
Yes, you will need something similar for the frontend, or we need to understand how you can do it with a single frontend, as then you must tie webdav doors to frontends.
Hi, Any news about this issue?
Hi @kofemann @DmitryLitvintsev
I set up a dedicated frontend service for CMS with frontend.root=${webdav.root} matching, but I'm facing the same issue. All requests are failing with No such file or directory because the prefix is not used by the bulk service:
(ERROR: diskCacheV111.util.FileNotFoundCacheException : CacheException(rc=10001;msg=No such file or directory /data/store/test/rucio/store/test/loadtest/source/T1_FR_CCIN2P3_Tape_Test/urandom.270MB.file0000))
The webdav conf :
[webdav-ccdcacli537Domain]
[webdav-ccdcacli537Domain/webdav]
webdav.root=/pnfs/in2p3.fr/data/cms/
webdav.authn.protocol=https
wlcg-tape-rest-api.json targets the dedicated frontend with :
[Frontend2Domain]
[Frontend2Domain/frontend]
frontend.authn.basic=true
frontend.authn.protocol=https
frontend.root=/pnfs/in2p3.fr/data/cms/
gfal-xattr is OK:
gfal-xattr -vv davs://ccdavcms.in2p3.fr:2880/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root user.status
INFO Davix: > GET /.well-known/wlcg-tape-rest-api HTTP/1.1
> User-Agent: gfal2-util/1.8.1 gfal2/2.22.2 neon/0.0.29
> Keep-Alive:
> Connection: Keep-Alive
> TE: trailers
> Host: ccdavcms.in2p3.fr:2880
>
INFO Davix: < HTTP/1.1 200 OK
INFO Davix: < Date: Mon, 29 Apr 2024 08:48:31 GMT
INFO Davix: < Server: dCache/9.2.14
INFO Davix: < Content-Type: application/json;charset=utf-8
INFO Davix: < Transfer-Encoding: chunked
INFO Davix: <
INFO Davix: <
INFO Davix: > POST /api/v1/tape/archiveinfo HTTP/1.1
> User-Agent: gfal2-util/1.8.1 gfal2/2.22.2 neon/0.0.29
> Keep-Alive:
> Connection: Keep-Alive
> TE: trailers
> Host: ccdcamcli07.in2p3.fr:3880
> Content-Type: application/json
> Content-Length: 163
>
INFO Davix: < HTTP/1.1 200 OK
INFO Davix: < Date: Mon, 29 Apr 2024 08:48:31 GMT
INFO Davix: < Server: dCache/9.2.14
INFO Davix: < Content-Type: application/json
INFO Davix: < Content-Length: 179
INFO Davix: <
NEARLINE
But if I try to stage the file it fails :
cat stage2.json
{
"files": [
{"path": "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root","diskLifetime":"PT1H"}
]}
curl -v --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage" -H "accept: application/json" -H "content-type: application/json" -d @stage2.json
* Connected to ccdcamcli07.in2p3.fr (2001:660:5009:1:134:158:109:247) port 3880 (#0)
> POST /api/v1/tape/stage HTTP/1.1
> User-Agent: curl/7.29.0
> Host: ccdcamcli07.in2p3.fr:3880
> accept: application/json
> content-type: application/json
> Content-Length: 195
>
* upload completely sent off: 195 out of 195 bytes
< HTTP/1.1 201 Created
< Date: Mon, 29 Apr 2024 08:50:40 GMT
< Server: dCache/9.2.14
< Location: https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage/9c4abd96-087b-477c-b67a-62c8fb467cd5
< Content-Type: application/json
< Content-Length: 58
<
{
"requestId" : "9c4abd96-087b-477c-b67a-62c8fb467cd5"
* Connection #0 to host ccdcamcli07.in2p3.fr left intact
}
curl --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X GET "https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage/9c4abd96-087b-477c-b67a-62c8fb467cd5" -H "accept: application/json"
{
"id" : "9c4abd96-087b-477c-b67a-62c8fb467cd5",
"createdAt" : 1714380640973,
"startedAt" : 1714380641016,
"completedAt" : 1714380641071,
"files" : [ {
"path" : "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root",
"finishedAt" : 1714380640979,
"startedAt" : 1714380640979,
"error" : "CacheException(rc=10001;msg=No such file or directory /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root)",
"state" : "FAILED"
} ]
}
In the bulk log :
Apr 29 10:50:41 ccdcamcli06 dcache@bulkDomain[49218]: 29 Apr 2024 10:50:41 (bulk) [] 9c4abd96-087b-477c-b67a-62c8fb467cd5 - fetchAttributes, callback failure for TARGET [241299, INITIAL, null][null][CREATED: (C 2024-04-29 10:50:40.979)(S null)(U 2024-04-29 10:50:40.979)(ret 0)][null] null : /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root (err null null).
(bulk@bulkDomain) ageorget > request info 9c4abd96-087b-477c-b67a-62c8fb467cd5
9c4abd96-087b-477c-b67a-62c8fb467cd5:
status: COMPLETED
arrived at: 2024-04-29 10:50:40.973
started at: 2024-04-29 10:50:41.013
last modified at: 2024-04-29 10:50:41.07
target prefix: /
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-04-29 10:50:40.979 | 2024-04-29 10:50:40.979 | 2024-04-29 10:50:40.979 | FAILED | /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root -- (ERROR: diskCacheV111.util.FileNotFoundCacheException : CacheException(rc=10001;msg=No such file or directory /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root))
Hi, @ageorget. Could you please add the full path name to the issue, so that we can see which path it should be resolved to?
The full path for the last example is
-rw-r--r-- 1 cmsgrid cmsf 4.0G Apr 26 15:11 /pnfs/in2p3.fr/data/cms/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root
So prefix + path: /pnfs/in2p3.fr/data/cms + /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root
Can someone from the dCache team provide an update on where things stand? (CMS needs this fixed to complete the SRMv2 phase-out.) Thanks,
@ageorget @kofemann @stlammel: it works for us on the CMS T1 system. On CMS T1 tape:
frontend.root = /pnfs/fs/usr/cms
webdav.root = /pnfs/fs/usr/cms
storage-authzdb:
authorize cmsprod read-write 9811 5063,9114,9247 / /pnfs/fs/usr/cms /pnfs/fs/usr/cms
check archiveinfo:
$ curl --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key /tmp/x509up_u`id -u` -X POST https://cmsdcatape.fnal.gov:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths" : ["/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root"]}'
reply :
[{"path":"/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root","locality":"TAPE"}]
bring it online:
$ gfal-bringonline https://cmsdcatape.fnal.gov:3880/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root
reply:
https://cmsdcatape.fnal.gov:3880/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root QUEUED
in bulk I see:
[cmsdcatape02new] (bulk@bulkDomain) admin > request ls
ID | ARRIVED | MODIFIED | OWNER | STATUS | UID
6240 | 2024/05/24-15:39:13 | 2024/05/24-15:39:14 | 9811:5063 | STARTED | 31c38979-d1f2-4ac2-883c-0b00458f206d
[cmsdcatape02new] (bulk@bulkDomain) admin > request info 31c38979-d1f2-4ac2-883c-0b00458f206d
31c38979-d1f2-4ac2-883c-0b00458f206d:
status: STARTED
arrived at: 2024-05-24 15:39:13.697
started at: 2024-05-24 15:39:14.336
last modified at: 2024-05-24 15:39:14.336
target prefix: /pnfs/fs/usr/cms
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-05-24 15:39:13.92 | 2024-05-24 15:39:13.92 | ? | RUNNING | /WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root
and after a while:
[cmsdcatape02new] (bulk@bulkDomain) admin > request info 31c38979-d1f2-4ac2-883c-0b00458f206d
31c38979-d1f2-4ac2-883c-0b00458f206d:
status: COMPLETED
arrived at: 2024-05-24 15:39:13.697
started at: 2024-05-24 15:39:14.336
last modified at: 2024-05-24 15:42:18.336
target prefix: /pnfs/fs/usr/cms
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-05-24 15:39:13.92 | 2024-05-24 15:39:13.92 | 2024-05-24 15:42:18.324 | COMPLETED | /WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root
And checking archiveinfo again:
$ curl --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key /tmp/x509up_u`id -u` -X POST https://cmsdcatape.fnal.gov:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths" : ["/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root"]}'
reply:
[{"path":"/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root","locality":"DISK_AND_TAPE"}]
As you can see, it works as designed for us. The real full path of the file is:
# ls -al /pnfs/fs/usr/cms/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root
-rw-r--r-- 1 9811 5063 43073478 Feb 27 04:12 /pnfs/fs/usr/cms/WAX/11/store/test/rucio/store/express/Run2023C/StreamALCAPPSExpress/ALCARECO/PPSCalMaxTracks-Express-v4/000/368/382/00000/271228d6-6080-40f1-ad54-72ec4c6da394.root
@ageorget is the frontend.root in your case a real file path or a symlink? (just fishing for possible differences)
Dmitry
Thanks Dmitry! @DmitryLitvintsev Let's try to track down the Fermilab/In2P3 difference and see if we can overcome this.
Hi @DmitryLitvintsev
The frontend.root is a real file path, not a symlink.
The only difference I see in our dCache conf is the storage-authzdb, which is set to the root path:
authorize atlagrid read-write 3327 124 / / /
authorize cmsgrid read-write 3033 119 / / /
If we need to set a basepath in the storage-authzdb, we will have to coordinate the changes with CRIC because, as I said, currently some protocols like Webdav use a basepath and some, like SRM and XRootD (redirector, local access), don't.
Same tests as you mentioned, using the ATLAS frontend (9.2.14 with frontend.root=/pnfs/in2p3.fr/data/atlas):
check archiveinfo OK :
curl --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key /tmp/x509up_u`id -u` -X POST https://ccdcamcli08.in2p3.fr:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths" : ["/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1"]}'
[{"path":"/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1","locality":"TAPE"}]
bring it online OK :
gfal-bringonline https://ccdcamcli08.in2p3.fr:3880/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1
https://ccdcamcli08.in2p3.fr:3880/atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1 QUEUED
in bulk, request failed :
request info 776cec02-be3b-4222-98e9-1cb7d000ea15
776cec02-be3b-4222-98e9-1cb7d000ea15:
status: COMPLETED
arrived at: 2024-05-27 13:59:32.691
started at: 2024-05-27 13:59:32.705
last modified at: 2024-05-27 13:59:32.724
target prefix: /
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-05-27 13:59:32.695 | 2024-05-27 13:59:32.695 | 2024-05-27 13:59:32.695 | FAILED | /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1 -- (ERROR: diskCacheV111.util.FileNotFoundCacheException : CacheException(rc=10001;msg=No such file or directory /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1))
Well, I tried to change storage-authzdb as @DmitryLitvintsev suggested, adding the basepath:
authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas /
authorize cmsgrid read-write 3033 119 / /pnfs/in2p3.fr/data/cms /
And this was enough to make staging work using the prefix:
request info 6986f840-2cb9-4bfa-87c8-b416643de233
6986f840-2cb9-4bfa-87c8-b416643de233:
status: COMPLETED
arrived at: 2024-05-30 10:26:53.244
started at: 2024-05-30 10:26:53.261
last modified at: 2024-05-30 10:32:16.596
target prefix: /pnfs/in2p3.fr/data/cms
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-05-30 10:26:53.249 | 2024-05-30 10:26:53.249 | 2024-05-30 10:32:16.586 | COMPLETED | /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root
request info 417a5959-c3ee-43e8-aafc-d7b5a3f3e5af
417a5959-c3ee-43e8-aafc-d7b5a3f3e5af:
status: COMPLETED
arrived at: 2024-05-30 11:42:14.699
started at: 2024-05-30 11:42:14.712
last modified at: 2024-05-30 11:48:39.733
target prefix: /pnfs/in2p3.fr/data/atlas
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-05-30 11:42:14.703 | 2024-05-30 11:42:14.703 | 2024-05-30 11:48:39.729 | COMPLETED | /atlasmctape/mc16_13TeV/HITS/e8351_s3126/mc16_13TeV.700337.Sh_2211_Znunu_pTV2_CVetoBVeto.simul.HITS.e8351_s3126_tid30364865_00/HITS.30364865._017868.pool.root.1
But now I see transfer errors with /upload, which moved from / to /pnfs/in2p3.fr/data/atlas/upload. I think this change should be coordinated with the generalist SRM+HTTPS Webdav doors, which use webdav.root=/
So using the prefix set in storage-authzdb allowed the bulk service to get target prefix: /pnfs/in2p3.fr/data/atlas configured.
Is there a way to make it work with the frontend.root parameter instead of being forced to change the main dCache configuration?
Hi, we're facing a similar problem using relative paths. This week we upgraded our dCache instance to 9.2.20 with the idea of solving the problem, but it's still failing.
At PIC we have an alias for webdav-at1-tape.pic.es that points to door01.pic.es and door02.pic.es
$ host webdav-at1-tape.pic.es
webdav-at1-tape.pic.es has address 193.109.172.132
webdav-at1-tape.pic.es has address 193.109.172.130
webdav-at1-tape.pic.es has IPv6 address 2001:67c:1148:201::12
webdav-at1-tape.pic.es has IPv6 address 2001:67c:1148:201::11
On both doors we have this configuration:
############################################
# Domain: webdav-at1-tape-https-door01Domain
[webdav-at1-tape-https-${host.name}Domain]
dcache.java.memory.heap=512m
dcache.java.options.extra=-Dcom.sun.management.jmxremote.port=7052 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
dcache.wellknown!wlcg-tape-rest-api.path=${dcache.paths.httpd}/wlcg-tape-rest-api-ATLAS-TAPE.json
webdav.mover.queue=webdav
webdav.net.port=8466
# Cell: frontend
[webdav-at1-tape-https-${host.name}Domain/frontend]
frontend.authn.accept-client-cert=true
frontend.authn.protocol=https
frontend.cell.name=frontend
frontend.net.port=8486
frontend.root=/pnfs/pic.es/data/atlas/tape
frontend.static!dcache-view.endpoints.webdav=https://webdav-at1.pic.es:8466/
frontend.static!dcache-view.oidc-authz-endpoint-extra=-
frontend.static!dcache-view.oidc-authz-endpoint-list=https://idp.pic.es/realms/PIC/protocol/openid-connect/auth
frontend.static!dcache-view.oidc-client-id-list=dcache-view
frontend.static!dcache-view.oidc-provider-name-list=PIC
frontend.static!dcache-view.org-name=pic.es
# Cell: webdav
[webdav-at1-tape-https-${host.name}Domain/webdav]
webdav.allowed.client.origins=https://webdav-at1.pic.es:8486/
webdav.authn.protocol=https
webdav.authz.allowed-paths=/pnfs/pic.es/data/atlas/tape
webdav.cell.name=WebDAV-ATLAST1-TAPE-${host.name}
webdav.enable.overwrite=true
webdav.loginbroker.tags=cdmi,dcache-view,glue,storage-descriptor,srmatlas
webdav.redirect.allow-https=true
webdav.redirect.on-read=true
webdav.redirect.on-write=true
webdav.root=/pnfs/[pic.es/data/atlas/tape](http://pic.es/data/atlas/tape)
## End webdav-at1-tape-https-door01Domain
############################################
As you can see, the root path defined is the same for the webdav and the frontend. On the other hand, the dcache.wellknown!wlcg-tape-rest-api.path in each host has the hostname, not the alias.
[root@door01 layouts]# cat /var/lib/dcache/httpd/wlcg-tape-rest-api-ATLAS-TAPE.json
{
"sitename": "PIC",
"description": "This is the dCache WLCG TAPE REST API endpoint for ATLAS",
"endpoints":[
{
"uri":"https://door01.pic.es:8486/api/v1/tape",
"version":"v1",
"metadata": {
}
} ]
}
I don't know how gfal-bringonline works internally, and whether it redirects the request to the uri defined in the well-known file. I could try to put the alias instead of the hostname, but Petr has done tests accessing the hostname directly with the same results.
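For what it's worth, the Davix traces earlier in this thread show the flow gfal2 follows: it first GETs /.well-known/wlcg-tape-rest-api on the door, then POSTs to the uri from that JSON plus /stage. A sketch of that endpoint extraction, using a naive regex instead of a real JSON parser to stay stdlib-only (TapeApiDiscovery is a hypothetical helper, not gfal2 code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tape REST endpoint discovery: pull the first
// "uri" value out of a wlcg-tape-rest-api JSON document and build the
// stage URL that gfal2 would POST the request to.
public class TapeApiDiscovery {

    public static String endpointUri(String json) {
        Matcher m = Pattern.compile("\"uri\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static String stageUrl(String endpoint) {
        return endpoint.replaceAll("/+$", "") + "/stage";
    }
}
```

Applied to the wlcg-tape-rest-api-ATLAS-TAPE.json above, this would yield https://door01.pic.es:8486/api/v1/tape and POST to .../api/v1/tape/stage, which matches the URLs Petr used directly in his curl tests below.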
In our storage-authzdb file the entries for ATLAS users are
authorize atlas001 read-write 31051 1307 / / /
authorize atprd001 read-write 42001 1307 / / /
See below the tests done by Petr.
gfal-bringonline calls the TAPE REST API correctly; a manually issued individual HTTP request (with the right short path) looks like:
$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -X POST -H 'Content-TYpe: aplication/json' -d '{ "files":[{ "path": "/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1" }] }' https://door01.pic.es:8486/api/v1/tape/stage
{
"requestId" : "e50d5cc6-8c66-4200-94fe-d001fea8ece7"
}
$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -XGET -H 'Content-TYpe: application/json' https://door01.pic.es:8486/api/v1/tape/stage/e50d5cc6-8c66-4200-94fe-d001fea8ece7
{
"id" : "e50d5cc6-8c66-4200-94fe-d001fea8ece7",
"createdAt" : 1718316934373,
"startedAt" : 1718316934378,
"completedAt" : 1718316934387,
"files" : [ {
"path" : "/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1",
"finishedAt" : 1718316934375,
"startedAt" : 1718316934375,
"error" : "CacheException(rc=10001;msg=No such file or directory /atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1)",
"state" : "FAILED"
} ]
}
When it is called with the "wrong" full path, everything works fine:
$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -X POST -H 'Content-TYpe: application/json' -d '{ "files":[{ "path": "/pnfs/[pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1](http://pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1)" }] }' https://door01.pic.es:8486/api/v1/tape/stage
{
"requestId" : "813258a6-2498-4cca-a9f5-a0b6ca163811"
}
$ curl -s --capath /etc/grid-security/certificates --cert /tmp/x509up_u8021 --key /tmp/x509up_u8021 --cacert /tmp/x509up_u8021 -XGET -H 'Content-TYpe: application/json' https://door01.pic.es:8486/api/v1/tape/stage/813258a6-2498-4cca-a9f5-a0b6ca163811
{
"id" : "813258a6-2498-4cca-a9f5-a0b6ca163811",
"createdAt" : 1718316984917,
"startedAt" : 1718316984922,
"completedAt" : 1718316985056,
"files" : [ {
"path" : "/pnfs/[pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1](http://pic.es/data/atlas/tape/atlasmctape/mc16_13TeV/EVNT/e8388_e7400/mc16_13TeV.601226.PhPy8_A14_NNPDF31_ttbb_4FS_bzd5_dilep.merge.EVNT.e8388_e7400_tid27618965_00/EVNT.27618965._000691.pool.root.1)",
"finishedAt" : 1718316985054,
"startedAt" : 1718316984919,
"state" : "COMPLETED"
} ]
}
This is weird, because your configuration - frontend.root for door01.pic.es:8486 - looks OK...
Have we missed any configuration?
Elena,
you have a typo:
webdav.root=/pnfs/[pic.es/data/atlas/tape](http://pic.es/data/atlas/tape)
Dmitry
After you correct that, if it still does not work, change your storage-authzdb to look like:
authorize atlas001 read-write 31051 1307 / /pic.es/data/atlas/tape /
authorize atprd001 read-write 42001 1307 / /pic.es/data/atlas/tape /
Alternatively, you likely want to do:
authorize atlas001 read-write 31051 1307 / /pic.es/data/atlas /
authorize atprd001 read-write 42001 1307 / /pic.es/data/atlas /
And
frontend.root=/pnfs/pic.es/data/atlas
webdav.root=/pnfs/pic.es/data/atlas
This "workaround" did not work for us at IN2P3.
Adding authorize atlagrid read-write 3327 124 / /pnfs/in2p3.fr/data/atlas /
(instead of / / /) to storage-authzdb moved /upload to /pnfs/in2p3.fr/data/atlas/upload, and most SRM+HTTPS transfers started to fail with errors like HTTP/1.1 400 No such directory: /pnfs/in2p3.fr/data/atlas/upload/16/d7aa2df4-3faa-497b-a631-a8e96eb1776b
Even after creating the /pnfs/in2p3.fr/data/atlas/upload directory.
Our generalist webdav doors used for SRM+HTTPS don't use a prefix, so they are not compatible with VO-dedicated webdav doors with a prefix.
Is it such a big development effort to make the bulk service compatible with the prefix if frontend.root and webdav.root are set?
Adrien
Hi,
I did something wrong on cut/paste; the definition in the layout file has no "[". It was introduced because in the original mail it was marked as a link.
This is the definition : webdav.root=/pnfs/pic.es/data/atlas
I'm not sure about making the changes to the storage-authzdb file.
I've seen a global parameter in dcache.properties that could maybe be added in the door definition:
dcache.root = /
I don't know if this could help.
Thanks! Elena
Adrien,
Fair enough.
Here is a possible fix: https://rb.dcache.org/r/14267/
If you like, I can build you an RPM to install. Meanwhile I will do more testing. This is trickier than it looks.
Thanks @DmitryLitvintsev for working on it. If you can provide us a RPM we can try to test it yes.
This is a link to the frontend jar. You can copy it over your existing /usr/share/dcache/classes/dcache-frontend*jar and restart only the frontend on the host(s) running frontend(s):
https://drive.google.com/file/d/1FtLPZf3BkDXuQUp-nqCkI74zVAAs6mt_/view?usp=sharing
or just install RPM below:
https://drive.google.com/file/d/1tnryH-VrRxjAMizfpTS4SeHmtN1Flw2I/view?usp=sharing
@elenamplanas you can try as well.
To be cautious deploy it on your test system first.
Thanks @DmitryLitvintsev
I deployed the patch this morning and ran some tests. Manual tests and CMS SAM loadtests look good for now:
stage2.json :
{
"files": [
{"path": "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root","diskLifetime":"PT1H"}
]
}
curl -v --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY -X POST "https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage" -H "accept: application/json" -H "content-type: application/json" -d @stage2.json
* upload completely sent off: 195 out of 195 bytes
< HTTP/1.1 201 Created
< Date: Mon, 24 Jun 2024 08:20:33 GMT
< Server: dCache/9.2.14
< Location: https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/stage/ef2aa686-755e-49df-a668-28d827e08ba0
< Content-Type: application/json
< Content-Length: 58
request info ef2aa686-755e-49df-a668-28d827e08ba0
ef2aa686-755e-49df-a668-28d827e08ba0:
status: COMPLETED
arrived at: 2024-06-24 10:20:33.856
started at: 2024-06-24 10:20:33.894
last modified at: 2024-06-24 10:20:46.037
target prefix: /pnfs/in2p3.fr/data/cms
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-06-24 10:20:33.861 | 2024-06-24 10:20:33.861 | 2024-06-24 10:20:46.03 | COMPLETED | /data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root
request info 669a4d6a-844a-4b36-b9ee-0635f7dd89e1
669a4d6a-844a-4b36-b9ee-0635f7dd89e1:
status: COMPLETED
arrived at: 2024-06-24 10:21:20.086
started at: 2024-06-24 10:21:20.104
last modified at: 2024-06-24 10:21:20.161
target prefix: /pnfs/in2p3.fr/data/cms
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2024-06-24 10:21:20.091 | 2024-06-24 10:21:20.091 | 2024-06-24 10:21:20.155 | COMPLETED | /data/store/test/rucio/store/test/loadtest/source/T1_FR_CCIN2P3_Tape_Test/urandom.270MB.file0001
curl --capath /etc/grid-security/certificates --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --key /tmp/x509up_u`id -u` -X POST https://ccdcamcli07.in2p3.fr:3880/api/v1/tape/archiveinfo -H "Content-Type: application/json" -d '{"paths" : ["/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root"]}' | jq
[
{
"path": "/data/store/test/rucio/store/hidata/HIRun2023A/HIPhysicsRawPrime19/MINIAOD/PromptReco-v2/000/375/013/00000/011f9c67-2e16-45f9-a832-d4ad1e834fe0.root",
"locality": "DISK_AND_TAPE"
}
]
OK, I will merge the patch.
Hi Dmitry
Petr has confirmed that it works properly with the change.
Thanks a lot! Elena
Hi,
Following the discussion with Petr, I am opening this issue about the usage of the Tape REST API with a Webdav door configured with a relative webdav.root, which fails with 9.2.6.
gfal-stat with the relative path :
The API response using the relative path :
and using the full path :
The Webdav configuration has the webdav.root set up :
webdav.root=/pnfs/in2p3.fr/data/atlas/
/var/lib/dcache/httpd/wlcg-tape-rest-api.json
Only the webdav door is configured to use a relative path (VO specific). The rest of the dCache conf is using the default / root.
/etc/grid-security/storage-authzdb
Frontend logs :
Feb 05 10:02:27 ccdcamcli06 dcache@FrontendDomain[138833]: 05 Feb 2024 10:02:27 (frontend) [] getInfo failed for /atlasdatatape/SAM/1M: No such file or directory /atlasdatatape/SAM/1M.
Adrien