Closed vingar closed 6 years ago
Rod also reported this problem. Here's his command/comments:
$ xrdcp -v --zip DRAW_RPVLL.11106701._002823.pool.root.1 root://lcg-lrz-rootd.grid.lrz.de:1094/pnfs/lrz-muenchen.de/data/atlas/dq2/atlasdatadisk/rucio/data16_13TeV/db/3f/DRAW_RPVLL.14552406._000091.zip.1 /tmp/pants
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Read permission denied
$ xrdcp --zip DRAW_RPVLL.11106701._002823.pool.root.1 root://grid-dc.rzg.mpg.de:1094//pnfs/rzg.mpg.de/data/atlas/dq2/atlasdatadisk/rucio/data16_13TeV/db/3f/DRAW_RPVLL.14552406._000091.zip.1 /tmp/pants
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3015] Not a file
Rod reports: "In both cases I can download the zip file. It works for DPM."
$ xrdcp --zip DRAW_RPVLL.11106701._002823.pool.root.1 root://lapp-se01.in2p3.fr:1094//dpm/in2p3.fr/home/atlas/atlasdatadisk/rucio/data16_valid/b4/56/DRAW_RPVLL.14459284._000203.zip.1 /tmp/pants
[16MB/159.1MB][ 10%][=====>
Rod also reported that the problem was observed with LRZ running dCache v4.1.16, and that this is likely NOT a regression.
Following Rod's description, I was able to reproduce the problem with prometheus, using the following commands:
paul@celebrimbor:~$ zip -j0 test.zip /bin/bash
adding: bash (stored 0%)
paul@celebrimbor:~$ unzip -v test.zip
Archive: test.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
1099016 Stored 1099016 0% 2017-05-15 21:45 ddbc6e90 bash
-------- ------- --- -------
1099016 1099016 0% 1 file
paul@celebrimbor:~$ globus-url-copy file://`pwd`/test.zip gsiftp://prometheus.desy.de/Users/paul/test.zip
paul@celebrimbor:~$ xrdcp --zip bin/bash.zip root://prometheus.desy.de:1094/Users/paul/test.zip /tmp/
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3015] Not a file
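For anyone wanting to script the reproduction, the uncompressed test archive produced by `zip -j0` above can also be built with Python's standard `zipfile` module. This is only a minimal sketch: the member name `bash` mirrors the transcript, while the payload bytes and the helper names `make_stored_zip` and `describe` are made up for illustration.

```python
import io
import zipfile


def make_stored_zip(member_name: str, payload: bytes) -> bytes:
    """Build an in-memory ZIP whose single member is stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(member_name, payload)
    return buf.getvalue()


def describe(archive: bytes):
    """Return (name, method, size, stored size) per member, similar to `unzip -v`."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            (info.filename, info.compress_type, info.file_size, info.compress_size)
            for info in zf.infolist()
        ]


# Placeholder payload standing in for /bin/bash (an assumption, not the real binary).
archive = make_stored_zip("bash", b"\x7fELF" + bytes(1024))
entries = describe(archive)
print(entries)
```

A stored member has identical uncompressed and stored sizes, matching the "Stored ... 0%" line in the `unzip -v` listing.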
Is it possible to set the debug output to level 3 to see precisely where xrdcp is choking?
Here is the output from:
xrdcp -v -d3 --zip bin/bash.zip root://prometheus.desy.de:1094/Users/paul/test.zip /tmp
The problem seems to come from the pool rejecting the open.
If I'm reading this right:
Open has returned with status [ERROR] Server responded with an error: [3015] Not a file
So ... maybe an Xrootd client issue?
Thanks Brian.
Your description is consistent with how dCache should behave: all non-file operations on the pool will solicit a redirection back to the door. Therefore, it is expected that a client issuing a kXR_stat request to the pool will receive a kXR_redirect response.
However, this is only part of what's happening here. The kXR_stat request accepts an (undocumented, see xrootd/xrootd#839) fhandle field. I guess this is meant to be a valid file handle.
In the door, the fhandle field is completely ignored, so only kXR_stat requests that target a file by path will succeed. This makes some sense, since the door never issues any file handles, so the client cannot (legitimately) specify the kXR_stat request with a file handle to the door.
As it happens, we recently added limited support for kXR_stat on the pool, but only for TPC clients. We can look into extending this to include all clients.
There's also currently no support in dCache for the kXR_retstat option to kXR_open. I suspect adding this will also fix this problem.
Ah - from other experience, kXR_retstat is a quite useful mechanism for avoiding some round trips. Regardless of how this ticket ends up, I'd strongly support getting it implemented in dCache.
Hm... there was a change from Al to support stat on pools. Maybe the request inside the pool takes another code path...
Ok, so it looks like we support stat only for TPC:
Further information:
Yes, the xrootd client recovery is broken; however, fixing this (as available on the current tip of xrootd master) does not help. The recovery procedure is to open the file (triggering another redirection to a pool) and issue the kXR_stat request on the pool. This creates a loop.
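To make the loop concrete, here is a toy simulation of the recovery sequence as described above. It is not dCache or xrootd code: `door_handle`, `pool_handle`, `stat_with_recovery` and the `max_rounds` cut-off are hypothetical names invented for this sketch, and the real protocol exchange carries far more state.

```python
def door_handle(request: str) -> str:
    # Toy door: every open is redirected to a pool.
    return "redirect:pool" if request == "open" else "ok"


def pool_handle(request: str) -> str:
    # Toy pool: serves opens, but redirects non-file operations
    # (here: stat) back to the door.
    return "ok" if request == "open" else "redirect:door"


def stat_with_recovery(max_rounds: int = 5):
    """Replay the described recovery: reopen via the door, then stat on the pool."""
    trace = []
    for round_no in range(1, max_rounds + 1):
        trace.append((round_no, "door", "open", door_handle("open")))
        trace.append((round_no, "pool", "open", pool_handle("open")))
        reply = pool_handle("stat")
        trace.append((round_no, "pool", "stat", reply))
        if reply == "ok":
            return True, trace
        # The stat was redirected to the door, so recovery starts over.
    return False, trace


succeeded, trace = stat_with_recovery(max_rounds=3)
for step in trace:
    print(step)
print("succeeded:", succeeded)
```

Without the artificial `max_rounds` limit the sequence would never terminate in this model, because each round ends with the same redirect that triggered it.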
dCache does support the kXR_retstat option (since at least 2012, probably before) and returns information about the file that the xrootd client seems to parse correctly. Therefore, the xrootd client appears to ignore the stat information returned from the kXR_open request and always issues a kXR_stat request.
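The round-trip saving that kXR_retstat is meant to provide can be illustrated with a small model. This is a sketch under stated assumptions, not the xrootd client's actual logic: `ToyServer`, `file_size` and the request log are hypothetical, and the file size is simply the value from the test archive above.

```python
from typing import List, Optional, Tuple


class ToyServer:
    """Hypothetical server that may return stat information with an open."""

    def __init__(self, size: int, supports_retstat: bool):
        self.size = size
        self.supports_retstat = supports_retstat
        self.requests: List[str] = []  # log of requests received

    def open(self, path: str, retstat: bool) -> Tuple[int, Optional[int]]:
        self.requests.append("open")
        fhandle = 1  # dummy file handle
        stat_size = self.size if (retstat and self.supports_retstat) else None
        return fhandle, stat_size

    def stat(self, fhandle: int) -> int:
        self.requests.append("stat")
        return self.size


def file_size(server: ToyServer, path: str) -> int:
    """Use stat data from the open reply when present, else issue a stat."""
    fhandle, stat_size = server.open(path, retstat=True)
    if stat_size is not None:
        return stat_size  # no second round trip needed
    return server.stat(fhandle)


with_retstat = ToyServer(size=1099016, supports_retstat=True)
without_retstat = ToyServer(size=1099016, supports_retstat=False)
print(file_size(with_retstat, "/Users/paul/test.zip"), with_retstat.requests)
print(file_size(without_retstat, "/Users/paul/test.zip"), without_retstat.requests)
```

In this model a client that honours the returned stat data never sends the separate stat request, which is the request that the pool redirects in the failing case.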
Motivation
xrdcp fails to extract a file from a ZIP archive stored on dCache: