dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

xrdcp fails to extract archive zip file #4258

Closed vingar closed 6 years ago

vingar commented 6 years ago

Motivation

xrdcp fails to extract archive zip file against dcache:

2018-10-10 11:29:39,010 DEBUG   Executing: xrdcp -vf root://grid-dc.rzg.mpg.de:1094//pnfs/rzg.mpg.de/data/atlas/dq2/atlasdatadisk/rucio/data16_13TeV/db/3f/DRAW_RPVLL.14552406._000091.zip.1?xrdcl.unzip=DRAW_RPVLL.11106701._002823.pool.root.1 -z DRAW_RPVLL.11106701._002823.pool.root.1 file:///afs/cern.ch/user/m/mlassnig/data16_13TeV/DRAW_RPVLL.11106701._002823.pool.root.1.part
2018-10-10 11:29:39,303 DEBUG   xrdcp status: 54
2018-10-10 11:29:39,303 DEBUG   xrdcp stdout: 
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3015] Not a file

paulmillar commented 6 years ago

Rod also reported this problem. Here's his command/comments:

$ xrdcp -v --zip DRAW_RPVLL.11106701._002823.pool.root.1 root://lcg-lrz-rootd.grid.lrz.de:1094/pnfs/lrz-muenchen.de/data/atlas/dq2/atlasdatadisk/rucio/data16_13TeV/db/3f/DRAW_RPVLL.14552406._000091.zip.1 /tmp/pants
[0B/0B][100%][==================================================][0B/s] 
Run: [ERROR] Server responded with an error: [3010] Read permission denied

$ xrdcp --zip DRAW_RPVLL.11106701._002823.pool.root.1 root://grid-dc.rzg.mpg.de:1094//pnfs/rzg.mpg.de/data/atlas/dq2/atlasdatadisk/rucio/data16_13TeV/db/3f/DRAW_RPVLL.14552406._000091.zip.1 /tmp/pants
[0B/0B][100%][==================================================][0B/s] 
Run: [ERROR] Server responded with an error: [3015] Not a file

Rod reports: "In both cases I can download the zip file. It works for DPM."

$ xrdcp --zip DRAW_RPVLL.11106701._002823.pool.root.1 root://lapp-se01.in2p3.fr:1094//dpm/in2p3.fr/home/atlas/atlasdatadisk/rucio/data16_valid/b4/56/DRAW_RPVLL.14459284._000203.zip.1 /tmp/pants
[16MB/159.1MB][ 10%][=====>    

paulmillar commented 6 years ago

Rod also reported that the problem was observed at LRZ, which is running dCache v4.1.16, and that this is likely NOT a regression.

Following Rod's description, I was able to reproduce the problem with prometheus, using the following commands:

paul@celebrimbor:~$ zip -j0 test.zip /bin/bash
  adding: bash (stored 0%)
paul@celebrimbor:~$ unzip -v test.zip 
Archive:  test.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
 1099016  Stored  1099016   0% 2017-05-15 21:45 ddbc6e90  bash
--------          -------  ---                            -------
 1099016          1099016   0%                            1 file
paul@celebrimbor:~$ globus-url-copy file://`pwd`/test.zip gsiftp://prometheus.desy.de/Users/paul/test.zip
paul@celebrimbor:~$ xrdcp --zip bin/bash.zip root://prometheus.desy.de:1094/Users/paul/test.zip /tmp/
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3015] Not a file

bbockelm commented 6 years ago

Is it possible to set the debug output to level 3 to see precisely where xrdcp is choking?

paulmillar commented 6 years ago

Here is the output from:

xrdcp -v -d3 --zip bin/bash.zip root://prometheus.desy.de:1094/Users/paul/test.zip /tmp

The problem seems to come from the pool rejecting the open.

bbockelm commented 6 years ago

If I'm reading this right:

So ... maybe an Xrootd client issue?

paulmillar commented 6 years ago

Thanks Brian.

Your description is consistent with how dCache should behave: all non-file operations on the pool will solicit a redirection back to the door. Therefore, it is expected that a client issuing a kXR_stat request to the pool will receive a kXR_redirect response.

However, this is only part of what's happening here. The kXR_stat request accepts an (undocumented, see xrootd/xrootd#839) fhandle field. I guess this is meant to be a valid file handle.

In the door, the fhandle field is completely ignored, so only kXR_stat requests that target a file by path will succeed. This makes some sense, since the door never issues any file handles, so the client cannot (legitimately) specify the kXR_stat request with a file handle to the door.

As it happens, we recently added limited support for kXR_stat on the pool, but only for TPC clients. We can look into extending this to include all clients.
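
The routing described above can be sketched as a toy model. This is an illustration only, not dCache code: the class names `Door` and `Pool`, the `Response` tuple, the `is_tpc_client` flag and the example path and address are all hypothetical, and the error text returned by the door is a placeholder rather than the real server message.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    status: str   # "kXR_ok", "kXR_redirect" or "kXR_error"
    detail: str   # size, redirect target, or error text


class Door:
    """Toy door: resolves paths and never issues file handles."""

    def __init__(self, namespace: dict):
        self.namespace = namespace  # path -> size in bytes

    def stat(self, path: str = "", fhandle: Optional[int] = None) -> Response:
        # fhandle is deliberately ignored, mirroring the behaviour
        # described in this thread: only path-based stat succeeds.
        if path in self.namespace:
            return Response("kXR_ok", str(self.namespace[path]))
        return Response("kXR_error", "no usable path (placeholder message)")


class Pool:
    """Toy pool: serves open files, sends other requests to the door."""

    def __init__(self, door_address: str):
        self.door_address = door_address
        self.open_files = {}  # fhandle -> size in bytes

    def open(self, size: int) -> int:
        fhandle = len(self.open_files)
        self.open_files[fhandle] = size
        return fhandle

    def stat(self, fhandle: int, is_tpc_client: bool = False) -> Response:
        # Limited stat support: only third-party-copy clients get an answer.
        if is_tpc_client and fhandle in self.open_files:
            return Response("kXR_ok", str(self.open_files[fhandle]))
        return Response("kXR_redirect", self.door_address)


door = Door({"/data/example.zip": 1024})
pool = Pool("door.example.org:1094")
fh = pool.open(1024)

print(pool.stat(fh))                         # redirected back to the door
print(door.stat(fhandle=fh))                 # error: the door ignores fhandle
print(door.stat(path="/data/example.zip"))   # succeeds: stat by path
print(pool.stat(fh, is_tpc_client=True))     # succeeds: TPC client on the pool
```

In this model, a non-TPC client that stats by file handle has nowhere to get an answer: the pool redirects it, and the door cannot interpret the handle.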

There's also currently no support in dCache for the kXR_retstat option to kXR_open. I suspect adding this will also fix this problem.

bbockelm commented 6 years ago

Ah - from other experience, kXR_retstat is a quite useful mechanism for avoiding some round trips. Regardless of how this ticket ends up, I'd strongly support getting it implemented in dCache.

kofemann commented 6 years ago

Hm... there was a change from Al to support stat on pools. Maybe the request inside the pool takes another code path...

kofemann commented 6 years ago

Ok, so it looks like we support stat only for TPC:

https://github.com/dCache/dcache/blob/master/modules/dcache-xrootd/src/main/java/org/dcache/xrootd/pool/XrootdPoolRequestHandler.java#L366

paulmillar commented 6 years ago

Further information:

Yes, the xrootd client recovery is broken; however, fixing this (as available on the current tip of xrootd master) does not help. The recovery procedure is to open the file (triggering another redirection to a pool) and issue the kXR_stat request on the pool. This creates a loop.
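
A toy simulation may help show why this recovery never terminates on its own. The functions `door`, `pool` and `fetch_size`, and the `pool_answers_stat` switch, are invented for this sketch and do not correspond to xrootd or dCache code; the retry cap is also an assumption, added only so that the example stops.

```python
def door(request):
    # The door redirects every open to a pool.
    if request == "open":
        return ("kXR_redirect", "pool")
    return ("kXR_error", "unsupported in this sketch")


def pool(request, pool_answers_stat=False):
    if request == "open":
        return ("kXR_ok", "file-handle")
    if request == "stat" and pool_answers_stat:
        return ("kXR_ok", "stat-info")
    # Non-file operations are sent back to the door.
    return ("kXR_redirect", "door")


def fetch_size(max_recoveries=3, pool_answers_stat=False):
    """Return (succeeded, trace) for a client that stats after opening."""
    trace = []
    for _ in range(max_recoveries):
        trace.append(("door", "open", door("open")[0]))
        trace.append(("pool", "open", pool("open")[0]))
        status = pool("stat", pool_answers_stat)[0]
        trace.append(("pool", "stat", status))
        if status == "kXR_ok":
            return True, trace
        # Recovery as described above: re-open the file, which is
        # redirected to a pool again, then repeat the stat there.
    return False, trace


ok, trace = fetch_size()
for step in trace:
    print(step)
print("succeeded:", ok)
```

With the defaults, every pass repeats the same three steps and ends on a redirect. Calling `fetch_size(pool_answers_stat=True)` models a pool that answers the stat itself, and the sequence completes on the first pass.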

dCache does support the kXR_retstat option (since at least 2012, probably before) and returns information about the file that the xrootd client seems to parse correctly. Therefore, the xrootd client appears to ignore the stat information returned from the kXR_open request and always issues a kXR_stat request.
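
The round trip at stake can be illustrated with a small sketch. Everything here is hypothetical (the `FakeServer` class, its `open` and `stat` methods, and the request counter); it only contrasts a client that reuses the stat information returned with the open against one that always sends a separate stat request.

```python
class FakeServer:
    """Counts requests; returns stat info with open when asked to."""

    def __init__(self, size):
        self.size = size
        self.requests = 0

    def open(self, retstat=False):
        self.requests += 1
        stat_info = {"size": self.size} if retstat else None
        return "file-handle", stat_info

    def stat(self, fhandle):
        self.requests += 1
        return {"size": self.size}


def size_reusing_open(server):
    handle, stat_info = server.open(retstat=True)
    if stat_info is None:            # fall back if nothing was returned
        stat_info = server.stat(handle)
    return stat_info["size"]


def size_always_stat(server):
    handle, _ignored = server.open(retstat=True)
    return server.stat(handle)["size"]   # extra request every time


a, b = FakeServer(1024), FakeServer(1024)
print(size_reusing_open(a), a.requests)
print(size_always_stat(b), b.requests)
```

Both clients learn the same size, but the second one needs two requests where the first needs one. Against a pool that redirects stat requests, that second request is also the one that fails.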

paulmillar commented 6 years ago

Patch: https://rb.dcache.org/r/11239/