DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

TPC problems replicating from dCache to xrootd sites #5727

Closed sfayer closed 2 years ago

sfayer commented 2 years ago

Hi,

We're seeing TPC transfer problems when replicating from dCache (SRM protocol) to xrootd (root protocol) sites with the error "The local source file does not exist or is a directory". DIRAC version is v7r2p28 with DIRACOS v1r27.

What appears to be happening is that the __getSingleTransportURL function is initially called for the source with the protocol 'xroot': dCache (7.2) happily responds to this and provides a TURL that starts xroot://hostname...

This source URL is eventually passed to GFAL2_XROOTStorage.__putSingleFile, but the XROOTStorage module only lists "root" as an INPUT_PROTOCOL, not "xroot", so it presumes the URL is a local path and trips up trying to stat it.

Could we please add xroot to the input protocol list, or add a filter here[1] to rewrite the URL string to just root:// if it starts with xroot:// before it's returned? (I tried the latter and it seemed to fix the problem).
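For illustration, the second option (the rewrite filter) might look something like the sketch below. This is a minimal, hypothetical helper (`normalize_turl` is not a DIRAC function). It assumes the TURL comes back as a plain string and only swaps a leading `xroot://` for `root://`, leaving every other URL untouched.

```python
# Hypothetical post-processing filter (not DIRAC's actual code): rewrite an
# "xroot://" TURL returned by the SRM endpoint to "root://" before it is
# handed to a destination plugin that only accepts "root".

XROOT_PREFIX = "xroot://"
ROOT_PREFIX = "root://"


def normalize_turl(turl: str) -> str:
    """Return the TURL with a leading xroot:// scheme replaced by root://."""
    if turl.startswith(XROOT_PREFIX):
        # Keep host, port and path exactly as dCache returned them.
        return ROOT_PREFIX + turl[len(XROOT_PREFIX):]
    # Any other scheme (or a plain local path) passes through unchanged.
    return turl


if __name__ == "__main__":
    print(normalize_turl("xroot://lhcbsrm-kit.gridka.de:1094/pnfs/gridka.de/lhcb/zozo.xml"))
```

Run as a script, this prints `root://lhcbsrm-kit.gridka.de:1094/pnfs/gridka.de/lhcb/zozo.xml`. The drawback of this approach is that it patches one SRM plugin rather than teaching the destination plugin that both spellings are valid.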

[1] https://github.com/DIRACGrid/DIRAC/blob/2e353c23514379074d9040cd455f59dbf2fec439/src/DIRAC/Resources/Storage/GFAL2_SRM2Storage.py#L167-L168

(I was going to submit a patch directly, but as there were at least two possible fixes to choose from I thought you might want to discuss it first...).

Regards, Simon

chaen commented 2 years ago

Hi @sfayer

I checked with the xroot developers, and although historically there was a difference, nowadays the protocols are in principle aliases. So I'd prefer the option of adding xroot to the list of INPUT_PROTOCOL. (Note that as a test, you can override this list in your CS)
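A simplified model of why adding the alias is enough: if a plugin treats any URL whose scheme is not in its input-protocol list as a local path, then an `xroot://` TURL is misclassified until `xroot` joins the list. The lists and the `looks_local` helper below are hypothetical stand-ins, not the real contents of GFAL2_XROOTStorage.py.

```python
from typing import Sequence
from urllib.parse import urlparse

# Hypothetical stand-ins for a plugin's input-protocol list before and after
# adding the alias; the real list in GFAL2_XROOTStorage.py may differ.
INPUT_PROTOCOLS_BEFORE = ["root"]
INPUT_PROTOCOLS_AFTER = ["root", "xroot"]


def looks_local(url: str, input_protocols: Sequence[str]) -> bool:
    """Treat a URL as a local path unless its scheme is an accepted input protocol."""
    scheme = urlparse(url).scheme  # "" for a bare path such as /tmp/file
    return scheme not in input_protocols


if __name__ == "__main__":
    turl = "xroot://lhcbsrm-kit.gridka.de:1094/pnfs/gridka.de/lhcb/zozo.xml"
    print(looks_local(turl, INPUT_PROTOCOLS_BEFORE))  # True: would be stat'ed locally
    print(looks_local(turl, INPUT_PROTOCOLS_AFTER))   # False: handled as remote
```

Since the two schemes are aliases, listing both keeps the fix local to the destination plugin and works regardless of which spelling a given SRM endpoint hands back.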

I am very puzzled by dCache not accepting root in the query, but that's another story

bash-4.2$ gfal-xattr -D "SRM PLUGIN:TURL_PROTOCOLS=root" srm://lhcbsrm-disk-kit.gridka.de:8443/srm/managerv2?SFN=/pnfs/gridka.de/lhcb/LHCb_USER/lhcb/user/c/chaen/zozo.xml user.replicas
gfal-xattr error: 95 (Operation not supported) - srm-ifce err: Operation not supported, err: [SE][PrepareToGet][SRM_NOT_SUPPORTED] httpg://lhcbsrm-disk-kit.gridka.de:8443/srm/managerv2: Protocol(s) not supported: [root]

bash-4.2$ gfal-xattr -D "SRM PLUGIN:TURL_PROTOCOLS=xroot" srm://lhcbsrm-disk-kit.gridka.de:8443/srm/managerv2?SFN=/pnfs/gridka.de/lhcb/LHCb_USER/lhcb/user/c/chaen/zozo.xml user.replicas
xroot://lhcbsrm-kit.gridka.de:1094/pnfs/gridka.de/lhcb/LHCb_USER/lhcb/user/c/chaen/zozo.xml

However, I am a bit curious about what you are doing. Are you replicating interactively, with the RMS, or with FTS?

chaen commented 2 years ago

And btw, kudos for walking your way through these dark corners of DIRAC :-p

sfayer commented 2 years ago

Thanks, I've asked the end user to try it out with a modified input protocol list and report back.

The use case here is just replications using dirac-dms-replicate-lfn initially: They do want to automate all of this, but it's being done manually for now as we haven't redeployed/configured/tested all of the components needed for the full RMS/FTS3/transformation set-up yet.

I was going to include my theories on the dCache xroot/root stuff, but I see you've already opened a dCache ticket about that :-)

Regards, Simon

chaen commented 2 years ago

To be honest, with the latest trends (multihop and other genius ideas), dirac-dms-replicate-lfn will work less and less often... Actually it will still work; it will just do a local copy first...
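The fallback described above can be pictured with a toy model: try to find a protocol the source can serve and the destination accepts for a direct third-party copy, and if there is none, download to local disk and re-upload. This is an assumption-laden sketch; `pick_tpc_protocol` and `plan_replication` are hypothetical names, and DIRAC's real decision logic (including multihop) is more involved.

```python
from typing import Optional, Sequence


def pick_tpc_protocol(source_output: Sequence[str], dest_input: Sequence[str]) -> Optional[str]:
    """Return the first source protocol the destination accepts, or None."""
    for proto in source_output:
        if proto in dest_input:
            return proto
    return None


def plan_replication(source_output: Sequence[str], dest_input: Sequence[str]) -> str:
    """Describe how a toy replicator would move the file."""
    proto = pick_tpc_protocol(source_output, dest_input)
    if proto is not None:
        return f"third-party copy via {proto}"
    # No shared protocol: fetch the file locally, then upload it,
    # which doubles the data moved through the client.
    return "local copy: download then upload"


if __name__ == "__main__":
    print(plan_replication(["xroot"], ["root"]))           # no match before the alias fix
    print(plan_replication(["xroot"], ["root", "xroot"]))  # direct copy after it
```

In this model the alias fix from this issue turns the first case into the second, which is why it avoids the extra local hop.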

For completeness, here is the dCache ticket :-) https://github.com/dCache/dcache/issues/6372

guiguem commented 2 years ago

Hi, I added xroot in https://github.com/DIRACGrid/DIRAC/blob/2e353c23514379074d9040cd455f59dbf2fec439/src/DIRAC/Resources/Storage/GFAL2_XROOTStorage.py#L30 as suggested by Simon, and it fixed the replication issue between the two sites. Thanks a lot!

Concerning the future of dirac-dms-replicate-lfn, what is the preferred method to replace this script? Local copying is annoying because it doubles the transferred data, but it is fine to some extent.

chaen commented 2 years ago

I've just added xroot here https://github.com/DIRACGrid/DIRAC/pull/5755

The preferred method to replace the script is unfortunately either local copying or the full-fledged machinery Simon mentioned.

fstagni commented 2 years ago

Closing as included in v7.3.15.