CoffeaTeam / coffea-casa

Repository with configuration setup of a prototype of analysis facility - "coffea-casa"
BSD 3-Clause "New" or "Revised" License

Issue with xrdcp and xrootd python client while using xrdcl-authz-plugin at coffea-casa #374

Open oshadura opened 1 year ago

oshadura commented 1 year ago

The user reported that on the CMS coffea-casa AF, copying files with xrdcp fails with an "Operation is not implemented" error:

 xrdcp -f root://xcache//store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/ffNtuple_1.root /dev/null 
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Operation is not implemented:  (source)

There is also a segfault when using the XRootD Python API:

>>> from XRootD import client
>>> xrd = client.FileSystem("root://xcache//store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/ffNtuple_1.root")
Segmentation fault (core dumped)
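For comparison, the XRootD Python bindings are usually used by constructing `client.FileSystem` from a server URL alone and passing the path to methods such as `stat` or `dirlist`, whereas the call above passes a full file URL. The sketch below splits a `root://` URL into those two parts. It is not a claim about the cause of the segfault (later in the thread it goes away once the plugin is deployed); `split_xrootd_url` and `stat_remote` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def split_xrootd_url(url: str) -> tuple[str, str]:
    """Split 'root://host[:port]//abs/path' into ('root://host[:port]', '/abs/path')."""
    parts = urlsplit(url)
    if parts.scheme not in ("root", "roots"):
        raise ValueError(f"not an XRootD URL: {url!r}")
    server = f"{parts.scheme}://{parts.netloc}"
    # XRootD URLs put a double slash before absolute paths; normalize to a single one.
    path = "/" + parts.path.lstrip("/")
    return server, path


def stat_remote(url: str):
    """Stat a remote file via the XRootD Python bindings (must be installed)."""
    from XRootD import client  # imported lazily so split_xrootd_url works without it

    server, path = split_xrootd_url(url)
    fs = client.FileSystem(server)
    status, info = fs.stat(path)
    if not status.ok:
        raise OSError(status.message)
    return info
```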

Current repository with the plugin: https://github.com/jthiltges/xrdcl-authz-plugin/tree/xcache

cc @jthiltges

jthiltges commented 1 year ago

Hi Oksana, can you confirm that the plugin is being built from the xcache branch? Some of the strings suggest it's coming from master, at least in hub.opensciencegrid.org/coffea-casa/cc-ubuntu:2023.03.17.
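Checking which branch a deployed library came from by its embedded strings can be scripted much like `strings | grep`. A minimal sketch, assuming you already know some branch-specific marker strings; the thread does not say which strings were inspected, and the library path inside the image is not given here, so both are left to the caller. `printable_strings` and `find_markers` are hypothetical helpers.

```python
import re
from pathlib import Path


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least `min_len` printable ASCII bytes, similar to `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_markers(library: Path, markers: list[str]) -> dict[str, bool]:
    """Report which marker substrings occur in any printable string of a binary."""
    found = printable_strings(library.read_bytes())
    return {marker: any(marker in s for s in found) for marker in markers}
```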

btcardwell commented 1 year ago

Hi @oshadura and @jthiltges, I'm "the user" in Oksana's original post, and I thought it might be helpful to give a little context. The main functionality I'm looking for is to be able to list files on LPC EOS like I would with xrdfs root://cmseos.fnal.gov ls. Of course fixing this such that xrootd works in general would be great, but if you know another good way to do this from coffea-casa, I'd happily do that instead :)
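As a rough Python equivalent of `xrdfs root://cmseos.fnal.gov ls`, the XRootD bindings provide `FileSystem.dirlist`, which returns a `(status, listing)` pair. A minimal sketch, assuming the bindings are installed and that cmseos.fnal.gov is reachable directly (bypassing xcache); `list_directory` and `make_eos_filesystem` are hypothetical helpers, and the filesystem object is passed in so the logic can be exercised without a server.

```python
from typing import Any


def list_directory(fs: Any, path: str) -> list[str]:
    """Return sorted entry names under `path`.

    `fs` should behave like XRootD.client.FileSystem: `fs.dirlist(path)` returns
    (status, listing), where `status.ok` is a bool and `listing` yields entries
    with a `.name` attribute.
    """
    status, listing = fs.dirlist(path)
    if not status.ok:
        raise OSError(f"dirlist failed for {path}: {status.message}")
    return sorted(entry.name for entry in listing)


def make_eos_filesystem(server: str = "root://cmseos.fnal.gov"):
    """Open a FileSystem handle on `server` (requires the XRootD bindings)."""
    from XRootD import client

    return client.FileSystem(server)
```

Usage would look like `list_directory(make_eos_filesystem(), "/store/group/lpcmetx/SIDM/ffNtupleV4/2018/")`.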

oshadura commented 1 year ago

Now that we have deployed @jthiltges's plugin, the Segmentation fault (core dumped) is fixed, but some functionality, such as xrdfs, is still missing:


# the following command works on lxplus
$ xrdfs root://cmseos.fnal.gov// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/

# but the equivalent command hangs on coffea-casa
$ xrdfs root://xcache// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/

# even though the same command works on coffea-casa if I specify one specific file
$ xrdfs root://xcache// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/ffNtuple_1.root
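Since the directory listing hangs while the single-file `ls` returns, scripted checks like these are easier to run with a timeout, so a hang is reported instead of blocking the session. A minimal sketch using Python's `subprocess`; `run_with_timeout` is a hypothetical helper and the timeout value is up to the caller.

```python
import subprocess
from typing import Optional


def run_with_timeout(argv: list[str], timeout_s: float) -> tuple[Optional[int], str]:
    """Run a command and return (returncode, stdout); returncode is None on timeout."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None, ""
    return proc.returncode, proc.stdout
```

For example, passing `["xrdfs", "root://xcache//", "ls", directory]` with a 60-second timeout would return `(None, "")` if the listing hangs as described above.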
jthiltges commented 1 year ago

Interesting result. This appears to be partly an issue with our xcache (running in Docker).

The xcache tells the client to contact 172.23.0.2, which is a private IP address of the xcache container, so, as expected, the client cannot connect.

$ xrdfs red-xcache1.unl.edu:1094 locate '*'
[::172.23.0.2]:1094 Server ReadWrite 
$ xrdfs xcache:1094 locate '*'
[::172.23.0.2]:1094 Server ReadWrite
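The `locate` output makes the problem visible: 172.23.0.2 lies in 172.16.0.0/12, a private range that clients outside the Docker network cannot reach. A small sketch that parses lines in the `[host]:port type access` layout shown above and flags private addresses; the regex assumes exactly that layout, and the helper names are hypothetical. The address is printed in the IPv4-compatible IPv6 form `::a.b.c.d`, which Python's `ipaddress` does not classify as private by itself, hence the unwrapping step.

```python
import ipaddress
import re
from typing import NamedTuple

# Matches lines such as "[::172.23.0.2]:1094 Server ReadWrite".
LOCATE_LINE = re.compile(
    r"^\[(?P<host>[^\]]+)\]:(?P<port>\d+)\s+(?P<kind>\S+)\s+(?P<access>\S+)"
)


class LocateEntry(NamedTuple):
    host: str
    port: int
    kind: str
    access: str


def parse_locate_line(line: str) -> LocateEntry:
    m = LOCATE_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized locate line: {line!r}")
    return LocateEntry(m["host"], int(m["port"]), m["kind"], m["access"])


def is_private_address(host: str) -> bool:
    """True if `host` is private, unwrapping IPv4 addresses embedded in IPv6."""
    addr = ipaddress.ip_address(host)
    if isinstance(addr, ipaddress.IPv6Address):
        if addr.ipv4_mapped is not None:  # ::ffff:a.b.c.d
            addr = addr.ipv4_mapped
        elif int(addr) >> 32 == 0 and int(addr) > 1:  # deprecated ::a.b.c.d form
            addr = ipaddress.IPv4Address(int(addr))
    return addr.is_private
```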

For now, I switched the red-xcache container over to host-mode networking (network_mode: host), and the ls now fails differently:

$ xrdfs red-xcache1.unl.edu ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000
[ERROR] Server responded with an error: [3005] Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000; too many levels of symbolic links

On the xcache server side:

230411 17:53:14 543 scitokens_Access: Grant authorization based on scopes for operation=dir, path=/store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000
[2023-04-11 17:53:18.077326 +0000][Warning][XRootD            ] [u26@cms-xrd-global.cern.ch:1094] Redirect limit has been reached for message kXR_dirlist (path: /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000), the last known error is: [ERROR] Error response: no such file or directory
230411 17:53:18 543 ofs_opendir: cms-jovy.405:26@c2427.shor.hcc Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000; too many levels of symbolic links
230411 17:53:18 543 cms-jovy.405:26@c2427.shor.hcc Xrootd_Response: sending err 3005: Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000; too many levels of symbolic links
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ] Redirect trace-back:
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         0. Redirected from: root://cmsxrootd.fnal.gov:1094//store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000 to: root://cms-xrd-global.cern.ch:1094/
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         1. Redirected from: root://cms-xrd-global.cern.ch:1094/ to: root://cms-xrd-transit.cern.ch:1094/
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         2. Retrying: root://cms-xrd-global.cern.ch:1094/
...
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         29. Redirected from: root://cms-xrd-global.cern.ch:1094/ to: root://cms-xrd-transit.cern.ch:1094/
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         30. Retrying: root://cms-xrd-global.cern.ch:1094/
230411 17:53:18 543 XrdTLS: cms-jovy.405:26@c2427.shor.hcc TLS error rc=0 ec=6 (zero_return) errno=0.
230411 17:53:18 543 XrootdXeq: cms-jovy.405:26@c2427.shor.hcc disc 0:00:04

I suspect that listing directory contents will be painfully slow unless the request goes directly to the target server/cluster; otherwise, I'm guessing it will result in a search of the entire hierarchy.
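The redirect trace-back in the server log shows the `kXR_dirlist` request bouncing between cms-xrd-global and cms-xrd-transit until the redirect limit is reached. For longer logs, a sketch like the one below can condense such a trace into counts per redirect edge and per retried host. It assumes the numbered "Redirected from: ... to: ..." and "Retrying: ..." line format shown above; `summarize_redirects` is a hypothetical helper.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Numbered entries of an XRootD client "Redirect trace-back" warning.
HOP = re.compile(
    r"\d+\.\s+(?:Redirected from: (?P<src>\S+) to: (?P<dst>\S+)|Retrying: (?P<retry>\S+))"
)


def summarize_redirects(log_lines):
    """Count (source host, destination host) redirect edges and retries per host."""
    edges, retries = Counter(), Counter()
    for line in log_lines:
        m = HOP.search(line)
        if m is None:
            continue
        if m["retry"]:
            retries[urlsplit(m["retry"]).netloc] += 1
        else:
            edges[(urlsplit(m["src"]).netloc, urlsplit(m["dst"]).netloc)] += 1
    return edges, retries
```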

oshadura commented 11 months ago

Update: the second query now fails with the "too many levels of symbolic links" error:

cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ xrdfs root://xcache// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/
[ERROR] Server responded with an error: [3005] Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/; too many levels of symbolic links
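The thread shows two kinds of failure messages: the "Operation is not implemented" message from xrdcp and the server response with code 3005 ("too many levels of symbolic links"). When collecting results from several such test commands, a small parser that separates an optional server error code from the message can help keep them apart. This assumes the `[ERROR]` line formats exactly as they appear in this thread; `parse_xrd_error` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# "[ERROR] ..." lines from xrdcp/xrdfs, with an optional "[NNNN]" server error code.
ERROR_LINE = re.compile(
    r"\[ERROR\]\s+(?:Server responded with an error: \[(?P<code>\d+)\]\s*)?(?P<msg>.*)"
)


class XrdError(NamedTuple):
    code: Optional[int]
    message: str


def parse_xrd_error(line: str) -> Optional[XrdError]:
    """Return the parsed error, or None if the line has no [ERROR] marker."""
    m = ERROR_LINE.search(line)
    if m is None:
        return None
    code = int(m["code"]) if m["code"] else None
    return XrdError(code, m["msg"].strip())
```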