DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

AWT failing at IN_TIFR why #100

Open StevenCTimm opened 1 year ago

StevenCTimm commented 1 year ago

Need to investigate.

StevenCTimm commented 1 year ago

Further investigation shows that the xrdcp command is not available (see the jobscript log below), so all file downloads are failing. All file uploads are failing too, but somehow rucio upload is returning status zero.

IN_TIFR RAL_ECHO davs root://xrootd.echo.stfc.ac.uk:1094/dune:/protodune/RSE/testpro/f5/4a/awt-download-2023-03-02-01.txt
../justin-jobscript: line 28: xrdcp: command not found
'xrdcp --force --nopbar --verbose root://xrootd.echo.stfc.ac.uk:1094/dune:/protodune/RSE/testpro/f5/4a/awt-download-2023-03-02-01.txt downloaded.txt' returns 127
GFAL_CONFIG_DIR:
GFAL_PLUGIN_DIR:
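The exit status 127 in the log above is what a POSIX shell returns when a command cannot be found on PATH. A minimal sketch of a pre-flight check that a jobscript could run before attempting any transfer follows; the function name missing_tools is hypothetical and is not part of the justin jobscript.

```shell
# Hypothetical helper (not part of justin-jobscript): print the names of
# any of the given commands that cannot be found on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space; prints an empty line if nothing is missing.
    echo "${missing# }"
}
```

A jobscript could call it as `lacking=$(missing_tools xrdcp)` and stop early with a clear message if the result is non-empty, instead of failing later at the first transfer.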


The upload is failing too but returning status zero. Details:
Missing dependency : gfal2
'justin-rucio-upload --rse SURFSARA --protocol davs --scope testpro --dataset awt-uploads awt-1689915776-nuePg37Kks' returns 0
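Because the upload reports success even though it printed "Missing dependency", the exit status alone cannot be trusted here. One possible workaround, sketched below, is to wrap the command and also scan its output for that message. The wrapper name run_checked is hypothetical, and whether the message goes to stdout or stderr is an assumption, so both streams are captured.

```shell
# Hypothetical wrapper: run a command and treat it as failed if it exits
# non-zero OR if its combined stdout/stderr contains "Missing dependency".
run_checked() {
    output=$("$@" 2>&1)
    rc=$?
    # Pass the captured output through so logs are unchanged.
    printf '%s\n' "$output"
    if [ "$rc" -ne 0 ]; then
        return "$rc"
    fi
    case "$output" in
        *"Missing dependency"*) return 1 ;;
    esac
    return 0
}
```

For example, `run_checked justin-rucio-upload ...` with the same arguments as in the log would return 1 for the case shown above rather than 0.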

This is happening when communicating with all sites.

xrdcp should be available inside the Singularity container that the job runs in. There are known issues with user namespaces at IN_TIFR, so that could be the root cause.
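To test the user namespace theory on a worker node, one quick check is the kernel limit in /proc/sys/user/max_user_namespaces, where a value of 0 means unprivileged user namespaces are disabled. The sketch below is only a diagnostic aid; the function name userns_limit is hypothetical, and a non-zero limit does not by itself prove that the container starts correctly.

```shell
# Hypothetical diagnostic: print the kernel's user namespace limit, or
# "unknown" if the file is not readable (for example on older kernels).
# The optional argument overrides the file path, mainly for testing.
userns_limit() {
    limit_file="${1:-/proc/sys/user/max_user_namespaces}"
    if [ -r "$limit_file" ]; then
        cat "$limit_file"
    else
        echo "unknown"
    fi
}
```

Running `userns_limit` in a test job at IN_TIFR and at a site where AWT passes would show whether the two differ.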