ganga-devs / ganga

Ganga is an easy-to-use frontend for job definition and management
GNU General Public License v3.0

xrootd Auth failed in singularity on Dirac job #1672

Closed. asnaylor closed this issue 4 years ago.

asnaylor commented 4 years ago

Accessing a ROOT file on the grid through xrootd fails on a Dirac job when using Singularity virtualisation. Since singularity is not expected to be found in $PATH on every Dirac site, I had to use Ganga PR #1670 to pick up the singularity binary from cvmfs. However, I get an authentication failure when I try to access the ROOT file with xrootd. Here's the simple shell script I'm running:

source /cvmfs/sft.cern.ch/lcg/views/LCG_95apython3/x86_64-centos7-gcc8-opt/setup.sh
python -c "import ROOT; ROOT.TFile.Open('root://gfe02.grid.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root').Get('Events')"

This is the Ganga Python job:

    j = ganga.Job()
    j.application=Executable(exe=File('worker_node_script.sh'), args=[])
    j.virtualization = Singularity("docker://luxzeplin/offline_hosted:centos7_2")
    j.virtualization.binary='/cvmfs/oasis.opensciencegrid.org/mis/singularity/current/bin/singularity'
    j.virtualization.mounts = {'/cvmfs':'/cvmfs', '/srv':'/srv'}
    j.backend = Dirac()
    j.submit()

This is the error message:

Error in <TNetXNGFile::Open>: [FATAL] Auth failed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ReferenceError: attempt to access a null-pointer

Here are the Singularity and X509 environment variables for the Dirac job:

printenv | grep SINGULARITY
SINGULARITY_APPNAME=
SINGULARITY_NAME=singularity_sandbox
SINGULARITY_CONTAINER=/tmp/25355128/singularity_sandbox
printenv | grep X509
X509_CERT_DIR=/etc/grid-security/certificates
X509_VOMS_DIR=/tmp/etc/grid-security/vomsdir
X509_USER_PROXY=/tmp/proxy

Within the container you can access X509_VOMS_DIR and X509_USER_PROXY, but not X509_CERT_DIR.
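
A minimal sketch of such a check, assuming Python 3 is available inside the container:

    import os

    # Check which of the standard X509_* locations are actually visible
    # from inside the container.
    for var in ("X509_CERT_DIR", "X509_VOMS_DIR", "X509_USER_PROXY"):
        path = os.environ.get(var)
        visible = path is not None and os.path.exists(path)
        print(f"{var}={path!r} -> {'visible' if visible else 'missing'}")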

When re-running with export XRD_LOGLEVEL="Debug" set, here is the log:

[2020-05-16 16:53:22.355183 +0000][Warning][Utility           ] Unable to process global config file: [ERROR] OS Error: No such file or directory
[2020-05-16 16:53:22.355298 +0000][Debug  ][Utility           ] Unable to process user config file: [ERROR] OS Error: No such file or directory
[2020-05-16 16:53:22.358257 +0000][Debug  ][PlugInMgr         ] Initializing plug-in manager...
[2020-05-16 16:53:22.358284 +0000][Debug  ][PlugInMgr         ] No default plug-in, loading plug-in configs...
[2020-05-16 16:53:22.358293 +0000][Debug  ][PlugInMgr         ] Processing plug-in definitions in /etc/xrootd/client.plugins.d...
[2020-05-16 16:53:22.358311 +0000][Debug  ][PlugInMgr         ] Unable to process directory /etc/xrootd/client.plugins.d: [ERROR] OS Error: No such file or directory
[2020-05-16 16:53:22.358353 +0000][Debug  ][PlugInMgr         ] Processing plug-in definitions in /mnt/dirac/.xrootd/client.plugins.d...
[2020-05-16 16:53:22.358365 +0000][Debug  ][PlugInMgr         ] Unable to process directory /mnt/dirac/.xrootd/client.plugins.d: [ERROR] OS Error: No such file or directory
[2020-05-16 16:53:22.492607 +0000][Debug  ][Poller            ] Available pollers: built-in
[2020-05-16 16:53:22.492897 +0000][Debug  ][Poller            ] Attempting to create a poller according to preference: built-in
[2020-05-16 16:53:22.492905 +0000][Debug  ][Poller            ] Creating poller: built-in
[2020-05-16 16:53:22.492923 +0000][Debug  ][Poller            ] Creating and starting the built-in poller...
[2020-05-16 16:53:22.493389 +0000][Debug  ][Poller            ] Using 1 poller threads
[2020-05-16 16:53:22.493407 +0000][Debug  ][TaskMgr           ] Starting the task manager...
[2020-05-16 16:53:22.493441 +0000][Debug  ][TaskMgr           ] Task manager started
[2020-05-16 16:53:22.493449 +0000][Debug  ][JobMgr            ] Starting the job manager...
[2020-05-16 16:53:22.493526 +0000][Debug  ][JobMgr            ] Job manager started, 3 workers
[2020-05-16 16:53:22.493540 +0000][Debug  ][TaskMgr           ] Registering task: "FileTimer task" to be run at: [2020-05-16 16:53:22 +0000]
[2020-05-16 16:53:22.493617 +0000][Debug  ][Utility           ] Env: overriding entry: MultiProtocol=0 with 1
[2020-05-16 16:53:22.493735 +0000][Debug  ][File              ] [0x3cd9980@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Sending an open command
[2020-05-16 16:53:22.493822 +0000][Debug  ][PostMaster        ] Creating new channel to: gfe02.grid.hep.ph.ic.ac.uk:1094 1 stream(s)
[2020-05-16 16:53:22.493849 +0000][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2020-05-16 16:53:22.498436 +0000][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: gfe02.grid.hep.ph.ic.ac.uk:1094" to be run at: [2020-05-16 16:53:37 +0000]
[2020-05-16 16:53:22.500402 +0000][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094] Found 1 address(es): [::ffff:146.179.232.84]:1094
[2020-05-16 16:53:22.500457 +0000][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Attempting connection to [::ffff:146.179.232.84]:1094
[2020-05-16 16:53:22.500738 +0000][Debug  ][Poller            ] Adding socket 0x4a0aad0 to the poller
[2020-05-16 16:53:22.503441 +0000][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Async connection call returned
[2020-05-16 16:53:22.503494 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2020-05-16 16:53:22.503987 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Got the server hand shake response (type: manager [], protocol version 400)
[2020-05-16 16:53:22.504594 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] kXR_protocol successful (type: server [], protocol version 400)
[2020-05-16 16:53:22.508272 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending out kXR_login request, username: dirac, cgi: ?xrd.cc=us&xrd.tz=0&xrd.appname=python3.6&xrd.info=&xrd.hostname=dirac-26a1e6ef.novalocal&xrd.rn=v4.8.4, dual-stack: true, private IPv4: true, private IPv6: true
[2020-05-16 16:53:22.534402 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Logged in, session: 7e78f76467065d93ac52088c47155129
[2020-05-16 16:53:22.534417 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Authentication is required: &P=gsi,v:10400,c:ssl,ca:ffc3d59b
[2020-05-16 16:53:22.534424 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending authentication data
200516 16:53:22 447 secgsi_Init: Secgsi: ErrError: CA directory non existing:: /etc/grid-security/certificates
200516 16:53:22 447 secgsi_Init: Secgsi: ErrError: CRL directory non existing:: /etc/grid-security/certificates
[2020-05-16 16:53:22.636886 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Trying to authenticate using gsi
[2020-05-16 16:53:22.652224 +0000][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Cannot get credentials for protocol gsi: Secgsi: ErrParseBuffer: unknown CA: cannot verify server certificate: kXGS_init
[2020-05-16 16:53:22.652246 +0000][Error  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] No protocols left to try
[2020-05-16 16:53:22.652260 +0000][Error  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Socket error while handshaking: [FATAL] Auth failed
[2020-05-16 16:53:22.652269 +0000][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket
[2020-05-16 16:53:22.652281 +0000][Debug  ][Poller            ] <[::ffff:172.16.1.164]:54048><--><[::ffff:146.179.232.84]:1094> Removing socket from the poller
[2020-05-16 16:53:22.652325 +0000][Error  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] elapsed = 0, pConnectionWindow = 120 seconds.
[2020-05-16 16:53:22.652333 +0000][Error  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Unable to recover: [FATAL] Auth failed.
[2020-05-16 16:53:22.652346 +0000][Error  ][XRootD            ] [gfe02.grid.hep.ph.ic.ac.uk:1094] Impossible to send message kXR_open (file: pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ). Trying to recover.
[2020-05-16 16:53:22.652356 +0000][Debug  ][XRootD            ] [gfe02.grid.hep.ph.ic.ac.uk:1094] Handling error while processing kXR_open (file: pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [FATAL] Auth failed.
[2020-05-16 16:53:22.655048 +0000][Debug  ][File              ] [0x3cd9980@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Open has returned with status [FATAL] Auth failed
[2020-05-16 16:53:22.655062 +0000][Debug  ][File              ] [0x3cd9980@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Error while opening at gfe02.grid.hep.ph.ic.ac.uk:1094: [FATAL] Auth failed
[2020-05-16 16:53:22.655085 +0000][Debug  ][Utility           ] Monitor library name not set. No monitoring
Error in <TNetXNGFile::Open>: [FATAL] Auth failed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ReferenceError: attempt to access a null-pointer
[2020-05-16 16:53:22.669386 +0000][Debug  ][JobMgr            ] Stopping the job manager...
[2020-05-16 16:53:22.670221 +0000][Debug  ][JobMgr            ] Job manager stopped
[2020-05-16 16:53:22.670242 +0000][Debug  ][TaskMgr           ] Stopping the task manager...
[2020-05-16 16:53:22.670376 +0000][Debug  ][TaskMgr           ] Task manager stopped
[2020-05-16 16:53:22.670387 +0000][Debug  ][Poller            ] Stopping the poller...
[2020-05-16 16:53:22.670943 +0000][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: gfe02.grid.hep.ph.ic.ac.uk:1094"
[2020-05-16 16:53:22.670962 +0000][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket
[2020-05-16 16:53:22.670969 +0000][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Destroying stream
[2020-05-16 16:53:22.670978 +0000][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket

Similar issue to #1668

egede commented 4 years ago

There are a few different options for solving this. Either of the options below should work.

If the first option works, we can fix the wrapper so that you can put environment variables in j.virtualization.mounts that are evaluated on the worker node.
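
A purely hypothetical sketch of what that could look like, assuming the wrapper expanded $-style variables on the worker node (the $X509_* mount entries are illustrative, not an existing Ganga feature):

    # Hypothetical: assumes the wrapper expands $-style variables on the worker node.
    j = ganga.Job()
    j.application = Executable(exe=File('worker_node_script.sh'), args=[])
    j.virtualization = Singularity("docker://luxzeplin/offline_hosted:centos7_2")
    j.virtualization.binary = '/cvmfs/oasis.opensciencegrid.org/mis/singularity/current/bin/singularity'
    j.virtualization.mounts = {
        '/cvmfs': '/cvmfs',
        '$X509_CERT_DIR': '$X509_CERT_DIR',  # CA certificates directory
        '$X509_VOMS_DIR': '$X509_VOMS_DIR',  # VOMS directory
    }
    j.backend = Dirac()
    j.submit()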

asnaylor commented 4 years ago

I tried both options and the job was successful in both cases: I was able to use xrootd in Python to access the file on the grid. However, the status of both jobs is failed. When I looked at the logging info, the JobWrapper has status Done with minor status Execution Complete, but JobAgent@CLOUD.UKI-LT2-IC-HEP-lz has status Failed with minor status Singularity CE Error: Command failed with exit code 2. When I look at the logs, though, I don't see any errors.

egede commented 4 years ago

@alexanderrichards Can you take a look at these jobs in the WMS interface at Imperial (or let me know the URL)? I do not quite understand what is going on here. @asnaylor Is the job reported as 'completed' or 'failed' in Ganga? I am not sure that you care about the JobAgent, but I might be misunderstanding what you say here.

egede commented 4 years ago

@asnaylor I see now that you state the job as failed. Do you understand what it is that gives the error code? Is it your script that returns a non-zero exit code (which Singularity then propagates)?

egede commented 4 years ago

I can now replicate this on CLOUD.UK-CAM-CUMULUS-backfill.uk. I wonder if it is caused by a Singularity container running inside another Singularity container? As the job executes OK, it must be related to the clean-up afterwards.

alexanderrichards commented 4 years ago

@alexanderrichards Can you take a look at these jobs in the WMS interface at Imperial (or let me know the URL). I do not quite understand what is going on here.

Do you still need me to do this if you understand the issue and have a fix? If so, is there a specific job id to look for, or just anything from @asnaylor?

egede commented 4 years ago

I believe it is all sorted.

asnaylor commented 4 years ago

I pulled the latest PR and tried the job again, running at LCG.UKI-LT2-IC-HEP.uk, but this time it failed for a different reason:

[2020-05-20 17:27:14.761227 +0100][Debug  ][Utility           ] Unable to find user home directory.
[2020-05-20 17:27:14.761803 +0100][Debug  ][PlugInMgr         ] Initializing plug-in manager...
[2020-05-20 17:27:14.761907 +0100][Debug  ][PlugInMgr         ] No default plug-in, loading plug-in configs...
[2020-05-20 17:27:14.762003 +0100][Debug  ][PlugInMgr         ] Processing plug-in definitions in /etc/xrootd/client.plugins.d...
[2020-05-20 17:27:15.009549 +0100][Debug  ][Poller            ] Available pollers: built-in
[2020-05-20 17:27:15.009861 +0100][Debug  ][Poller            ] Attempting to create a poller according to preference: built-in
[2020-05-20 17:27:15.009961 +0100][Debug  ][Poller            ] Creating poller: built-in
[2020-05-20 17:27:15.010067 +0100][Debug  ][Poller            ] Creating and starting the built-in poller...
[2020-05-20 17:27:15.010444 +0100][Debug  ][Poller            ] Using 1 poller threads
[2020-05-20 17:27:15.010469 +0100][Debug  ][TaskMgr           ] Starting the task manager...
[2020-05-20 17:27:15.010518 +0100][Debug  ][TaskMgr           ] Task manager started
[2020-05-20 17:27:15.010542 +0100][Debug  ][JobMgr            ] Starting the job manager...
[2020-05-20 17:27:15.010646 +0100][Debug  ][JobMgr            ] Job manager started, 3 workers
[2020-05-20 17:27:15.010681 +0100][Debug  ][TaskMgr           ] Registering task: "FileTimer task" to be run at: [2020-05-20 17:27:15 +0100]
[2020-05-20 17:27:15.010804 +0100][Debug  ][Utility           ] Env: overriding entry: MultiProtocol=0 with 1
[2020-05-20 17:27:15.011181 +0100][Debug  ][File              ] [0x525f5f0@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Sending an open command
[2020-05-20 17:27:15.011338 +0100][Debug  ][PostMaster        ] Creating new channel to: gfe02.grid.hep.ph.ic.ac.uk:1094 1 stream(s)
[2020-05-20 17:27:15.011398 +0100][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2020-05-20 17:27:15.012202 +0100][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: gfe02.grid.hep.ph.ic.ac.uk:1094" to be run at: [2020-05-20 17:27:30 +0100]
[2020-05-20 17:27:15.013435 +0100][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094] Found 2 address(es): [::ffff:146.179.232.84]:1094, [2a0c:5bc0:c8:2:a236:9fff:feed:7228]:1094
[2020-05-20 17:27:15.013499 +0100][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Attempting connection to [2a0c:5bc0:c8:2:a236:9fff:feed:7228]:1094
[2020-05-20 17:27:15.013573 +0100][Debug  ][Poller            ] Adding socket 0x5260050 to the poller
[2020-05-20 17:27:15.013799 +0100][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Async connection call returned
[2020-05-20 17:27:15.013939 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2020-05-20 17:27:15.014320 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Got the server hand shake response (type: manager [], protocol version 400)
[2020-05-20 17:27:15.014463 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] kXR_protocol successful (type: server [], protocol version 400)
[2020-05-20 17:27:15.017543 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending out kXR_login request, username: ????, cgi: ?xrd.cc=uk&xrd.tz=0&xrd.appname=python3.6&xrd.info=&xrd.hostname=wg45.grid.hep.ph.ic.ac.uk&xrd.rn=v4.8.4, dual-stack: true, private IPv4: false, private IPv6: false
[2020-05-20 17:27:15.018087 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Logged in, session: baf2fbf8ec4ebdef112fd1c3113555d6
[2020-05-20 17:27:15.018175 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Authentication is required: &P=gsi,v:10400,c:ssl,ca:ffc3d59b
[2020-05-20 17:27:15.018269 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending authentication data
[2020-05-20 17:27:15.119615 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Trying to authenticate using gsi
[2020-05-20 17:27:15.185929 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending more authentication data for gsi
[2020-05-20 17:27:15.220114 +0100][Debug  ][XRootDTransport   ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Authenticated with gsi.
[2020-05-20 17:27:15.220310 +0100][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Stream 0 connected.
[2020-05-20 17:27:15.220418 +0100][Debug  ][Utility           ] Monitor library name not set. No monitoring
[2020-05-20 17:27:15.273009 +0100][Debug  ][PostMaster        ] Creating new channel to: [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 1 stream(s)
[2020-05-20 17:27:15.273066 +0100][Debug  ][PostMaster        ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2020-05-20 17:27:15.274008 +0100][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718" to be run at: [2020-05-20 17:27:30 +0100]
[2020-05-20 17:27:15.274166 +0100][Debug  ][PostMaster        ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718] Found 1 address(es): [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718
[2020-05-20 17:27:15.274213 +0100][Debug  ][AsyncSock         ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Attempting connection to [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718
[2020-05-20 17:27:15.274387 +0100][Debug  ][Poller            ] Adding socket 0x840018f0 to the poller
[2020-05-20 17:27:15.274686 +0100][Debug  ][AsyncSock         ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Async connection call returned
[2020-05-20 17:27:15.274786 +0100][Debug  ][XRootDTransport   ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Sending out the initial hand shake + kXR_protocol
[2020-05-20 17:27:15.276971 +0100][Debug  ][XRootDTransport   ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Got the server hand shake response (type: server [], protocol version 400)
[2020-05-20 17:27:15.277103 +0100][Debug  ][XRootDTransport   ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] kXR_protocol successful (type: server [], protocol version 400)
[2020-05-20 17:27:15.278755 +0100][Debug  ][XRootDTransport   ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Sending out kXR_login request, username: ????, cgi: ?xrd.cc=uk&xrd.tz=0&xrd.appname=python3.6&xrd.info=&xrd.hostname=wg45.grid.hep.ph.ic.ac.uk&xrd.rn=v4.8.4, dual-stack: true, private IPv4: false, private IPv6: false
[2020-05-20 17:27:15.279225 +0100][Debug  ][XRootDTransport   ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Logged in, session: bebdeec19a84defa728553f50332881e
[2020-05-20 17:27:15.279313 +0100][Debug  ][PostMaster        ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0] Stream 0 connected.
[2020-05-20 17:27:15.280759 +0100][Debug  ][XRootD            ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718] Handling error while processing kXR_open (file: pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root?org.dcache.xrootd.client=, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Error response: Permission denied.
[2020-05-20 17:27:15.280977 +0100][Debug  ][File              ] [0x525f5f0@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Open has returned with status [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property.
[2020-05-20 17:27:15.280997 +0100][Debug  ][File              ] [0x525f5f0@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Error while opening at [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property.
Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ReferenceError: attempt to access a null-pointer
[2020-05-20 17:27:15.306374 +0100][Debug  ][JobMgr            ] Stopping the job manager...
[2020-05-20 17:27:15.307050 +0100][Debug  ][JobMgr            ] Job manager stopped
[2020-05-20 17:27:15.307147 +0100][Debug  ][TaskMgr           ] Stopping the task manager...
[2020-05-20 17:27:15.307295 +0100][Debug  ][TaskMgr           ] Task manager stopped
[2020-05-20 17:27:15.307374 +0100][Debug  ][Poller            ] Stopping the poller...
[2020-05-20 17:27:15.307553 +0100][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718"
[2020-05-20 17:27:15.307649 +0100][Debug  ][AsyncSock         ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Closing the socket
[2020-05-20 17:27:15.307741 +0100][Debug  ][Poller            ] <[2a0c:5bc0:c8:2:d6ae:52ff:fe6a:ab5]:53010><--><[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718> Removing socket from the poller
[2020-05-20 17:27:15.307885 +0100][Debug  ][PostMaster        ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0] Destroying stream
[2020-05-20 17:27:15.307977 +0100][Debug  ][AsyncSock         ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Closing the socket
[2020-05-20 17:27:15.308071 +0100][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: gfe02.grid.hep.ph.ic.ac.uk:1094"
[2020-05-20 17:27:15.308146 +0100][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket
[2020-05-20 17:27:15.308218 +0100][Debug  ][Poller            ] <[2a0c:5bc0:c8:2:d6ae:52ff:fe6a:ab5]:56958><--><[2a0c:5bc0:c8:2:a236:9fff:feed:7228]:1094> Removing socket from the poller
[2020-05-20 17:27:15.308313 +0100][Debug  ][PostMaster        ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Destroying stream
[2020-05-20 17:27:15.308388 +0100][Debug  ][AsyncSock         ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket

egede commented 4 years ago

So it works at some sites but not at others? I suspect that it is your mounting of /srv

j.virtualization.mounts = {'/cvmfs':'/cvmfs', '/srv':'/srv'}

(which is a bit of a hack) that doesn't work everywhere. Can you remember what it was that we tried to fix with that change?

asnaylor commented 4 years ago

At the moment I am mounting:

j.virtualization.mounts = {'/cvmfs':'/cvmfs', '/srv':'/srv', '/etc':'/etc', '/scratch':'/scratch'}

I was mounting /srv to allow the container access to X509_VOMS_DIR and X509_USER_PROXY.

asnaylor commented 4 years ago

Is there a way to mount just those explicit folders into the container using the variables (X509_CERT_DIR, X509_VOMS_DIR and X509_USER_PROXY)?

asnaylor commented 4 years ago

I ran the same job on a number of different Dirac sites and collated the results below. A lot of the jobs were successful, but it seems like there are two failure modes: the first is just mounting the correct X509 folders for the job, and the second is the Request lacks the org.dcache.uuid property error.

| Site | Status | Errors | X509_CERT_DIR | X509_VOMS_DIR | X509_USER_PROXY | Runtime (s) |
| --- | --- | --- | --- | --- | --- | --- |
| CLOUD.RAL-LCG2.uk | Done | - | /etc/grid-security/certificates | /scratch/plt/etc/grid-security/vomsdir | /tmp/x509up_u10000 | 47 |
| CLOUD.UKI-LT2-IC-HEP-lz.uk | Done | - | /etc/grid-security/certificates | /tmp/etc/grid-security/vomsdir | /tmp/proxy | 9 |
| LCG.UKI-LT2-Brunel.uk | Done | - | /etc/grid-security/certificates | /scratch/dir_16058/MwmNDmjwzvwnJmrZMnYHaOWq7uhkjmABFKDmeYMLDmABFKDmDyuIXm/DIRAC_jDeylkpilot/etc/grid-security/vomsdir | /scratch/dir_16058/tmpA7CKKH | 39 |
| LCG.UKI-LT2-IC-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property | /cvmfs/grid.cern.ch/etc/grid-security/certificates | /srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/etc/grid-security/vomsdir | /srv/localstage/condor/dir_7410/tmpftzyFy | - |
| LCG.UKI-LT2-QMUL.uk | Failed | Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property | /scratch/tmp//khRNDm0yzvwn3JPVEm4QteWmABFKDmABFKDmH9FKDmABFKDmrQjjGn/arc/certificates | /scratch/tmp/khRNDm0yzvwn3JPVEm4QteWmABFKDmABFKDmH9FKDmABFKDmrQjjGn/DIRAC_xK1DI7pilot/etc/grid-security/vomsdir | /scratch/lcg/pillz09/6122620/tmpA9Wbse | - |
| LCG.UKI-NORTHGRID-LANCS-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /opt/gridapps/etc/grid-security/certificates: No such file or directory : ls: cannot access /home/iris/pltlz006/home_cream_675579780/CREAM675579780/DIRAC_YExrYJpilot/etc/grid-security/vomsdir: No such file or directory | /opt/gridapps/etc/grid-security/certificates | /home/iris/pltlz006/home_cream_675579780/CREAM675579780/DIRAC_YExrYJpilot/etc/grid-security/vomsdir | /home/data/tmp/3608722.1.grid7/tmpip05BS | - |
| LCG.UKI-NORTHGRID-LIV-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /data/condor_pool/dir_7075/DIRAC_VDOB_4pilot/etc/grid-security/vomsdir: No such file or directory : ls: cannot access /data/condor_pool/dir_7075/tmpfa8ZlR: No such file or directory | /etc/grid-security/certificates | /data/condor_pool/dir_7075/DIRAC_VDOB_4pilot/etc/grid-security/vomsdir | /data/condor_pool/dir_7075/tmpfa8ZlR | - |
| LCG.UKI-NORTHGRID-MAN-HEP.uk | Done | - | /etc/grid-security/certificates | /scratch/condor_pool/condor/dir_26932/CFBNDmvyzvwnOkaSmpEpAjQq5wXwEmABFKDmtOwTDmABFKDmre8JPo/DIRAC_fDllEvpilot/etc/grid-security/vomsdir | /scratch/condor_pool/condor/dir_26932/tmpAoPSbv | 25 |
| LCG.UKI-SCOTGRID-ECDF.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /local/2127066.1.eddie/pCTNDmxyzvwntvq09p9vnX1nABFKDmABFKDmfWJKDmABFKDmDSeqln/DIRAC_UcGokIpilot/etc/grid-security/vomsdir: No such file or directory : ls: cannot access /local/2127066.1.eddie/tmpMYAb06: No such file or directory | /etc/grid-security/certificates | /local/2127066.1.eddie/pCTNDmxyzvwntvq09p9vnX1nABFKDmABFKDmfWJKDmABFKDmDSeqln/DIRAC_UcGokIpilot/etc/grid-security/vomsdir | /local/2127066.1.eddie/tmpMYAb06 | - |
| LCG.UKI-SOUTHGRID-OX-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /home/pool/condor/dir_205354/GSdKDmoyzvwnjGWBFmwldEhq1cyeCnABFKDmI2GODmABFKDmlB9W0m/DIRAC_MhPfmlpilot/etc/grid-security/vomsdir: No such file or directory : ls: cannot access /home/pool/condor/dir_205354/tmphLB0zu: No such file or directory | /etc/grid-security/certificates | /home/pool/condor/dir_205354/GSdKDmoyzvwnjGWBFmwldEhq1cyeCnABFKDmI2GODmABFKDmlB9W0m/DIRAC_MhPfmlpilot/etc/grid-security/vomsdir | /home/pool/condor/dir_205354/tmphLB0zu | - |
| LCG.UKI-SOUTHGRID-RALPP.uk | Failed | Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property | /scratch/condor/dir_213526/p0sNDmpyzvwnOOVDjqUTj3jq6xrg1pABFKDmJ14SDmABFKDmAcPTmm/arc/certificates | /scratch/condor/dir_213526/p0sNDmpyzvwnOOVDjqUTj3jq6xrg1pABFKDmJ14SDmABFKDmAcPTmm/DIRAC__uJfwppilot/etc/grid-security/vomsdir | /scratch/condor/dir_213526/tmpG8O_cv | - |
| VAC.UKI-NORTHGRID-MAN-HEP.uk | Done | - | /etc/grid-security/certificates | /scratch/plt/etc/grid-security/vomsdir | /tmp/x509up_u10000 | 65 |
| VAC.UKI-SCOTGRID-GLASGOW.uk | Done | - | /etc/grid-security/certificates | /scratch/plt/etc/grid-security/vomsdir | /tmp/x509up_u10000 | 20 |

egede commented 4 years ago

I will try to implement a preamble to starting the Singularity container. This will allow some Python code to be executed beforehand. I think that is better than just allowing environment variables in the mounts (which would fail for X509_USER_PROXY, as it is a file and not a directory).
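
A rough sketch of what such a preamble could compute, assuming it runs as Python on the worker node before the container is started and can extend the bind mounts:

    import os

    # Hypothetical preamble: collect the credential locations that the pilot
    # exports on the worker node and turn them into extra bind mounts.
    extra_mounts = {}
    for var in ("X509_CERT_DIR", "X509_VOMS_DIR"):
        path = os.environ.get(var)
        if path and os.path.isdir(path):
            extra_mounts[path] = path

    # X509_USER_PROXY points at a file, so bind its parent directory instead.
    proxy = os.environ.get("X509_USER_PROXY")
    if proxy and os.path.isfile(proxy):
        proxy_dir = os.path.dirname(proxy)
        extra_mounts[proxy_dir] = proxy_dir

    print(extra_mounts)  # these would be merged into the container bind mounts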

egede commented 4 years ago

As for the issue of the org.dcache.uuid property, I still do not know what is going on there. It is obviously not related to the Singularity container itself (otherwise it would fail everywhere). Can you try to do a printenv inside the job? We can then compare this for a working and a non-working site and try to work out where there might be a difference.
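
A small helper sketch for that comparison, assuming the two printenv dumps are saved to files (script and file names are illustrative):

    import sys

    # Usage: python diff_env.py working_site.env failing_site.env
    def load_env(path):
        # Parse a printenv dump into a {name: value} dict.
        env = {}
        with open(path) as f:
            for line in f:
                if "=" in line:
                    name, _, value = line.rstrip("\n").partition("=")
                    env[name] = value
        return env

    good = load_env(sys.argv[1])  # dump from a working site
    bad = load_env(sys.argv[2])   # dump from a failing site

    for name in sorted(set(good) | set(bad)):
        if good.get(name) != bad.get(name):
            print(f"{name}:\n  working: {good.get(name)}\n  failing: {bad.get(name)}")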

asnaylor commented 4 years ago

Here are the printenv outputs. A successful job with no errors at CLOUD.RAL-LCG2.uk:

DIRAC=/scratch/plt
XDG_SESSION_ID=c1
HOSTNAME=vcycle-gds-vm-lz-s9mexdtjmu
DIRAC_PROCESSORS=1
SHELL=/bin/bash
TERM=unknown
GFAL_PLUGIN_DIR=/scratch/plt/Linux_x86_64_glibc-2.17/lib/gfal2-plugins
HISTSIZE=1000
DIRACPYTHON=/scratch/plt/Linux_x86_64_glibc-2.17/bin/python2.7
PYTHONUNBUFFERED=yes
JOBID=25409988
QTDIR=/usr/lib64/qt-3.3
SINGULARITY_APPNAME=
X509_CERT_DIR=/etc/grid-security/certificates
DIRAC_WHOLENODE=False
QTINC=/usr/lib64/qt-3.3/include
DIRACLIB=/scratch/plt/Linux_x86_64_glibc-2.17/lib
LC_ALL=en_US.UTF-8
QT_GRAPHICSSYSTEM_CHECKED=1
PILOT_UUID=vm://vcycle-ral.blackett.manchester.ac.uk/vcycle-ral.blackett.manchester.ac.uk:1590066986.vcycle-gds-vm-lz-s9mexdtjmu:gds-vm-lz
USER=plt00p00
USER_PATH=/scratch/plt/25409988:/scratch/plt/Linux_x86_64_glibc-2.17/bin:/scratch/plt/Linux_x86_64_glibc-2.17/bin:/scratch/plt/scripts:/scratch/plt/Linux_x86_64_glibc-2.17/bin:/usr/lib64/qt-3.3/bin:/opt/google-cloud-sdk/bin:/usr/lib/ec2/bin:/usr/lib64/ccache:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
DIRACSYSCONFIG=/scratch/plt/pilot.cfg
LD_LIBRARY_PATH=/.singularity.d/libs
SUDO_USER=plt
SUDO_UID=1000
EC2_HOME=/usr/lib/ec2
SINGULARITY_NAME=singularity_sandbox
DIRACROOT=/scratch/plt
USERNAME=plt00p00
GLOBUS_IO_IPV6=TRUE
MAIL=/var/spool/mail/plt
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CERNVM_ENV=1
CONDOR_CONFIG=/etc/condor/condor_config
PWD=/scratch/plt/25409988
JAVA_HOME=/usr
PYTHONOPTIMIZE=x
JOBFEATURES=https://vm85.blackett.manchester.ac.uk:443/machines/vcycle-ral.blackett.manchester.ac.uk/vcycle-gds-vm-lz-s9mexdtjmu/jobfeatures
LANG=C
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
LOADEDMODULES=
DCOMMANDS_PPID=6517
X509_VOMS_DIR=/scratch/plt/etc/grid-security/vomsdir
QT_GRAPHICSSYSTEM=native
DIRACSCRIPTS=/scratch/plt/scripts
DIRACSITE=CLOUD.RAL-LCG2.uk
HISTCONTROL=ignoredups
SSL_CERT_DIR=/etc/grid-security/certificates
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SHLVL=11
SUDO_COMMAND=/bin/sh -c /scratch/plt/job/Wrapper/Job25409988
DIRACJOBID=25409988
HOME=/scratch/plt00p00
MACHINEFEATURES=https://vm85.blackett.manchester.ac.uk:443/machines/vcycle-ral.blackett.manchester.ac.uk/vcycle-gds-vm-lz-s9mexdtjmu/machinefeatures
LANGUAGE=en_US.UTF-8
X509_USER_PROXY=/tmp/x509up_u10000
OPENSSL_CONF=/tmp
ARC_PLUGIN_PATH=/scratch/plt/Linux_x86_64_glibc-2.17/lib/arc
DIRACBIN=/scratch/plt/Linux_x86_64_glibc-2.17/bin
DYLD_LIBRARY_PATH=/scratch/plt/Linux_x86_64_glibc-2.17/lib:/scratch/plt/Linux_x86_64_glibc-2.17/lib:/scratch/plt/Linux_x86_64_glibc-2.17/lib/mysql:/scratch/plt/Linux_x86_64_glibc-2.17/lib:
GFAL_CONFIG_DIR=/scratch/plt/Linux_x86_64_glibc-2.17/etc/gfal2.d
AGENT_WORKDIRECTORY=/scratch/plt/work/WorkloadManagement/JobAgent
PYTHONPATH=/scratch/plt:/scratch/plt:/scratch/plt
JOB_ID=vcycle-ral.blackett.manchester.ac.uk:1590066986.vcycle-gds-vm-lz-s9mexdtjmu:gds-vm-lz
LOGNAME=plt00p00
CVS_RSH=ssh
QTLIB=/usr/lib64/qt-3.3/lib
XDG_DATA_DIRS=/scratch/plt/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
MODULESHOME=/usr/share/Modules
LESSOPEN=||/usr/bin/lesspipe.sh %s
PROMPT_COMMAND=PS1="Singularity> "; unset PROMPT_COMMAND
SINGULARITY_CONTAINER=/scratch/plt/25409988/singularity_sandbox
SUDO_GID=1000
XDG_RUNTIME_DIR=/scratch/plt/25409988/.xdg
GLOBUS_FTP_CLIENT_IPV6=TRUE
JOBOUTPUTS=https://vm85.blackett.manchester.ac.uk:443/machines/vcycle-ral.blackett.manchester.ac.uk/vcycle-gds-vm-lz-s9mexdtjmu/joboutputs
RRD_DEFAULT_FONT=/scratch/plt/Linux_x86_64_glibc-2.17/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf
DIRACPLAT=Linux_x86_64_glibc-2.17
_=/usr/bin/printenv

An unsuccessful job with the org.dcache.uuid problem at LCG.UKI-LT2-IC-HEP.uk:

DIRAC=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot
_CONDOR_JOB_PIDS=
DIRAC_PROCESSORS=1
GFAL_PLUGIN_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib/gfal2-plugins
DIRACPYTHON=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin/python2.7
TMPDIR=/srv/localstage/condor/dir_7410
PYTHONUNBUFFERED=yes
JOBID=25409995
_CONDOR_SCRATCH_DIR=/srv/localstage/condor/dir_7410
SINGULARITY_APPNAME=
X509_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates
DIRAC_WHOLENODE=False
DIRACLIB=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib
LC_ALL=en_US.UTF-8
_CHIRP_DELAYED_UPDATE_PREFIX=Chirp*
_CONDOR_ANCESTOR_23186=27254:1588924743:1633129938
USER_PATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/scripts:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
TEMP=/srv/localstage/condor/dir_7410
LD_LIBRARY_PATH=/.singularity.d/libs
BATCH_SYSTEM=HTCondor
VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
SINGULARITY_NAME=singularity_sandbox
DIRACROOT=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot
_CONDOR_CHIRP_CONFIG=/srv/localstage/condor/dir_7410/.chirp.config
CONDORCE_COLLECTOR_HOST=ceprod03.grid.hep.ph.ic.ac.uk:9619
HTCONDOR_JOBID=280235.0
GLOBUS_IO_IPV6=TRUE
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_CONDOR_BIN=/usr/bin
PWD=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995
PYTHONOPTIMIZE=x
LANG=en_US.UTF-8
DCOMMANDS_PPID=7961
X509_VOMS_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/etc/grid-security/vomsdir
DIRACSCRIPTS=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/scripts
_CONDOR_SLOT=slot1_6
DIRACSITE=LCG.UKI-LT2-IC-HEP.uk
_CONDOR_ANCESTOR_27254=7410:1590067886:1401183745
SSL_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates
SHLVL=10
DIRACJOBID=25409995
HOME=/home/batch/job0006
_CONDOR_MACHINE_AD=/srv/localstage/condor/dir_7410/.machine.ad
TERMINFO=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/share/terminfo:/usr/share/terminfo:/etc/terminfo
LANGUAGE=en_US.UTF-8
OPENSSL_CONF=/tmp
X509_USER_PROXY=/srv/localstage/condor/dir_7410/tmpftzyFy
ARC_PLUGIN_PATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib/arc
DIRACBIN=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin
_CONDOR_ANCESTOR_7410=7414:1590067888:1504722386
DYLD_LIBRARY_PATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib/mysql:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib:
GFAL_CONFIG_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/etc/gfal2.d
AGENT_WORKDIRECTORY=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/work/WorkloadManagement/JobAgent
PYTHONPATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot
TMP=/srv/localstage/condor/dir_7410
OMP_NUM_THREADS=1
_CONDOR_JOB_AD=/srv/localstage/condor/dir_7410/.job.ad
PROMPT_COMMAND=PS1="Singularity> "; unset PROMPT_COMMAND
SINGULARITY_CONTAINER=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995/singularity_sandbox
XDG_RUNTIME_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995/.xdg
GLOBUS_FTP_CLIENT_IPV6=TRUE
_CONDOR_JOB_IWD=/srv/localstage/condor/dir_7410
RRD_DEFAULT_FONT=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf
DIRACPLAT=Linux_x86_64_glibc-2.17
_=/usr/bin/printenv

egede commented 4 years ago

Asking for some help ... https://github.com/xrootd/xrootd/issues/1202

egede commented 4 years ago

I eventually got a reply from xrootd support, see https://github.com/xrootd/xrootd/issues/1202#issuecomment-649527054. So it seems like this is not a problem with Ganga or Singularity. Annoying. In any case, I am closing the issue here.