There are a few different options for solving this; either of the options below should work:

1. Mount the `/etc/grid-security/certificates` directory into the container.
2. Set `X509_CERT_DIR=/cvmfs/lhcb.cern.ch/etc/grid-security/certificates` inside the container (or point it at some other location in `/cvmfs` where the certificates are kept up to date).

If the first option works, we can fix the wrapper so that you can put environment variables into `j.virtualization.mounts` that are evaluated on the worker node.
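For option 1, a minimal sketch of what this could look like in the Ganga job, using the `mounts` dictionary that appears later in this thread (illustrative only, not tested):

```python
# Sketch of option 1 (illustrative, not tested): bind-mount the host CA
# certificates directory into the container so X509_CERT_DIR resolves there
# too. `j` is the Ganga Job object being configured, as elsewhere in this
# thread.
j.virtualization.mounts = {
    '/cvmfs': '/cvmfs',
    '/etc/grid-security/certificates': '/etc/grid-security/certificates',
}

# Option 2 would instead rely on a copy of the certificates kept up to date
# on CVMFS, e.g. by exporting inside the container:
#   X509_CERT_DIR=/cvmfs/lhcb.cern.ch/etc/grid-security/certificates
```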
I tried both options and the job was successful in both cases: I was able to use xrootd in python to access the file on the grid. However, the status of both jobs is Failed. Looking at the logging info, the `JobWrapper` has status `Done` with minor status `Execution Complete`, but `JobAgent@CLOUD.UKI-LT2-IC-HEP-lz` has status `Failed` with minor status `Singularity CE Error: Command failed with exit code 2`. When I look at the logs I don't see any errors.
@alexanderrichards Can you take a look at these jobs in the WMS interface at Imperial (or let me know the URL)? I do not quite understand what is going on here. @asnaylor Is the job reported as 'completed' or 'failed' in Ganga? I am not sure that you need to care about the Job Agent, but I might be misunderstanding what you say here.

@asnaylor I see now that you state the job as failed. Do you understand what it is that gives the error code? Is it your script that returns a non-zero exit code (which singularity then propagates)?
I can now replicate this on CLOUD.UK-CAM-CUMULUS-backfill.uk. I wonder if it is caused by a Singularity container running inside another Singularity container? As the job executes OK, it is related to the clean-up afterwards.
> @alexanderrichards Can you take a look at these jobs in the WMS interface at Imperial (or let me know the URL). I do not quite understand what is going on here.
Do you still need me to do this if you understand the issue and have a fix? If so, is there a specific job ID to look for, or just anything from @asnaylor?
I believe it is all sorted.
I pulled the latest PR and tried the job again at LCG.UKI-LT2-IC-HEP.uk, but this time it failed for a different reason:
[2020-05-20 17:27:14.761227 +0100][Debug ][Utility ] Unable to find user home directory.
[2020-05-20 17:27:14.761803 +0100][Debug ][PlugInMgr ] Initializing plug-in manager...
[2020-05-20 17:27:14.761907 +0100][Debug ][PlugInMgr ] No default plug-in, loading plug-in configs...
[2020-05-20 17:27:14.762003 +0100][Debug ][PlugInMgr ] Processing plug-in definitions in /etc/xrootd/client.plugins.d...
[2020-05-20 17:27:15.009549 +0100][Debug ][Poller ] Available pollers: built-in
[2020-05-20 17:27:15.009861 +0100][Debug ][Poller ] Attempting to create a poller according to preference: built-in
[2020-05-20 17:27:15.009961 +0100][Debug ][Poller ] Creating poller: built-in
[2020-05-20 17:27:15.010067 +0100][Debug ][Poller ] Creating and starting the built-in poller...
[2020-05-20 17:27:15.010444 +0100][Debug ][Poller ] Using 1 poller threads
[2020-05-20 17:27:15.010469 +0100][Debug ][TaskMgr ] Starting the task manager...
[2020-05-20 17:27:15.010518 +0100][Debug ][TaskMgr ] Task manager started
[2020-05-20 17:27:15.010542 +0100][Debug ][JobMgr ] Starting the job manager...
[2020-05-20 17:27:15.010646 +0100][Debug ][JobMgr ] Job manager started, 3 workers
[2020-05-20 17:27:15.010681 +0100][Debug ][TaskMgr ] Registering task: "FileTimer task" to be run at: [2020-05-20 17:27:15 +0100]
[2020-05-20 17:27:15.010804 +0100][Debug ][Utility ] Env: overriding entry: MultiProtocol=0 with 1
[2020-05-20 17:27:15.011181 +0100][Debug ][File ] [0x525f5f0@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Sending an open command
[2020-05-20 17:27:15.011338 +0100][Debug ][PostMaster ] Creating new channel to: gfe02.grid.hep.ph.ic.ac.uk:1094 1 stream(s)
[2020-05-20 17:27:15.011398 +0100][Debug ][PostMaster ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2020-05-20 17:27:15.012202 +0100][Debug ][TaskMgr ] Registering task: "TickGeneratorTask for: gfe02.grid.hep.ph.ic.ac.uk:1094" to be run at: [2020-05-20 17:27:30 +0100]
[2020-05-20 17:27:15.013435 +0100][Debug ][PostMaster ] [gfe02.grid.hep.ph.ic.ac.uk:1094] Found 2 address(es): [::ffff:146.179.232.84]:1094, [2a0c:5bc0:c8:2:a236:9fff:feed:7228]:1094
[2020-05-20 17:27:15.013499 +0100][Debug ][AsyncSock ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Attempting connection to [2a0c:5bc0:c8:2:a236:9fff:feed:7228]:1094
[2020-05-20 17:27:15.013573 +0100][Debug ][Poller ] Adding socket 0x5260050 to the poller
[2020-05-20 17:27:15.013799 +0100][Debug ][AsyncSock ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Async connection call returned
[2020-05-20 17:27:15.013939 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2020-05-20 17:27:15.014320 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Got the server hand shake response (type: manager [], protocol version 400)
[2020-05-20 17:27:15.014463 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] kXR_protocol successful (type: server [], protocol version 400)
[2020-05-20 17:27:15.017543 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending out kXR_login request, username: ????, cgi: ?xrd.cc=uk&xrd.tz=0&xrd.appname=python3.6&xrd.info=&xrd.hostname=wg45.grid.hep.ph.ic.ac.uk&xrd.rn=v4.8.4, dual-stack: true, private IPv4: false, private IPv6: false
[2020-05-20 17:27:15.018087 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Logged in, session: baf2fbf8ec4ebdef112fd1c3113555d6
[2020-05-20 17:27:15.018175 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Authentication is required: &P=gsi,v:10400,c:ssl,ca:ffc3d59b
[2020-05-20 17:27:15.018269 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending authentication data
[2020-05-20 17:27:15.119615 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Trying to authenticate using gsi
[2020-05-20 17:27:15.185929 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Sending more authentication data for gsi
[2020-05-20 17:27:15.220114 +0100][Debug ][XRootDTransport ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Authenticated with gsi.
[2020-05-20 17:27:15.220310 +0100][Debug ][PostMaster ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Stream 0 connected.
[2020-05-20 17:27:15.220418 +0100][Debug ][Utility ] Monitor library name not set. No monitoring
[2020-05-20 17:27:15.273009 +0100][Debug ][PostMaster ] Creating new channel to: [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 1 stream(s)
[2020-05-20 17:27:15.273066 +0100][Debug ][PostMaster ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2020-05-20 17:27:15.274008 +0100][Debug ][TaskMgr ] Registering task: "TickGeneratorTask for: [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718" to be run at: [2020-05-20 17:27:30 +0100]
[2020-05-20 17:27:15.274166 +0100][Debug ][PostMaster ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718] Found 1 address(es): [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718
[2020-05-20 17:27:15.274213 +0100][Debug ][AsyncSock ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Attempting connection to [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718
[2020-05-20 17:27:15.274387 +0100][Debug ][Poller ] Adding socket 0x840018f0 to the poller
[2020-05-20 17:27:15.274686 +0100][Debug ][AsyncSock ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Async connection call returned
[2020-05-20 17:27:15.274786 +0100][Debug ][XRootDTransport ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Sending out the initial hand shake + kXR_protocol
[2020-05-20 17:27:15.276971 +0100][Debug ][XRootDTransport ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Got the server hand shake response (type: server [], protocol version 400)
[2020-05-20 17:27:15.277103 +0100][Debug ][XRootDTransport ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] kXR_protocol successful (type: server [], protocol version 400)
[2020-05-20 17:27:15.278755 +0100][Debug ][XRootDTransport ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Sending out kXR_login request, username: ????, cgi: ?xrd.cc=uk&xrd.tz=0&xrd.appname=python3.6&xrd.info=&xrd.hostname=wg45.grid.hep.ph.ic.ac.uk&xrd.rn=v4.8.4, dual-stack: true, private IPv4: false, private IPv6: false
[2020-05-20 17:27:15.279225 +0100][Debug ][XRootDTransport ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Logged in, session: bebdeec19a84defa728553f50332881e
[2020-05-20 17:27:15.279313 +0100][Debug ][PostMaster ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0] Stream 0 connected.
[2020-05-20 17:27:15.280759 +0100][Debug ][XRootD ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718] Handling error while processing kXR_open (file: pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root?org.dcache.xrootd.client=, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Error response: Permission denied.
[2020-05-20 17:27:15.280977 +0100][Debug ][File ] [0x525f5f0@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Open has returned with status [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property.
[2020-05-20 17:27:15.280997 +0100][Debug ][File ] [0x525f5f0@root://gfe02.grid.hep.ph.ic.ac.uk:1094/pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/LZAP-4.3.1/20180201/lz_20180201234_lzap.root] Error while opening at [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property.
Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property.
Traceback (most recent call last):
File "<string>", line 1, in <module>
ReferenceError: attempt to access a null-pointer
[2020-05-20 17:27:15.306374 +0100][Debug ][JobMgr ] Stopping the job manager...
[2020-05-20 17:27:15.307050 +0100][Debug ][JobMgr ] Job manager stopped
[2020-05-20 17:27:15.307147 +0100][Debug ][TaskMgr ] Stopping the task manager...
[2020-05-20 17:27:15.307295 +0100][Debug ][TaskMgr ] Task manager stopped
[2020-05-20 17:27:15.307374 +0100][Debug ][Poller ] Stopping the poller...
[2020-05-20 17:27:15.307553 +0100][Debug ][TaskMgr ] Requesting unregistration of: "TickGeneratorTask for: [2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718"
[2020-05-20 17:27:15.307649 +0100][Debug ][AsyncSock ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Closing the socket
[2020-05-20 17:27:15.307741 +0100][Debug ][Poller ] <[2a0c:5bc0:c8:2:d6ae:52ff:fe6a:ab5]:53010><--><[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718> Removing socket from the poller
[2020-05-20 17:27:15.307885 +0100][Debug ][PostMaster ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0] Destroying stream
[2020-05-20 17:27:15.307977 +0100][Debug ][AsyncSock ] [[2a0c:5bc0:c8:2:266e:96ff:fe14:f78]:24718 #0.0] Closing the socket
[2020-05-20 17:27:15.308071 +0100][Debug ][TaskMgr ] Requesting unregistration of: "TickGeneratorTask for: gfe02.grid.hep.ph.ic.ac.uk:1094"
[2020-05-20 17:27:15.308146 +0100][Debug ][AsyncSock ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket
[2020-05-20 17:27:15.308218 +0100][Debug ][Poller ] <[2a0c:5bc0:c8:2:d6ae:52ff:fe6a:ab5]:56958><--><[2a0c:5bc0:c8:2:a236:9fff:feed:7228]:1094> Removing socket from the poller
[2020-05-20 17:27:15.308313 +0100][Debug ][PostMaster ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0] Destroying stream
[2020-05-20 17:27:15.308388 +0100][Debug ][AsyncSock ] [gfe02.grid.hep.ph.ic.ac.uk:1094 #0.0] Closing the socket
So it works at some sites but not at others? I suspect that it is your mounting of `/srv`, i.e. `j.virtualization.mounts = {'/cvmfs':'/cvmfs', '/srv':'/srv'}` (which is a bit of a hack), that doesn't work everywhere. Can you remember what it was that we tried to fix with that change?
At the moment I am mounting `j.virtualization.mounts = {'/cvmfs':'/cvmfs', '/srv':'/srv', '/etc':'/etc', '/scratch':'/scratch'}`. I was mounting `/srv` to allow the container access to `X509_VOMS_DIR` and `X509_USER_PROXY`. Is there a way to mount just those explicit folders into the container using the variables (`X509_CERT_DIR`, `X509_VOMS_DIR` and `X509_USER_PROXY`)?
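For example, something along these lines (hypothetical syntax that would rely on the wrapper expanding the variables on the worker node, as suggested above):

```python
# Hypothetical mounts configuration (not supported at the time of writing):
# the wrapper would expand the $X509_* variables on the worker node and
# bind-mount only those locations, instead of whole trees like /srv and /etc.
j.virtualization.mounts = {
    '/cvmfs': '/cvmfs',
    '$X509_CERT_DIR': '$X509_CERT_DIR',
    '$X509_VOMS_DIR': '$X509_VOMS_DIR',
    # $X509_USER_PROXY points at a file rather than a directory, so a plain
    # bind-mount entry like this may not work for it (see the reply below).
}
```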
I ran the same job on a number of different DIRAC sites and collated the results. A lot of the jobs were successful, but there seem to be two failure modes: the first is simply not having the correct X509 folders mounted for the job, and the second is the `Request lacks the org.dcache.uuid property` error.
| Site | Status | Errors | X509_CERT_DIR | X509_VOMS_DIR | X509_USER_PROXY | Runtime (s) |
|---|---|---|---|---|---|---|
| CLOUD.RAL-LCG2.uk | Done | - | /etc/grid-security/certificates | /scratch/plt/etc/grid-security/vomsdir | /tmp/x509up_u10000 | 47 |
| CLOUD.UKI-LT2-IC-HEP-lz.uk | Done | - | /etc/grid-security/certificates | /tmp/etc/grid-security/vomsdir | /tmp/proxy | 9 |
| LCG.UKI-LT2-Brunel.uk | Done | - | /etc/grid-security/certificates | /scratch/dir_16058/MwmNDmjwzvwnJmrZMnYHaOWq7uhkjmABFKDmeYMLDmABFKDmDyuIXm/DIRAC_jDeylkpilot/etc/grid-security/vomsdir | /scratch/dir_16058/tmpA7CKKH | 39 |
| LCG.UKI-LT2-IC-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property | /cvmfs/grid.cern.ch/etc/grid-security/certificates | /srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/etc/grid-security/vomsdir | /srv/localstage/condor/dir_7410/tmpftzyFy | - |
| LCG.UKI-LT2-QMUL.uk | Failed | Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property | /scratch/tmp//khRNDm0yzvwn3JPVEm4QteWmABFKDmABFKDmH9FKDmABFKDmrQjjGn/arc/certificates | /scratch/tmp/khRNDm0yzvwn3JPVEm4QteWmABFKDmABFKDmH9FKDmABFKDmrQjjGn/DIRAC_xK1DI7pilot/etc/grid-security/vomsdir | /scratch/lcg/pillz09/6122620/tmpA9Wbse | - |
| LCG.UKI-NORTHGRID-LANCS-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /opt/gridapps/etc/grid-security/certificates: No such file or directory : ls: cannot access /home/iris/pltlz006/home_cream_675579780/CREAM675579780/DIRAC_YExrYJpilot/etc/grid-security/vomsdir: No such file or directory | /opt/gridapps/etc/grid-security/certificates | /home/iris/pltlz006/home_cream_675579780/CREAM675579780/DIRAC_YExrYJpilot/etc/grid-security/vomsdir | /home/data/tmp/3608722.1.grid7/tmpip05BS | - |
| LCG.UKI-NORTHGRID-LIV-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /data/condor_pool/dir_7075/DIRAC_VDOB_4pilot/etc/grid-security/vomsdir: No such file or directory : ls: cannot access /data/condor_pool/dir_7075/tmpfa8ZlR: No such file or directory | /etc/grid-security/certificates | /data/condor_pool/dir_7075/DIRAC_VDOB_4pilot/etc/grid-security/vomsdir | /data/condor_pool/dir_7075/tmpfa8ZlR | - |
| LCG.UKI-NORTHGRID-MAN-HEP.uk | Done | - | /etc/grid-security/certificates | /scratch/condor_pool/condor/dir_26932/CFBNDmvyzvwnOkaSmpEpAjQq5wXwEmABFKDmtOwTDmABFKDmre8JPo/DIRAC_fDllEvpilot/etc/grid-security/vomsdir | /scratch/condor_pool/condor/dir_26932/tmpAoPSbv | 25 |
| LCG.UKI-SCOTGRID-ECDF.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /local/2127066.1.eddie/pCTNDmxyzvwntvq09p9vnX1nABFKDmABFKDmfWJKDmABFKDmDSeqln/DIRAC_UcGokIpilot/etc/grid-security/vomsdir: No such file or directory : ls: cannot access /local/2127066.1.eddie/tmpMYAb06: No such file or directory | /etc/grid-security/certificates | /local/2127066.1.eddie/pCTNDmxyzvwntvq09p9vnX1nABFKDmABFKDmfWJKDmABFKDmDSeqln/DIRAC_UcGokIpilot/etc/grid-security/vomsdir | /local/2127066.1.eddie/tmpMYAb06 | - |
| LCG.UKI-SOUTHGRID-OX-HEP.uk | Failed | Error in <TNetXNGFile::Open>: [FATAL] Auth failed : ls: cannot access /home/pool/condor/dir_205354/GSdKDmoyzvwnjGWBFmwldEhq1cyeCnABFKDmI2GODmABFKDmlB9W0m/DIRAC_MhPfmlpilot/etc/grid-security/vomsdir: No such file or directory : ls: cannot access /home/pool/condor/dir_205354/tmphLB0zu: No such file or directory | /etc/grid-security/certificates | /home/pool/condor/dir_205354/GSdKDmoyzvwnjGWBFmwldEhq1cyeCnABFKDmI2GODmABFKDmlB9W0m/DIRAC_MhPfmlpilot/etc/grid-security/vomsdir | /home/pool/condor/dir_205354/tmphLB0zu | - |
| LCG.UKI-SOUTHGRID-RALPP.uk | Failed | Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3010] Request lacks the org.dcache.uuid property | /scratch/condor/dir_213526/p0sNDmpyzvwnOOVDjqUTj3jq6xrg1pABFKDmJ14SDmABFKDmAcPTmm/arc/certificates | /scratch/condor/dir_213526/p0sNDmpyzvwnOOVDjqUTj3jq6xrg1pABFKDmJ14SDmABFKDmAcPTmm/DIRAC__uJfwppilot/etc/grid-security/vomsdir | /scratch/condor/dir_213526/tmpG8O_cv | - |
| VAC.UKI-NORTHGRID-MAN-HEP.uk | Done | - | /etc/grid-security/certificates | /scratch/plt/etc/grid-security/vomsdir | /tmp/x509up_u10000 | 65 |
| VAC.UKI-SCOTGRID-GLASGOW.uk | Done | - | /etc/grid-security/certificates | /scratch/plt/etc/grid-security/vomsdir | /tmp/x509up_u10000 | 20 |
I will try to implement a preamble to starting the Singularity container. This will allow some python code to be executed beforehand. I think that is better than just allowing environment variables in the mounts (which would fail for `X509_USER_PROXY` as it is a file and not a directory).
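Roughly what I have in mind is sketched below (illustrative only, not the final implementation; the paths and the exact handling are assumptions):

```python
# Illustrative preamble sketch (not the actual Ganga implementation): make
# the X509 locations usable inside the container before Singularity starts.
import os
import shutil

# The job directory is assumed to be bind-mounted into the container already.
sandbox = os.getcwd()

# The proxy is a single file, so copy it into the sandbox and repoint the
# variable at the copy.
proxy = os.environ.get('X509_USER_PROXY')
if proxy and os.path.isfile(proxy):
    copied = os.path.join(sandbox, 'user_proxy')
    shutil.copy2(proxy, copied)
    os.chmod(copied, 0o600)  # proxies must not be group/world readable
    os.environ['X509_USER_PROXY'] = copied

# The CA certificates are a directory; one option is to fall back to a copy
# that is kept up to date on CVMFS.
os.environ.setdefault('X509_CERT_DIR',
                      '/cvmfs/grid.cern.ch/etc/grid-security/certificates')
```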
As for the `org.dcache.uuid` property issue, I still do not know what is going on there. It is clearly not caused by the Singularity container itself (otherwise it would fail everywhere). Can you try to do a `printenv` inside the job? We can then compare the output for a working and a non-working site and try to work out where the difference lies.
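For the comparison, something as simple as the sketch below would do (the dump file names are placeholders):

```python
# Sketch: compare two printenv dumps, one saved from a working site and one
# from a failing site, and print the variables that differ or are missing.
def load_env(path):
    env = {}
    with open(path) as f:
        for line in f:
            if '=' in line:
                key, _, value = line.rstrip('\n').partition('=')
                env[key] = value
    return env

good = load_env('printenv_CLOUD.RAL-LCG2.txt')      # working site (placeholder)
bad = load_env('printenv_LCG.UKI-LT2-IC-HEP.txt')   # failing site (placeholder)

for key in sorted(set(good) | set(bad)):
    if good.get(key) != bad.get(key):
        print('%s:\n  working: %s\n  failing: %s'
              % (key, good.get(key), bad.get(key)))
```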
Here are the `printenv` outputs. A successful job with no errors at CLOUD.RAL-LCG2.uk:
DIRAC=/scratch/plt
XDG_SESSION_ID=c1
HOSTNAME=vcycle-gds-vm-lz-s9mexdtjmu
DIRAC_PROCESSORS=1
SHELL=/bin/bash
TERM=unknown
GFAL_PLUGIN_DIR=/scratch/plt/Linux_x86_64_glibc-2.17/lib/gfal2-plugins
HISTSIZE=1000
DIRACPYTHON=/scratch/plt/Linux_x86_64_glibc-2.17/bin/python2.7
PYTHONUNBUFFERED=yes
JOBID=25409988
QTDIR=/usr/lib64/qt-3.3
SINGULARITY_APPNAME=
X509_CERT_DIR=/etc/grid-security/certificates
DIRAC_WHOLENODE=False
QTINC=/usr/lib64/qt-3.3/include
DIRACLIB=/scratch/plt/Linux_x86_64_glibc-2.17/lib
LC_ALL=en_US.UTF-8
QT_GRAPHICSSYSTEM_CHECKED=1
PILOT_UUID=vm://vcycle-ral.blackett.manchester.ac.uk/vcycle-ral.blackett.manchester.ac.uk:1590066986.vcycle-gds-vm-lz-s9mexdtjmu:gds-vm-lz
USER=plt00p00
USER_PATH=/scratch/plt/25409988:/scratch/plt/Linux_x86_64_glibc-2.17/bin:/scratch/plt/Linux_x86_64_glibc-2.17/bin:/scratch/plt/scripts:/scratch/plt/Linux_x86_64_glibc-2.17/bin:/usr/lib64/qt-3.3/bin:/opt/google-cloud-sdk/bin:/usr/lib/ec2/bin:/usr/lib64/ccache:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
DIRACSYSCONFIG=/scratch/plt/pilot.cfg
LD_LIBRARY_PATH=/.singularity.d/libs
SUDO_USER=plt
SUDO_UID=1000
EC2_HOME=/usr/lib/ec2
SINGULARITY_NAME=singularity_sandbox
DIRACROOT=/scratch/plt
USERNAME=plt00p00
GLOBUS_IO_IPV6=TRUE
MAIL=/var/spool/mail/plt
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CERNVM_ENV=1
CONDOR_CONFIG=/etc/condor/condor_config
PWD=/scratch/plt/25409988
JAVA_HOME=/usr
PYTHONOPTIMIZE=x
JOBFEATURES=https://vm85.blackett.manchester.ac.uk:443/machines/vcycle-ral.blackett.manchester.ac.uk/vcycle-gds-vm-lz-s9mexdtjmu/jobfeatures
LANG=C
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
LOADEDMODULES=
DCOMMANDS_PPID=6517
X509_VOMS_DIR=/scratch/plt/etc/grid-security/vomsdir
QT_GRAPHICSSYSTEM=native
DIRACSCRIPTS=/scratch/plt/scripts
DIRACSITE=CLOUD.RAL-LCG2.uk
HISTCONTROL=ignoredups
SSL_CERT_DIR=/etc/grid-security/certificates
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SHLVL=11
SUDO_COMMAND=/bin/sh -c /scratch/plt/job/Wrapper/Job25409988
DIRACJOBID=25409988
HOME=/scratch/plt00p00
MACHINEFEATURES=https://vm85.blackett.manchester.ac.uk:443/machines/vcycle-ral.blackett.manchester.ac.uk/vcycle-gds-vm-lz-s9mexdtjmu/machinefeatures
LANGUAGE=en_US.UTF-8
X509_USER_PROXY=/tmp/x509up_u10000
OPENSSL_CONF=/tmp
ARC_PLUGIN_PATH=/scratch/plt/Linux_x86_64_glibc-2.17/lib/arc
DIRACBIN=/scratch/plt/Linux_x86_64_glibc-2.17/bin
DYLD_LIBRARY_PATH=/scratch/plt/Linux_x86_64_glibc-2.17/lib:/scratch/plt/Linux_x86_64_glibc-2.17/lib:/scratch/plt/Linux_x86_64_glibc-2.17/lib/mysql:/scratch/plt/Linux_x86_64_glibc-2.17/lib:
GFAL_CONFIG_DIR=/scratch/plt/Linux_x86_64_glibc-2.17/etc/gfal2.d
AGENT_WORKDIRECTORY=/scratch/plt/work/WorkloadManagement/JobAgent
PYTHONPATH=/scratch/plt:/scratch/plt:/scratch/plt
JOB_ID=vcycle-ral.blackett.manchester.ac.uk:1590066986.vcycle-gds-vm-lz-s9mexdtjmu:gds-vm-lz
LOGNAME=plt00p00
CVS_RSH=ssh
QTLIB=/usr/lib64/qt-3.3/lib
XDG_DATA_DIRS=/scratch/plt/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
MODULESHOME=/usr/share/Modules
LESSOPEN=||/usr/bin/lesspipe.sh %s
PROMPT_COMMAND=PS1="Singularity> "; unset PROMPT_COMMAND
SINGULARITY_CONTAINER=/scratch/plt/25409988/singularity_sandbox
SUDO_GID=1000
XDG_RUNTIME_DIR=/scratch/plt/25409988/.xdg
GLOBUS_FTP_CLIENT_IPV6=TRUE
JOBOUTPUTS=https://vm85.blackett.manchester.ac.uk:443/machines/vcycle-ral.blackett.manchester.ac.uk/vcycle-gds-vm-lz-s9mexdtjmu/joboutputs
RRD_DEFAULT_FONT=/scratch/plt/Linux_x86_64_glibc-2.17/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf
DIRACPLAT=Linux_x86_64_glibc-2.17
_=/usr/bin/printenv
An unsuccessful job with the `org.dcache.uuid` problem at LCG.UKI-LT2-IC-HEP.uk:
DIRAC=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot
_CONDOR_JOB_PIDS=
DIRAC_PROCESSORS=1
GFAL_PLUGIN_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib/gfal2-plugins
DIRACPYTHON=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin/python2.7
TMPDIR=/srv/localstage/condor/dir_7410
PYTHONUNBUFFERED=yes
JOBID=25409995
_CONDOR_SCRATCH_DIR=/srv/localstage/condor/dir_7410
SINGULARITY_APPNAME=
X509_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates
DIRAC_WHOLENODE=False
DIRACLIB=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib
LC_ALL=en_US.UTF-8
_CHIRP_DELAYED_UPDATE_PREFIX=Chirp*
_CONDOR_ANCESTOR_23186=27254:1588924743:1633129938
USER_PATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/scripts:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
TEMP=/srv/localstage/condor/dir_7410
LD_LIBRARY_PATH=/.singularity.d/libs
BATCH_SYSTEM=HTCondor
VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
SINGULARITY_NAME=singularity_sandbox
DIRACROOT=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot
_CONDOR_CHIRP_CONFIG=/srv/localstage/condor/dir_7410/.chirp.config
CONDORCE_COLLECTOR_HOST=ceprod03.grid.hep.ph.ic.ac.uk:9619
HTCONDOR_JOBID=280235.0
GLOBUS_IO_IPV6=TRUE
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_CONDOR_BIN=/usr/bin
PWD=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995
PYTHONOPTIMIZE=x
LANG=en_US.UTF-8
DCOMMANDS_PPID=7961
X509_VOMS_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/etc/grid-security/vomsdir
DIRACSCRIPTS=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/scripts
_CONDOR_SLOT=slot1_6
DIRACSITE=LCG.UKI-LT2-IC-HEP.uk
_CONDOR_ANCESTOR_27254=7410:1590067886:1401183745
SSL_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates
SHLVL=10
DIRACJOBID=25409995
HOME=/home/batch/job0006
_CONDOR_MACHINE_AD=/srv/localstage/condor/dir_7410/.machine.ad
TERMINFO=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/share/terminfo:/usr/share/terminfo:/etc/terminfo
LANGUAGE=en_US.UTF-8
OPENSSL_CONF=/tmp
X509_USER_PROXY=/srv/localstage/condor/dir_7410/tmpftzyFy
ARC_PLUGIN_PATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib/arc
DIRACBIN=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/bin
_CONDOR_ANCESTOR_7410=7414:1590067888:1504722386
DYLD_LIBRARY_PATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib/mysql:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/lib:
GFAL_CONFIG_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/etc/gfal2.d
AGENT_WORKDIRECTORY=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/work/WorkloadManagement/JobAgent
PYTHONPATH=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot:/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot
TMP=/srv/localstage/condor/dir_7410
OMP_NUM_THREADS=1
_CONDOR_JOB_AD=/srv/localstage/condor/dir_7410/.job.ad
PROMPT_COMMAND=PS1="Singularity> "; unset PROMPT_COMMAND
SINGULARITY_CONTAINER=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995/singularity_sandbox
XDG_RUNTIME_DIR=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/25409995/.xdg
GLOBUS_FTP_CLIENT_IPV6=TRUE
_CONDOR_JOB_IWD=/srv/localstage/condor/dir_7410
RRD_DEFAULT_FONT=/srv/localstage/condor/dir_7410/DIRAC_6UoDonpilot/Linux_x86_64_glibc-2.17/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf
DIRACPLAT=Linux_x86_64_glibc-2.17
_=/usr/bin/printenv
Asking for some help ... https://github.com/xrootd/xrootd/issues/1202
I eventually got a reply from xrootd support, see https://github.com/xrootd/xrootd/issues/1202#issuecomment-649527054. So it seems this is not a problem with Ganga or Singularity. Annoying. In any case, I'll close the issue here.
Accessing a ROOT grid file through xrootd failed in a DIRAC job when using Singularity virtualisation. Since singularity is not expected to be found in `$PATH` on every DIRAC site, I had to use Ganga PR #1670 to use the singularity binary from cvmfs. However, I get an authentication failure when I try to access the ROOT file with xrootd.

Here's the simple shell script I'm running:

This is the ganga python job:

This is the error message:

Here are the X509 envs for the DIRAC singularity:

Within the container you can access `X509_VOMS_DIR` and `X509_USER_PROXY` but not `X509_CERT_DIR`.

When re-running with `export XRD_LOGLEVEL="Debug"` set, here is the log:

Similar issue to #1668
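For reference, the xrootd access the job attempts is roughly of this form (an illustrative sketch, not the actual script; the file URL is copied from the debug log above):

```python
# Illustrative sketch of the kind of access attempted in the job (not the
# actual script). The file URL is copied from the xrootd debug log above.
import os

# Enable xrootd client debug output before ROOT/xrootd is loaded.
os.environ['XRD_LOGLEVEL'] = 'Debug'

import ROOT  # PyROOT; root:// URLs go through the xrootd client

url = ('root://gfe02.grid.hep.ph.ic.ac.uk:1094/'
       'pnfs/hep.ph.ic.ac.uk/data/lz/lz/data/MDC3/calibration/'
       'LZAP-4.3.1/20180201/lz_20180201234_lzap.root')

f = ROOT.TFile.Open(url)
if not f or f.IsZombie():
    raise RuntimeError('could not open the file over xrootd')
print('opened', f.GetName())
```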