Closed mesmith75 closed 7 months ago
Looks like the standard ganga virtualisation, e.g. using something like:
j.virtualization = Apptainer("/cvmfs/cernvm-prod.cern.ch/cvm4")
or
j.virtualization = Apptainer("docker://gitlab-registry.cern.ch/lhcb-core/lbdocker/slc6-build:latest")
does not work for GaudiExec?
While it might be possible to get that to work, it will be better to just implement it in a transparent way for the GaudiExec application.
As a user, a timely solution would be great. Let me know if and how I could help.
I investigated this a bit further. There are two issues at play here. Consider a job of the type
j = Job(application = prepareGaudiExec('DaVinci','v39r1p6', myPath='.', platform='x86_64-slc6-gcc49-opt', options=['empty.py']))
where empty.py is a Python file containing just the line pass.
If you are on an el9 machine, then j.prepare() will fail for this job because the cmake command fails. If you start Ganga inside a centos7 apptainer, then the j.prepare() step works. Clearly there is an issue that should be fixed there.
Having prepared the job inside a centos7 apptainer, the job can then be submitted (from a standard session running on el9). The job then runs on the Dirac backend just fine. This is compatible with observations from others that jobs start but then crash later.
In the JDL for the job, when looking at the Dirac monitoring, I see Platform = "x86_64-slc6", which is correct. In the stderr of the job, I also see WARNING:lb-run:Decided best container to use is apptainer, which indicates that the job already runs inside an apptainer. I confirmed this by running the job with the Local backend. So for runtime errors, it looks like a problem with how lb-run works and not a Ganga problem.
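To check many jobs for the same message, the stderr can be scanned programmatically. This is a minimal sketch: the function name and regular expression are illustrative assumptions, and the only format relied upon is the single log line quoted above.

```python
import re
from typing import Optional

# Pattern based solely on the log line quoted in this thread; other
# lb-run versions may word the message differently (assumption).
_CONTAINER_RE = re.compile(r"lb-run:Decided best container to use is (\S+)")


def container_from_stderr(stderr_text: str) -> Optional[str]:
    """Return the container name lb-run reported, or None if no such line exists."""
    match = _CONTAINER_RE.search(stderr_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "WARNING:lb-run:Decided best container to use is apptainer\n"
    print(container_from_stderr(sample))
```

A return value of None only means the message was not found in the text; it does not prove that the job ran without a container.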
The command
exec lb-run --siteroot=${MYSITEROOT:-/cvmfs/lhcb.cern.ch/lib} -c x86_64-slc6-gcc49-opt --path-to-project ${base_dir}/DaVinciDev_v39r1p6 bash
does indeed put you in an slc6 environment:
[DaVinciDev v39r1p6] DaVinciDev_v39r1p6 $ cat /etc/redhat-release
Scientific Linux release 6.9 (Carbon)
However, you can't run the configuration step inside that apptainer:
[DaVinciDev v39r1p6] DaVinciDev_v39r1p6 $ cmake --build /home/egede/DaVinciDev_v39r1p6/build.x86_64-slc6-gcc49-opt --target ganga-input-sandbox
cmake: symbol lookup error: /cvmfs/lhcb.cern.ch/lib/var/lib/LbEnv/3114/stable/linux-64/bin/../lib/libuv.so.1: undefined symbol: sendmmsg
So we can't run the cmake command inside an slc6 environment (for a DaVinci version that requires slc6), but it works on centos7. Not very helpful.
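The mismatch between the target platform and the host can be detected from the platform string itself. The sketch below assumes the four-field arch-os-compiler-buildtype layout seen in x86_64-slc6-gcc49-opt; the function names are hypothetical and not part of Ganga or lb-run. Note that it only flags a mismatch. As the observations above show, it cannot tell you which environment the build step actually works in.

```python
def platform_os(platform: str) -> str:
    """Extract the OS field from a platform string such as 'x86_64-slc6-gcc49-opt'.

    Assumes exactly four dash-separated fields: arch, OS, compiler, build type.
    """
    fields = platform.split("-")
    if len(fields) != 4:
        raise ValueError(f"unexpected platform string: {platform!r}")
    return fields[1]


def host_differs_from_target(platform: str, host_os: str) -> bool:
    """Return True when the host OS label does not match the platform's OS field."""
    return platform_os(platform) != host_os


if __name__ == "__main__":
    # An el9 host preparing an slc6 job: the case that fails in this thread.
    print(host_differs_from_target("x86_64-slc6-gcc49-opt", "el9"))
```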
Yes we can run the command inside the apptainer. I'll open an MR later today.
As far as I can tell, it is just the make that needs adjusting. The jobs seem to run automatically with apptainer on the WN.
In that case the runtime errors reported are completely unrelated. Let's see.
Getting the build fixed at least is useful though. I ran an example job fine - the log showed it ran inside a container.
For some very old applications (things requiring slc6) you need to use apptainer as they are not functional with el9.
Should be a very short addition to the run line command.
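For illustration, wrapping a build command so that it runs inside a container could look roughly like the following. This is only a sketch and not the change made in Ganga: the helper name is hypothetical, the image path is the one from the first comment and is assumed here to provide a centos7-compatible environment, and binding /cvmfs is also an assumption.

```python
import shlex
from typing import List, Sequence


def wrap_in_apptainer(
    command: Sequence[str],
    image: str,
    binds: Sequence[str] = ("/cvmfs",),
) -> List[str]:
    """Prefix a command with 'apptainer exec' so it runs inside the given image."""
    wrapped = ["apptainer", "exec"]
    for path in binds:
        # Make each host path visible inside the container.
        wrapped += ["--bind", path]
    wrapped.append(image)
    wrapped.extend(command)
    return wrapped


if __name__ == "__main__":
    build_cmd = [
        "cmake", "--build", "build.x86_64-slc6-gcc49-opt",
        "--target", "ganga-input-sandbox",
    ]
    # Print the command line instead of running it, so the sketch is safe to try.
    print(shlex.join(wrap_in_apptainer(build_cmd, "/cvmfs/cernvm-prod.cern.ch/cvm4")))
```

Returning an argument list rather than a single string keeps the result suitable for subprocess.run without shell quoting problems.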