Closed egede closed 6 months ago
@heistera A fix will be implemented for this soon.
I am also bitten by this bug.
ERROR:lb-run:current host does not support platform x86_64_v2-centos7-gcc11-opt (dirac_platform: broadwell-el9, required: x86_64_v2-centos7, os_id: almalinux9)
Glad to see the work is ongoing.
Not obvious to me that this is the same problem. In the other cases we have seen a runtime error, whereas in the case you report here, the Gaudi job doesn't even start. The cure may be the same though.
I agree with Ulrik. To me it looked like the jobs which crashed for me had an environment. Not sure if it was the correct one, though ...
Indeed, my jobs didn't even start... but the cure may be the same. If I run lb-run manually, it automatically chooses apptainer and runs normally. But running in Ganga yields the above error.
I find this strange - my test jobs said they were running in apptainer. I guess there is nothing wrong with being explicit about it, though.
My investigation is so far only for the Local backend (where the apptainer message is not there), so we might not be all the way there. However, it turns out the run file with the lb-run command inside is written by the make step.
build.x86_64-slc6-gcc49-opt/ganga/run:exec lb-run --siteroot=${MYSITEROOT:-/cvmfs/lhcb.cern.ch/lib} -c x86_64-slc6-gcc49-opt --path-to-project ${base_dir}/DaVinciDev_v39r1p6 "$@"
The file should instead have
exec lb-run --siteroot=${MYSITEROOT:-/cvmfs/lhcb.cern.ch/lib} -c x86_64-slc6-gcc49-opt --container apptainer --path-to-project ${base_dir}/DaVinciDev_v39r1p6 "$@"
for the Local backend to work.
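To make the difference between the two lines concrete, here is a minimal sketch of the edit as a string transformation. The helper name add_container_flag is hypothetical and not part of Ganga or lb-run; it only assumes the command has the `-c <platform>` form shown above and inserts the option right after it.

```python
import re

# Matches the "-c <platform>" pair as it appears in the generated run file.
_PLATFORM_OPT = re.compile(r"(\s-c\s+\S+)")


def add_container_flag(command, container="apptainer"):
    """Return `command` with `--container <container>` after `-c <platform>`.

    Hypothetical helper for illustration only. A command that already
    carries --container is returned unchanged.
    """
    if "--container" in command:
        return command
    return _PLATFORM_OPT.sub(
        lambda m: f"{m.group(1)} --container {container}", command, count=1
    )


# The line currently written to the run file ...
original = (
    "exec lb-run --siteroot=${MYSITEROOT:-/cvmfs/lhcb.cern.ch/lib} "
    "-c x86_64-slc6-gcc49-opt "
    '--path-to-project ${base_dir}/DaVinciDev_v39r1p6 "$@"'
)
# ... and the line it should contain instead.
patched = add_container_flag(original)
print(patched)
```

This only illustrates the change to the generated line; the real fix belongs wherever the run file is written.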
And there is in fact a further problem. When a job is submitted with the Local backend, it inherits the environment of the Ganga session. This (among other things) means that a different (and older) version of lb-run is used, which doesn't understand the --container option.
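One way to check which lb-run a job would pick up from an inherited environment, and whether it knows the option, could look like the sketch below. The function names are hypothetical, and it assumes that lb-run prints its options when called with --help; I have not verified that against the older version in question.

```python
import shutil
import subprocess


def resolve_lb_run(env):
    """Return the lb-run path found on the PATH of `env`, or None."""
    return shutil.which("lb-run", path=env.get("PATH", ""))


def help_mentions_container(help_text):
    """True if the help text lists a --container option."""
    return "--container" in help_text


def lb_run_supports_container(env):
    """Ask the lb-run visible in `env` whether it understands --container.

    Assumption: `lb-run --help` lists the supported options.
    """
    exe = resolve_lb_run(env)
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "--help"], env=env, capture_output=True, text=True, check=False
    )
    return help_mentions_container(result.stdout + result.stderr)
```

Calling lb_run_supports_container(dict(os.environ)) from inside the Ganga session (after import os) would then show whether the inherited lb-run is new enough.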
I have successfully reproduced the original issue on the grid. I guess it is not doing what I thought after all.
If a GaudiExec application is created with a release that requires an old platform like slc6, the job is in fact not running inside the container. The problem affects both the Dirac and the Local backend. After debugging, it turns out that the run script written by the GaudiExec runtime handler is missing the argument --container apptainer. With that in place the job runs.