ganga-devs / ganga

Ganga is an easy-to-use frontend for job definition and management
GNU General Public License v3.0

Apptainer run GaudiExec #2331

Closed mesmith75 closed 7 months ago

mesmith75 commented 7 months ago

Fixes #2328

laf070810 commented 7 months ago

I tested this, but it does not work on my side with Gauss/v56r7 and x86_64_v2-centos7-gcc11-opt. Using the Local backend, it gives

FATAL:   container creation failed: mount hook function failure: hook function for tag prelayer returns error: failed to create /tmp/.Test-unix directory: mkdir /tmp/.Test-unix: file exists

Using the Slurm backend, it similarly gives

--- GANGA APPLICATION ERROR BEGIN ---
FATAL:   container creation failed: mount hook function failure: mount /cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.39-1713688048/Linux-x86_64/var/apptainer/mnt/session/rootfs/tmp/lua_4HU004->/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.39-1713688048/Linux-x86_64/var/apptainer/mnt/session/underlay/tmp/lua_4HU004 error: while mounting /cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.39-1713688048/Linux-x86_64/var/apptainer/mnt/session/rootfs/tmp/lua_4HU004: destination /cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.39-1713688048/Linux-x86_64/var/apptainer/mnt/session/underlay/tmp/lua_4HU004 doesn't exist in container
--- GANGA APPLICATION ERROR END ---

mesmith75 commented 7 months ago

So this is not working with slc6 applications. There is a container, cvm3, but I have not been able to get an application to compile with it.

mesmith75 commented 7 months ago

It turns out my issues were in fact caused by some ancient CMake bug. I think this works.

mesmith75 commented 7 months ago

I guess it depends a little bit on what exactly you are doing.

egede commented 7 months ago

I am not sure this is the right approach. lb-run has support for running in containers, and I think that we should use that, rather than running inside a container that we have started ourselves.

laf070810 commented 7 months ago

I made a reproducer to help locate the problem.

reproducer.py:

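# Run with `ganga -i reproducer.py`; prepareGaudiExec, GaussSplitter, Job and
# Local come from the Ganga namespace, so no imports are needed here.

# Set up Gauss v56r7 in the current directory.
application = prepareGaudiExec("Gauss", "v56r7", myPath=".")
application.platform = "x86_64_v2-centos7-gcc11-opt"
# Enable the Apptainer option under test.
application.useApptainer = True
application.options = ["reproducer_options.py"]
# Five subjobs of five events each, run on the Local backend.
splitter = GaussSplitter(numberOfJobs=5, eventsPerJob=5)
job = Job(name="test", comment="test", application=application, splitter=splitter, backend=Local())
job.submit()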

reproducer_options.py:

from Gauss.Configuration import GenInit, OutputStream, importOptions

importOptions("$GAUSSOPTS/Gauss-2016.py")
importOptions("$GAUSSOPTS/GenStandAlone.py")
importOptions("$DECFILESROOT/options/12297023.py")
importOptions("$LBPYTHIA8ROOT/options/Pythia8.py")

GaussGen = GenInit("GaussGen")
GaussGen.RunNumber = 0

OutputStream("GaussTape").Output = "DATAFILE='Gauss.xgen'"

Running ganga -i reproducer.py on EL9/Alma9 machines should yield the error.

mesmith75 commented 7 months ago

This almost certainly does not work with prepareGaudiExec.

laf070810 commented 7 months ago

Thanks! I found that the reproducer above actually works on other machines, and so does my original code, so my problem must be specific to one machine's configuration and unrelated to the Apptainer issue here. Many apologies for the confusion.
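
For anyone comparing a failing machine against a working one, a minimal sketch of a host check might look like the following. It is not part of Ganga or of this PR; the function name, the default paths and the tool names are assumptions chosen to match the errors quoted above (one about /tmp, one about a mount under /cvmfs). It only reports whether the container runtime is on PATH and whether the listed directories exist and are writable, so it can narrow down a configuration difference but not explain it.

```python
import os
import shutil


def container_environment_report(
    paths=("/cvmfs/lhcb.cern.ch", "/tmp"),
    tools=("apptainer", "singularity"),
):
    """Collect basic facts about the host that container start-up depends on.

    The default paths and tool names are illustrative assumptions, not
    something Ganga itself checks.
    """
    report = {}
    for tool in tools:
        # shutil.which returns the full path of the executable, or None.
        report[f"tool:{tool}"] = shutil.which(tool)
    for path in paths:
        report[f"path:{path}"] = {
            "exists": os.path.isdir(path),
            "writable": os.access(path, os.W_OK),
        }
    return report


if __name__ == "__main__":
    for key, value in container_environment_report().items():
        print(f"{key}: {value}")
```

Running the script on both machines and comparing the printed lines shows whether the difference is in the runtime or in the mounted directories.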

mesmith75 commented 7 months ago

@laf070810 Let us know if you still experience issues and we can dig a bit. As I say, this code doesn't work with prepareGaudiExec, but in general containerisation should be OK, I think. You did post some errors about mounting /cvmfs, though, which are beyond my understanding.

mesmith75 commented 7 months ago

This does not start a container, except to carry out the build. As I say, we don't want to do anything to the script that runs on the worker node (WN), as lb-run is already taking care of it.

When building the application you have to check out the application and build it in the right container. Perhaps we can do that with lb-run but it is not obvious how.
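
To illustrate what "build it in the right container" amounts to, here is a minimal sketch that only assembles an `apptainer exec` command line for a build step. It is not the code in this PR: the function name, the image path and the build command are hypothetical placeholders, and binding /cvmfs is an assumption. The only Apptainer features it relies on are the `exec` subcommand and the `--bind` option.

```python
import shlex


def apptainer_build_command(image, build_cmd, binds=("/cvmfs",)):
    """Return the argument list for running build_cmd inside an image.

    image     -- path to the container image (hypothetical placeholder)
    build_cmd -- the build command as a list of arguments
    binds     -- host directories to bind into the container (assumed)
    """
    cmd = ["apptainer", "exec"]
    for bind in binds:
        cmd += ["--bind", bind]
    cmd.append(image)
    cmd += list(build_cmd)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, since the image path is a placeholder.
    command = apptainer_build_command("/path/to/image", ["make", "install"])
    print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run` once a real image and build command are known; nothing is started inside a container on the worker node.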

egede commented 7 months ago

Hmm, I am afraid that I still see some issues here and fail to get things working on this branch.