Open IzaakWN opened 7 months ago
The environment issue in lxplus HTCondor should be solved with PR https://github.com/cms-tau-pog/TauFW/pull/67. It should also be possible now to ask jobs to be run in a singularity container.
One issue, however, is that if you work in a singularity, you lose the ability to submit jobs (condor_submit cannot be found anymore). We need to find a workaround for this... :( It would mean people have to exit the singularity to submit jobs.
However, if you have a CMSSW 11 or 12 setup with SLC7/CC7 inside a cmssw-cc7 singularity on an lxplus EL9 node, C++ libraries like ROOT will stop working outside the singularity, and so we run into a new compatibility issue... Currently, pico.py submit needs both ROOT and condor_submit, and pico.py status needs ROOT and condor_q...
This means that when using a singularity, we need to prepare the jobs inside the singularity, and then exit it to submit them (e.g. as a simple shell script with all the condor_submit commands).
It now seems possible to submit HTCondor jobs inside (SLC7/CC7) singularities, with the following instructions: https://gitlab.cern.ch/cms-cat/cmssw-lxplus/-/tree/master
#!/bin/bash
export APPTAINER_BINDPATH=/afs,/cvmfs,/cvmfs/grid.cern.ch/etc/grid-security:/etc/grid-security,/cvmfs/grid.cern.ch/etc/grid-security/vomses:/etc/vomses,/eos,/etc/pki/ca-trust,/etc/tnsnames.ora,/run/user,/tmp,/var/run/user,/etc/sysconfig,/etc:/orig/etc
schedd=`myschedd show -j | jq .currentschedd | tr -d '"'`
apptainer -s exec /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-cat/cmssw-lxplus/cmssw-el7-lxplus:latest/ sh -c "source /app/setupCondor.sh && export _condor_SCHEDD_HOST=$schedd && export _condor_SCHEDD_NAME=$schedd && export _condor_CREDD_HOST=$schedd && /bin/bash "
and
export _condor_SCHEDD_HOST=bigbirdXY.cern.ch
export _condor_SCHEDD_NAME=bigbirdXY.cern.ch
export _condor_CREDD_HOST=bigbirdXY.cern.ch
In HTCondor config files:
MY.SingularityImage = "/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-cat/cmssw-lxplus/cmssw-el7-lxplus:latest/"
Also see general instructions for using containers with lxplus's HTCondor system: https://batchdocs.web.cern.ch/containers/index.html
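If submitting from inside the container is not an option for a given setup, the prepare-then-submit split mentioned earlier in this thread could be sketched roughly as below. This is only an illustration, not what pico.py does: the function name write_submit_script, the directory of prepared jobs and the *.sub file layout are all assumptions.

```shell
#!/bin/bash
# Sketch: collect one condor_submit command per prepared submit file into a
# script that can be run later, outside the container.
# write_submit_script, the directory layout and the *.sub naming are hypothetical.
write_submit_script() {
  jobdir="$1"   # directory holding the prepared *.sub files (assumed layout)
  out="$2"      # path of the shell script to generate
  {
    echo '#!/bin/bash'
    for sub in "$jobdir"/*.sub; do
      [ -e "$sub" ] || continue   # glob matched nothing: skip the literal pattern
      printf 'condor_submit "%s"\n' "$sub"
    done
  } > "$out"
  chmod +x "$out"
}

# Example (inside the singularity, after the jobs have been prepared):
#   write_submit_script ./jobs ./submit_all.sh
```

The generated script would then be run after exiting the singularity, where condor_submit is available again.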
Issue: Environment not set
Since March, HTCondor jobs on lxplus do not have the CMSSW environment set correctly, nor JOBID or TASKID as defined in submit_HTCondor.sub. This causes the following error and subsequent job failure:
Our hacky workaround was to hardcode our individual CMSSW_BASE path in the executable submit_HTCondor.sh script and do cmsenv...
The cause appears to be that newer HTCondor versions have a "new syntax" (documented here), and we have to simply change
to
I'll make a PR with a patch asap.
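To make this kind of failure easier to spot, a job script could check its environment before doing any work. The following is a minimal sketch, not part of TauFW: require_env is a hypothetical helper, and only the variable names JOBID, TASKID and CMSSW_BASE come from the description above.

```shell
#!/bin/bash
# Sketch: fail early, with a clear message, if expected variables are missing.
# require_env is a hypothetical helper, not an existing TauFW or HTCondor tool.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; empty if it is unset
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "ERROR: $name is not set in the job environment" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (at the top of the job executable):
#   require_env JOBID TASKID CMSSW_BASE || exit 1
```

A check like this does not fix the submit file, but it turns a confusing downstream error into an explicit message about which variable never reached the job.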
Issue: SLC7/CC7/CentOS7 compatibility on lxplus
CERN's lxplus is phasing out CentOS7 by the end of June 2024 (see this announcement and this page).
If we want to keep using CMSSW 11 or 12 on an SLC7 architecture, we have to use a singularity on lxplus user nodes and in HTCondor jobs, see this page:
I'll add this in a future PR as well, and update the instructions in the documentation...