MatterMiners / tardis

Transparent Adaptive Resource Dynamic Integration System
https://cobald-tardis.readthedocs.io
MIT License

Support HTCondorCE drone submission #343

Open rodwalker opened 7 months ago

rodwalker commented 7 months ago

Would it be possible to submit drones via the existing HTCondorCEs? In this way a grid site could be integrated without any local action. It comes up because HH have a dCache DT next week and are open to using the compute with a remote RSE.

Cheers, Rod.

maxfischer2781 commented 7 months ago

If this is needed quickly it might already be possible, even if it is a bit clunky. The HTCondor site adapter should also be able to speak with any HTCondor-CE – after all, the CE is still a regular HTCondor system. This still requires a local condor Schedd, however.
Configure the HTCondor site adapter JDL to include universe = grid and set the grid_resource to point at the CE. For example, for GridKa that would look like this:

universe = grid
grid_resource = condor htcondor-ce-2-kit.gridka.de htcondor-ce-2-kit.gridka.de:9619

(I think @giffels runs a setup like this and may comment if I missed something.)


That said, it would also definitely be possible to extend the HTCondor Site Adapter code to support remote pools. One would need the condor client tools locally but not a persistent Schedd.
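
As a rough illustration of what talking to a remote pool with only the client tools could look like: condor_submit accepts -pool (collector address) and -remote (Schedd name) to target a Schedd on another host. The sketch below only assembles and prints such a command, it does not run it. The function name remote_submit_cmd and the file drone.jdl are hypothetical, the host is the GridKa CE from the example above, and whether a given CE accepts direct remote submission with your credentials is an assumption that would need checking.

```bash
# Sketch: build (but do not execute) a condor_submit call against a remote Schedd.
# remote_submit_cmd is a hypothetical helper, not part of TARDIS or HTCondor.
remote_submit_cmd() {
  ce_host=$1          # host running the CE collector and Schedd
  jdl=$2              # submit description file (hypothetical name below)
  port=${3:-9619}     # HTCondor-CE collector port, as used in grid_resource above
  printf 'condor_submit -pool %s:%s -remote %s %s\n' \
    "$ce_host" "$port" "$ce_host" "$jdl"
}

# Dry run: print the command for the GridKa CE used as an example above.
remote_submit_cmd htcondor-ce-2-kit.gridka.de drone.jdl
```

With -remote the JDL would describe the drone job directly, so the universe = grid and grid_resource lines would not be needed in this variant.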

giffels commented 7 months ago

Yes, I am running multiple setups using this approach. You can use a JDL like the following:

executable = /var/lib/cobald/drones/desy/pilot_setup.sh
arguments=${Arguments}
output = logs_desy/$$(cluster).$$(process).out
error = logs_desy/$$(cluster).$$(process).err
log = logs_desy/cluster.log

transfer_input_files = drones

universe = grid
use_scitokens = auto
scitokens_file = /var/lib/cobald/cms.token
grid_resource = condor htcondor-ce-x.desy.de htcondor-ce-x.desy.de:9619

request_cpus=${Cores}
request_memory=${Memory}
request_disk=${Disk}

queue 1

The important thing here is that the executable sets the drone environment from its arguments:

#set environment from arguments
while [[ $# -gt 0 ]]; do
  case $1 in
    --cores=*)
      export TardisDroneCores=${1##*=}
      shift
      ;;
    --memory=*)
      export TardisDroneMemory=${1##*=}
      shift
      ;;
    --disk=*)
      export TardisDroneDisk=${1##*=}
      shift
      ;;
    --uuid=*)
      export TardisDroneUuid=${1##*=}
      shift
      ;;
    *)
      echo "Ignoring unknown argument $1!"
      shift
      ;;
  esac
done
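
To make the effect of that loop concrete, here is a minimal sketch that wraps the same logic in a function and calls it with sample arguments. The function name parse_drone_args and all argument values are hypothetical and chosen for illustration only; the actual string that replaces ${Arguments} in the JDL is generated by TARDIS and may look different.

```bash
# Sketch: the argument loop from above wrapped in a function so it can be tried out.
# parse_drone_args is a hypothetical name; the exported variables match the loop above.
parse_drone_args() {
  while [ "$#" -gt 0 ]; do
    case $1 in
      --cores=*)  export TardisDroneCores="${1##*=}" ;;   # keep the text after the last "="
      --memory=*) export TardisDroneMemory="${1##*=}" ;;
      --disk=*)   export TardisDroneDisk="${1##*=}" ;;
      --uuid=*)   export TardisDroneUuid="${1##*=}" ;;
      *)          echo "Ignoring unknown argument $1!" ;;
    esac
    shift
  done
}

# Hypothetical values, for illustration only.
parse_drone_args --cores=8 --memory=16000 --disk=100000 --uuid=example-0123 --foo
echo "cores=$TardisDroneCores memory=$TardisDroneMemory disk=$TardisDroneDisk uuid=$TardisDroneUuid"
```

The call reports --foo as an unknown argument and then prints the four values, which is what the drone would find in its environment.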

rodwalker commented 7 months ago

I see this is already possible and even documented, so fine to close. I need to think how best to deploy this. Since the trickiest part is maintaining the token to submit to CEs, and it needs to scale, it might be best if I put this straight into the Harvester submission at CERN.

giffels commented 7 months ago

We are using https://osg-htc.org/docs/other/osg-token-renewer/ to manage the renewal of the token.
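
Independently of the renewer, it can be handy to check how long the token file on disk is still valid. The following is a minimal sketch, not part of osg-token-renewer or TARDIS: it assumes the file holds a single JWT (as SciTokens are), that the payload carries an integer exp claim, and that a base64 command supporting -d is available. The function names token_expiry and token_seconds_left are hypothetical.

```bash
# Sketch: read the "exp" claim (Unix timestamp) from a JWT stored in a file.
token_expiry() {
  # Second dot-separated field is the payload; map base64url to plain base64.
  payload=$(cut -d. -f2 < "$1" | tr '_-' '/+')
  # base64url drops the padding, so restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Seconds until expiry; negative once the token has expired.
token_seconds_left() {
  exp=$(token_expiry "$1")
  [ -n "$exp" ] || return 1      # no exp claim found or token unreadable
  echo $(( exp - $(date +%s) ))
}

# Example call, using the token path from the JDL above:
#   token_seconds_left /var/lib/cobald/cms.token
```

This only inspects the expiry claim; it does not verify the signature or renew anything.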