DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Prepare for tokens only at vocms0207.cern.ch in May #67

Closed: StevenCTimm closed this issue 1 year ago.

StevenCTimm commented 1 year ago

This is the e-mail I got from Marco Mascheroni, who operates the "CERN Factory", vocms0207.cern.ch:

Hi Steven,

On Tue, 18 Apr 2023, 6:30 PM, Steven C Timm (timm@fnal.gov) wrote:

Hi everyone,

I am not sure whether this question should go to this e-mail list, but I have three questions regarding the schedule for vocms0207.cern.ch.

CMS has graciously allowed DUNE and some other Fermilab-based VOs to use the vocms0207.cern.ch factory to reach some of our European sites that do not yet accept tokens. Some of these are HTCondor-CE based and some are ARC-CE based.

I have three questions:

1. Is there a firm schedule to upgrade vocms0207.cern.ch to the HTCondor 10 series, at which point GSI-based "condor" universe glidein submission would no longer be supported?

Answer: The schedule for upgrading vocms0207 to Condor 10 is dictated by Condor. We haven't set a date yet, but we plan to do it before the end of May, considering that the end of support for HTCSS 9 is currently also in May.

2. What will be the status of ARC CE support after that time? Will we still be able to submit to them using X.509-based proxies, and if so, through which factory or factories?

Answer: The plan is to configure the UCSD factory to use tokens and the CERN one to use proxies. Proxy fallback is not a thing for ARC CEs, so it's either one or the other.

3. Has anyone in factory ops polled the various sites to see which ones are ready to take SciTokens and which ones are not?
3(a). If so, was this done only for the CMS VO or for all the VOs?

Answer: This has been done for CMS, for both ARC and HTCondor CEs, in both EGI and OSG.

AFAIK there are no plans for similar campaigns for other VOs, but Jeff might have more info.

Do you have a list of sites you are interested in?

StevenCTimm commented 1 year ago

So I have tested all the HTCondor-CE entries that are on the vocms0207.cern.ch factory. I have yet to test the ARC ones.

Those that fail:

- all ce5xx.cern.ch
- cccondorce01.in2p3.fr, cccondorce02.in2p3.fr
- cexx.cat.cbpf.br
- lcgce02.phy.bris.ac.uk
- dune-condor.heprc.uvic.ca
- heposg1-colorado.sites.opensciencegrid.org
- osgcex.farm.particle.cz
- gate02.grid.umich.edu
- heplnx206.pp.rl.ac.uk (plus 207 and 208)
- osg-gw-7.t2.ucsd.edu (plus osg-gw-6)

A couple of the above may actually be ARC CEs; I will go back and check them again.

Those that succeed:

- its-condor-cexx.syr.edu
- gpce03.fnal.gov, gpce04.fnal.gov
- condorce1.ciemat.es, condorce2.ciemat.es

Some of them are still pending.

I will make a more formal list, with instructions on how to submit to them, and then try all this again.

StevenCTimm commented 1 year ago

It's possible that some or all of the HTCondor-CEs require a specific, restricted "aud" (audience) claim and are rejecting our token because of that. I have asked CERN.

Steve Timm
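One way to check the audience hypothesis locally is to decode the token's payload and look at its "aud" claim. The Python sketch below is only an illustration, not part of the DUNE tooling: it assumes the bearer token is a standard three-part JWT (as SciTokens and WLCG tokens are), base64url-decodes the payload without verifying the signature, and normalises "aud", which the JWT spec allows to be either a single string or a list. The function names jwt_claims and audiences are hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Return the claims of a JWT (e.g. a SciToken) WITHOUT verifying it.

    Hypothetical helper for inspection only: it just base64url-decodes the
    middle (payload) segment, so it must never be used to trust a token.
    """
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload = parts[1]
    # JWT segments drop their '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def audiences(claims: dict) -> list:
    """Return the 'aud' claim as a list; JWT allows a string or a list."""
    aud = claims.get("aud", [])
    return [aud] if isinstance(aud, str) else list(aud)
```

For example, `audiences(jwt_claims(open(token_path).read()))` would list the audiences in a token file, where `token_path` is an assumed, user-specific location. Comparing that list with what a given CE is configured to accept (which may be a host-specific value or the WLCG profile's generic `https://wlcg.cern.ch/jwt/v1/any`) is one way to test whether the "aud" claim is the cause of a rejection.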

StevenCTimm commented 1 year ago

For instance, glideins from gfactory-2.opensciencegrid.org to CERN no longer appear to be going on hold, as they were before.

StevenCTimm commented 1 year ago

CERN doesn't know anything about an audience field.

I have come to the conclusion that it is best to have factory ops do as much testing as they can from the factory itself. It is difficult to make a working standalone setup: several of my tests fail even though the CE in question is known to accept our tokens.

StevenCTimm commented 1 year ago

I have asked Marco Mascheroni and the rest of factory ops to do this; we will see.

StevenCTimm commented 1 year ago

Edita sent the following e-mail:

Hello,

I added entries to the UCSD factory [1]. Now all DUNE entries should be in the UCSD factory. The UCSD factory does not support Russian sites, so you will not see pilots at Nova_RU_JINR_cloud-osg-ce. Tokens are not working on these entries [2]. I cannot check whether tokens work on CMSHTPC_T2_US_MIT_ce01, because there are no pilots, or on CMSHTPC_T2_US_Florida_osg, because it is in downtime. I removed the entries listed in [3] from the CERN factory, since they are working on the UCSD factory and support only the DUNE VO.

Edita

[1] DUNE_BR_CBPF_ce01, DUNE_BR_CBPF_ce02, DUNE_BR_CBPF_ce03, DUNE_BR_CBPF_ce04, DUNE_FR_CCIN2P3_cccondorce01, DUNE_FR_CCIN2P3_cccondorce02, DUNE_T2_ES_CIEMAT_condorce1, DUNE_T2_ES_CIEMAT_condorce2, Nova_CZ_FZU_osgce1, Nova_CZ_FZU_osgce2, Nova_RU_JINR_cloud-osg-ce, DUNE_T2_UK_London_IC_ceprod00, DUNE_T2_UK_London_IC_ceprod01, DUNE_T2_UK_London_IC_ceprod02, DUNE_T2_UK_London_IC_ceprod03, DUNE_T1_ES_PIC_ce13-multicore, DUNE_T1_ES_PIC_ce14-multicore

[2] HCC_US_BNL_gk01, HCC_US_BNL_gk02, HCC_US_Michigan_gate02, DUNE_T2_UK_London_IC_ceprod00, DUNE_T2_UK_London_IC_ceprod01, DUNE_T2_UK_London_IC_ceprod02, DUNE_T2_UK_London_IC_ceprod03, Nova_RU_JINR_cloud-osg-ce

[3] DUNE_BR_CBPF_ce01, DUNE_BR_CBPF_ce02, DUNE_BR_CBPF_ce03, DUNE_BR_CBPF_ce04, DUNE_T2_ES_CIEMAT_condorce1, DUNE_T2_ES_CIEMAT_condorce2, FNAL_GPGrid_ce03_mcore_op_duneonly, FNAL_GPGrid_ce04_mcore_op_duneonly

StevenCTimm commented 1 year ago

Edita's assessment is correct: all those in [2] are failing, and the ones in [3] are working. The next step is to go through the list and make sure all HTCondor-CEs are accounted for.

StevenCTimm commented 1 year ago

Of the above list, the only ones currently showing errors are the Imperial ones. I will open a ticket there shortly.

StevenCTimm commented 1 year ago

So we still have to check on CMSHTPC_T2_UK_SGrid_Bristol_lcgce02 and HCC_US_Michigan_gate02.

Also, Edita rightly said that the JINR site won't work from the OSG factory, even if the entry is there. In addition, DUNE_CA_VICTORIA wasn't copied back to gfactory-2; I have filed a separate issue for that.

StevenCTimm commented 1 year ago

Tickets are all in progress. Bristol started working on its own. Victoria is working on adding SciToken support to their non-traditional Condor setup (it's not an HTCondor-CE). Michigan has received the ticket. The JINR site is now OK.

StevenCTimm commented 1 year ago

The only one left from this list is Victoria, and there is a separate issue open in the tracker for that, so I am closing this one.