dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Configure T0 WMAgent to use ResourceControlUpdater #12121

Open germanfgv opened 1 week ago

germanfgv commented 1 week ago

Impact of the new feature
T0 needs to enable ResourceControlUpdater. AFAIK, this component updates the ResourceControl tables according to each site's CRIC status, preventing T0 from submitting jobs to draining sites. We want to understand the features of the component so we can use it properly and request new features if needed.

Is your feature request related to a problem? Please describe.
T0 WMAgents do not use the ResourceControlUpdater thread of AgentStatusWatcher. The Tier-0 deployment procedure adds each site to ResourceControl individually here. Adding other T1 sites requires manual intervention from operators. Now that T0 regularly uses several T1 sites, T0 agents need to be able to react to changes in site status.

Describe the solution you'd like
Enable ResourceControlUpdater so it reacts to CRIC status for T1 sites, but still allow us to closely control available slots at T1_CH_CERN.

Describe alternatives you've considered
Develop a separate service for T0, but this option seems wasteful.
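
To make the requested behavior concrete, here is a minimal, self-contained sketch in plain Python. It does not use WMCore or the real ResourceControlUpdater code. The `SiteEntry` class, the `apply_cric_status` function, the status-to-state mapping, the pinned site name and all slot numbers are hypothetical and exist only for illustration. The idea is that site states follow the CRIC status, except for a set of "pinned" sites whose entries stay under manual control.

```python
from dataclasses import dataclass, replace

# Hypothetical mapping from a CRIC-style site status to a resource-control
# state. The names on both sides are assumptions for this sketch.
CRIC_TO_RC_STATE = {
    "enabled": "Normal",
    "drain": "Draining",
    "disabled": "Down",
}


@dataclass(frozen=True)
class SiteEntry:
    """Hypothetical stand-in for one row of a resource-control table."""
    state: str
    pending_slots: int
    running_slots: int


def apply_cric_status(resources, cric_status, pinned_sites):
    """Return a new table where site states follow the CRIC status.

    Sites listed in pinned_sites, and sites with no CRIC status, are
    returned unchanged. Slot counts are never modified here.
    """
    updated = {}
    for site, entry in resources.items():
        if site in pinned_sites or site not in cric_status:
            updated[site] = entry
            continue
        # Unknown statuses fall back to the current state.
        new_state = CRIC_TO_RC_STATE.get(cric_status[site], entry.state)
        updated[site] = replace(entry, state=new_state)
    return updated


# Illustrative values only, not real T0 settings.
resources = {
    "T2_CH_CERN": SiteEntry("Normal", 1000, 5000),
    "T1_US_FNAL": SiteEntry("Normal", 500, 2000),
}
cric_status = {"T2_CH_CERN": "drain", "T1_US_FNAL": "drain"}

result = apply_cric_status(resources, cric_status, pinned_sites={"T2_CH_CERN"})
for site, entry in result.items():
    print(site, entry.state, entry.pending_slots, entry.running_slots)
```

In this sketch the pinned site keeps both its state and its slots, while the T1 site changes state but keeps its slot counts. Whether the real component should also manage slot thresholds for T1 sites is a separate question that this example does not try to answer.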

amaltaro commented 1 day ago

@germanfgv @LinaresToine can you please clarify the following:

  1. which resources would you like to have added in the T0 agent? All T1s? Any T2 other than CERN?
  2. in addition, it looks like you also add the following: T0_CH_CERN_Disk, T2_CH_CERN, T2_CH_CERN_P5
  3. please point me to the script you use to deploy the T0 agents and/or create these resources
  4. are you manually disabling AgentStatusWatcher? Or do you change its configuration to False in the deployment script?

Thanks!

germanfgv commented 20 hours ago
  1. which resources would you like to have added in the T0 agent? All T1s? Any T2 other than CERN?

We use T2_CH_CERN, T2_CH_CERN_P5 + selected T1 sites for processing. We use T0_CH_CERN_Disk as storage node.

  2. in addition, it looks like you also add the following: T0_CH_CERN_Disk, T2_CH_CERN, T2_CH_CERN_P5

See previous answer.

  3. please point me to the script you use to deploy the T0 agents and/or create these resources

That happens in 2 different places

  4. are you manually disabling AgentStatusWatcher? Or do you change its configuration to False in the deployment script?

We set that parameter to False in the default Tier0 agent config: https://github.com/dmwm/T0/blob/master/etc/Tier0Config.py

We merge this default config with the one you linked during deployment.