dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Fix T1 site cores ResourceControl logic #12176

Open amaltaro opened 3 days ago

amaltaro commented 3 days ago

Fixes #12121

Status

not-tested

Description

Given that the Tier0 configuration sets this value as a percentage <= 100 (e.g. 12.5), the integer division by 100 would always return 0, so none of the default thresholds were changed.

With the current change, we can now properly calculate a percentage of the site slots (e.g. 1250 for CNAF).
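
As a rough illustration of the arithmetic described above (not the actual WMCore code): the sketch below assumes a hypothetical site with 10000 CPU slots, a figure chosen only so that 12.5% yields the 1250 quoted for CNAF. The function names are made up for this example.

```python
def scale_slots_broken(slots_cpu, percent):
    # Floor-dividing the percentage first: any percent below 100 becomes 0,
    # so the scaled slot count collapses to 0.
    return slots_cpu * (percent // 100)


def scale_slots(slots_cpu, percent):
    # Multiply first, then divide, and cast to int because slots are whole numbers.
    return int(slots_cpu * percent / 100)


# Hypothetical inputs: 10000 slots, 12.5 percent.
broken = scale_slots_broken(10000, 12.5)
fixed = scale_slots(10000, 12.5)
print(broken, fixed)
```

With these assumed inputs the first function yields 0 and the second yields 1250.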

Is it backward compatible (if not, which system does it affect)?

YES

Related PRs

It has to go together with this T0 PR: https://github.com/dmwm/T0/pull/5007

External dependencies / deployment changes

None

dmwm-bot commented 3 days ago

Jenkins results:

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/93/artifact/artifacts/PullRequestReport.html

mapellidario commented 3 days ago

Sorry Alan, I fail to catch the rationale for this change.

Could you provide some example values for infoSSB[site]['slotsCPU'] before being updated, and for self.t1SitesCores? Is self.t1SitesCores set to a number in the range [0, 100], and nobody noticed before that self.t1SitesCores // 100 always returns 0? Which value should be <= 100%?

amaltaro commented 3 days ago

To be on the safe side, I am now casting the result to an integer.

Dario, no problem! Yes, the problem is that the division always returns 0. I expected the multiplication to take precedence in that expression, but it looks like the division goes first, bringing the result to 0, see:

>>> slotsCPU * t1SitesCores // 100
1250.0
>>> slotsCPU * (t1SitesCores // 100)
0.0

So I could actually have kept the integer division and simply enforced the precedence (plus casting, to be on the safe side, given that resource slots are integers).
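
For reference, a small sketch of the evaluation order at play. In Python, `*` and `//` have the same precedence and are evaluated left to right, while an augmented assignment such as `x *= a // b` evaluates the whole right-hand side before multiplying. Whether the original statement used the augmented form is an assumption here, since the diff is not shown in this thread. The values are the same hypothetical ones implied by the session above (10000 slots, 12.5 percent).

```python
slots_cpu = 10000        # hypothetical site slots
t1_sites_cores = 12.5    # percentage, as in the Tier0 example

# Same precedence, left to right: (10000 * 12.5) // 100
left_to_right = slots_cpu * t1_sites_cores // 100

# Explicit grouping forces the floor division first: 12.5 // 100 is 0.0
grouped = slots_cpu * (t1_sites_cores // 100)

# Augmented assignment evaluates the right-hand side first,
# so it behaves like the grouped form.
augmented = slots_cpu
augmented *= t1_sites_cores // 100

# Keeping the integer division and casting, since slots are integers.
scaled = int(slots_cpu * t1_sites_cores // 100)

print(left_to_right, grouped, augmented, scaled)
```

Under these assumptions only the first and last expressions give the intended 1250; the grouped and augmented forms both give 0.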

dmwm-bot commented 3 days ago

Jenkins results:

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/97/artifact/artifacts/PullRequestReport.html