DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

Consult glideinwms developers on why US_FNAL-FermiGrid is ramping up so slowly #129

Open StevenCTimm opened 8 months ago

StevenCTimm commented 8 months ago

By default, glideins are allocated so that cores are spread evenly across all entries. With whole-node scheduling, how does that work? How is the number of cores each glidein will get known in advance?

All the glideins that we get into Fermilab start pretty much immediately, but the rate has not yet been high enough to fill the whole DUNE quota of 6000 slots. Tweaks may be possible on the factory side to submit more glideins in each round.

StevenCTimm commented 8 months ago

I believe this can be fixed by having glideinwms submit more than a single glidein per round; this is configurable. Will investigate further.
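
To illustrate why the number of glideins per round matters, here is a rough back-of-the-envelope sketch. It assumes every glidein starts immediately (as observed at Fermilab) and that each one provides a fixed number of slots. The function names, the 8 cores per glidein, and the 5-minute round interval are hypothetical values chosen for illustration, not measured or taken from the factory configuration; only the 6000-slot quota comes from this issue.

```python
import math


def rounds_to_fill(quota_slots, glideins_per_round, cores_per_glidein):
    """Submission rounds needed to reach quota_slots.

    Assumes every submitted glidein starts right away and contributes
    cores_per_glidein slots (a simplification; names are hypothetical).
    """
    slots_per_round = glideins_per_round * cores_per_glidein
    if slots_per_round <= 0:
        raise ValueError("glideins_per_round and cores_per_glidein must be positive")
    return math.ceil(quota_slots / slots_per_round)


def minutes_to_fill(quota_slots, glideins_per_round, cores_per_glidein, round_minutes):
    """Wall-clock estimate: number of rounds times the round interval."""
    rounds = rounds_to_fill(quota_slots, glideins_per_round, cores_per_glidein)
    return rounds * round_minutes


if __name__ == "__main__":
    quota = 6000          # DUNE quota mentioned in this issue
    cores = 8             # assumed cores per glidein (hypothetical)
    interval = 5          # assumed minutes between rounds (hypothetical)
    for per_round in (1, 10, 50):
        rounds = rounds_to_fill(quota, per_round, cores)
        minutes = minutes_to_fill(quota, per_round, cores, interval)
        print(f"{per_round:>3} glideins/round: {rounds} rounds, ~{minutes} min")
```

Under these assumed numbers, the number of rounds needed falls in proportion to the glideins submitted per round, which is the effect the factory-side tweak would be aiming for.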

StevenCTimm commented 4 months ago

Issue #160 is related to this as well. We don't ramp up as quickly when high-memory jobs are involved.