glideinWMS / glideinwms

The glideinWMS Project
http://tinyurl.com/glideinwms
Apache License 2.0

Improve glidein requesting in cases where there is a high-memory job mix #405

Closed: StevenCTimm closed this issue 6 months ago

StevenCTimm commented 6 months ago

Is your feature request related to a problem? Please describe.

DUNE jobs typically run with a RequestMemory of 5.5 GB. This often leads the frontend to believe that there are many usable cores when in fact there are not, because the glideins do not have enough free memory left to match another job. As a result, too few glideins are requested, to the extreme that we can have 12K jobs in the queue but only 2400 running.
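To make the miscount concrete, here is a minimal Python sketch. The `PSlot` class, the helper functions, and the scenario numbers are hypothetical illustrations, not glideinWMS code; the sketch assumes 8-core glideins with 16000 MB of memory, each already running two 5500 MB jobs.

```python
from dataclasses import dataclass


@dataclass
class PSlot:
    """Hypothetical view of the unclaimed resources left in one partitionable slot."""
    cpus: int       # unclaimed cores
    memory_mb: int  # unclaimed memory, in MB


def naive_free_cores(slots):
    """Count every unclaimed core as usable, ignoring memory."""
    return sum(s.cpus for s in slots)


def jobs_that_fit(slots, request_cpus=1, request_memory_mb=5500):
    """Count how many more jobs of the given shape could actually start."""
    if request_cpus <= 0 or request_memory_mb <= 0:
        raise ValueError("resource requests must be positive")
    # A slot can start only as many jobs as its scarcest resource allows.
    return sum(min(s.cpus // request_cpus, s.memory_mb // request_memory_mb)
               for s in slots)


# Assumed scenario: 100 glideins with 8 cores and 16000 MB each, every one
# already running two 5500 MB jobs, leaving 6 cores but only 5000 MB free.
slots = [PSlot(cpus=6, memory_mb=5000) for _ in range(100)]
print(naive_free_cores(slots))  # 600
print(jobs_that_fit(slots))     # 0
```

Counting cores alone reports 600 free, yet not one more 5.5 GB job can start. Because 5000 MB is above 2500, a fixed "at least 2500 free" test would presumably still treat these slots as usable.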

Describe the solution you'd like

We would like the minimum free memory per glidein to be configurable. It is currently hard-wired to 2500 at this line of code: https://github.com/glideinWMS/glideinwms/blob/01d534e9467a5f4496ba2828b902490f6966be99/frontend/glideinFrontendLib.py#L811 If it were configurable, we could adjust the setting for different job mixes, although in the near term we expect all of our jobs to be high-memory for the duration of the current beam run. More generally, it would be useful to support a configurable condor_status query, so that the VO can decide for itself which slots are available and which are not. This would allow factors other than memory to be taken into account; some remote sites are also short on disk, which can affect glidein occupancy as well.
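A configurable threshold, together with the broader idea of letting the VO decide which slots count as available, could look roughly like the following Python sketch. Everything in it is hypothetical: `SlotAd`, `make_idle_filter`, `count_idle`, the `min_disk_kb` option, and the `vo_filter` hook are illustrations, not the frontend's actual API. The field units mirror HTCondor's `Memory` (MB) and `Disk` (KB) machine-ad attributes, and the default of 2500 matches the currently hard-wired value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SlotAd:
    """Hypothetical subset of a partitionable slot's machine ad."""
    cpus: int       # unclaimed cores
    memory_mb: int  # unclaimed memory, in MB (like HTCondor's Memory)
    disk_kb: int    # unclaimed disk, in KB (like HTCondor's Disk)


SlotFilter = Callable[[SlotAd], bool]


def make_idle_filter(min_memory_mb: int = 2500,
                     min_disk_kb: Optional[int] = None,
                     vo_filter: Optional[SlotFilter] = None) -> SlotFilter:
    """Build a predicate that decides whether a slot counts as idle.

    min_memory_mb defaults to the currently hard-wired 2500; min_disk_kb and
    vo_filter are hypothetical hooks for the other factors in the issue.
    """
    def is_idle(slot: SlotAd) -> bool:
        if slot.cpus <= 0 or slot.memory_mb < min_memory_mb:
            return False
        if min_disk_kb is not None and slot.disk_kb < min_disk_kb:
            return False
        # Let a VO-supplied rule have the final say, if one is configured.
        return vo_filter(slot) if vo_filter is not None else True

    return is_idle


def count_idle(slots: Iterable[SlotAd], is_idle: SlotFilter) -> int:
    """Count the slots the predicate considers idle."""
    return sum(1 for s in slots if is_idle(s))


slots = [SlotAd(cpus=6, memory_mb=5000, disk_kb=20_000_000),
         SlotAd(cpus=4, memory_mb=12000, disk_kb=1_000_000)]
print(count_idle(slots, make_idle_filter()))                    # 2
print(count_idle(slots, make_idle_filter(min_memory_mb=5500)))  # 1
print(count_idle(slots, make_idle_filter(min_memory_mb=5500,
                                         min_disk_kb=10_000_000)))  # 0
```

Raising `min_memory_mb` to the jobs' RequestMemory stops slots that cannot fit another 5.5 GB job from being counted as idle, while `vo_filter` stands in for a VO-supplied condor_status constraint.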

Describe alternatives you've considered

Marco has also suggested increasing the idle_vms_per_entry and idle_vms_total settings in the configuration, and we are trying that first. If that doesn't work, we will hot-patch the line of code above.
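Raising the idle-slot limits may work around the problem without fixing its cause. The hypothetical Python model below assumes, for illustration only, that the frontend requests nothing once the idle slots it perceives reach a limit (standing in for idle_vms_per_entry/idle_vms_total), and otherwise requests glideins for the idle jobs those slots do not cover. The function name and the numbers are invented; this is not the frontend's actual algorithm.

```python
def glideins_to_request(idle_jobs: int, perceived_idle_slots: int,
                        idle_limit: int) -> int:
    """Hypothetical, simplified throttle (not the real frontend algorithm).

    Requests nothing once the perceived idle slots reach idle_limit;
    otherwise requests glideins for the idle jobs those slots do not cover.
    """
    if perceived_idle_slots >= idle_limit:
        return 0
    return max(0, idle_jobs - perceived_idle_slots)


# Illustrative numbers: 600 slots look idle but cannot fit a 5.5 GB job.
print(glideins_to_request(9000, 600, idle_limit=100))   # 0
print(glideins_to_request(9000, 600, idle_limit=1000))  # 8400
```

Under this model, raising the limit lets requests resume, but the slots that only look idle still reduce the request, which is why a memory-aware count remains the more direct fix.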


Additional context

Eventually, the same change may need to be made in the Decision Engine, which has similar but not identical logic.