ESMCI / ccs_config_cesm

CESM CIME Case Control System configuration files

128-core Derecho jobs should go on main queue, not develop #182

Closed: samsrabin closed this issue 3 days ago

samsrabin commented 1 month ago

In CTSM we've noticed a lot of our test jobs waiting in the cpudev queue for a while. I asked in the derecho-users Slack, and Ben Kirk replied:

Further, each user has access to 256 max cores at a time IIRC.

looks like your small jobs are requesting one full node - 128 cores? This will only allow you to run 2 max at the same time, but more importantly could get a full node in main if that’s what they require.

I moved those jobs to main (qmove main <JOBID>) and they got going really quickly. But it seems like it'd be better for these to be on the main queue by default.

samsrabin commented 1 month ago

This happens despite jobmax="64" in the specification for develop in config_batch.xml: https://github.com/ESMCI/ccs_config_cesm/blob/e2a542212c4a1aff5ac7d55544ecc6eb8c495c93/machines/derecho/config_batch.xml#L14-L17

Maybe it would be better to specify main with jobmin="65" and jobmax="318464" (2488 × 128 = 318464)?
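To make the proposal concrete, here's a minimal Python sketch of range-based queue selection. It is not CIME's actual selection code; `QUEUES` and `pick_queue` are hypothetical names, and I'm assuming 2488 is the node count and 128 the cores per node, as the arithmetic above implies.

```python
# Hypothetical illustration only: not CIME's real queue-selection logic.
CORES_PER_NODE = 128  # one full Derecho node, per the Slack reply
MAX_NODES = 2488      # assumed node count behind the 2488 x 128 figure

# (queue name, jobmin, jobmax), with limits expressed in cores
QUEUES = [
    ("develop", 1, 64),
    ("main", 65, MAX_NODES * CORES_PER_NODE),
]


def pick_queue(cores, queues=QUEUES):
    """Return the first queue whose [jobmin, jobmax] range contains `cores`."""
    for name, jobmin, jobmax in queues:
        if jobmin <= cores <= jobmax:
            return name
    raise ValueError(f"no queue accepts a {cores}-core job")
```

With non-overlapping ranges like these, a 128-core single-node test job would fall through develop and land on main, while anything at 64 cores or fewer would stay on develop.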