equinor / everest

GNU General Public License v3.0

Missing control over queues when submitting to LSF #14

Closed: berland closed this 1 month ago

berland commented 1 month ago

Issue: everserver and the actual jobs submitted to an LSF cluster are distributed over different queues on-prem. This does not seem right:

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
831306  f_scout RUN   mr         st-vgrid02  st-rsv19-19 everserver May 28 09:59
831356  f_scout RUN   normal     st-rsv19-19 st-rsa00-00 EGG        May 28 09:59

Definition of done: everserver and EGG jobs end up in the same queue, given that that is the intention.

DanSava commented 1 month ago

Reproduced the behaviour when the queue name is not set in the simulator config:

simulator:
  queue_system: lsf
  cores: 100
  max_runtime: 10000
  resubmit_limit: 0

The user-recommended config is the following:

simulator:
  name: mr
  queue_system: lsf
....

When setting the queue name to:

simulator:
  name: mr

All jobs are submitted to the same queue on-prem:

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
999729  dsav    RUN   mr         st-vgrid02  st-rsv18-18 everserver May 29 09:30
999789  dsav    RUN   mr         st-rsv18-18 st-rsa00-00 EGG        May 29 09:30
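
To make the comparison between the two listings above repeatable, here is a minimal sketch that parses `bjobs`-style output and checks whether every job landed in the same queue. The helper names `queues_by_job_name` and `all_in_same_queue` are hypothetical and not part of everest. The sketch assumes whitespace-separated columns with no empty fields, which holds for the running jobs shown here but may not hold for pending jobs without an EXEC_HOST.

```python
from collections import defaultdict

# The two listings from this issue, copied verbatim.
BEFORE = """\
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
831306  f_scout RUN   mr         st-vgrid02  st-rsv19-19 everserver May 28 09:59
831356  f_scout RUN   normal     st-rsv19-19 st-rsa00-00 EGG        May 28 09:59
"""

AFTER = """\
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
999729  dsav    RUN   mr         st-vgrid02  st-rsv18-18 everserver May 29 09:30
999789  dsav    RUN   mr         st-rsv18-18 st-rsa00-00 EGG        May 29 09:30
"""


def queues_by_job_name(bjobs_output: str) -> dict[str, set[str]]:
    """Map each JOB_NAME to the set of queues it appears in."""
    lines = [line for line in bjobs_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    queue_idx = header.index("QUEUE")
    name_idx = header.index("JOB_NAME")
    queues: dict[str, set[str]] = defaultdict(set)
    for line in lines[1:]:
        # SUBMIT_TIME contains spaces, so split into at most len(header) fields.
        fields = line.split(None, len(header) - 1)
        queues[fields[name_idx]].add(fields[queue_idx])
    return dict(queues)


def all_in_same_queue(bjobs_output: str) -> bool:
    """Return True if every listed job was submitted to one single queue."""
    queues = set().union(*queues_by_job_name(bjobs_output).values())
    return len(queues) == 1


if __name__ == "__main__":
    print("without queue name:", all_in_same_queue(BEFORE))
    print("with name: mr:     ", all_in_same_queue(AFTER))
```

Run against the listings in this thread, the check fails for the config without a queue name and passes once `name: mr` is set.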