OSC / ood_core

Open OnDemand core library
https://osc.github.io/ood_core/
MIT License

LSF Adapter: Verify job arrays do not produce problems for My Jobs and Active Jobs using the adapter #18

Open ericfranz opened 7 years ago

ericfranz commented 7 years ago

Currently, Active Jobs handles job arrays without an issue.

If you try to create a job array through My Jobs, or use ood_core to get the status of a job array, there will be problems: when the ID refers to a job array, bjobs $JOBID returns multiple rows instead of a single row:

[efranz@somehost02 ~]$ bjobs 554997
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
554997  efranz  DONE  short      fromhost.osc compute030  *oarray[7] Apr 19 17:01
554997  efranz  DONE  short      fromhost.osc compute030  *oarray[5] Apr 19 17:01
554997  efranz  DONE  short      fromhost.osc compute030  *oarray[3] Apr 19 17:01
554997  efranz  DONE  short      fromhost.osc compute030  *oarray[1] Apr 19 17:01
554997  efranz  DONE  short      fromhost.osc compute030  *oarray[9] Apr 19 17:01
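To make the multi-row behavior concrete, here is a minimal Ruby sketch that parses the sample output above. parse_bjobs_rows is a hypothetical helper, not part of ood_core, and it assumes every column is non-empty, which is true for this sample but may not hold for rows where a column such as EXEC_HOST is blank.

```ruby
# Sample output copied from the bjobs call above.
BJOBS_OUTPUT = <<~OUT
  JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
  554997  efranz  DONE  short      fromhost.osc compute030  *oarray[7] Apr 19 17:01
  554997  efranz  DONE  short      fromhost.osc compute030  *oarray[5] Apr 19 17:01
  554997  efranz  DONE  short      fromhost.osc compute030  *oarray[3] Apr 19 17:01
  554997  efranz  DONE  short      fromhost.osc compute030  *oarray[1] Apr 19 17:01
  554997  efranz  DONE  short      fromhost.osc compute030  *oarray[9] Apr 19 17:01
OUT

# Hypothetical helper (not part of ood_core): turn each data row into a hash.
# Assumption: columns never contain spaces and are never empty, so a plain
# whitespace split lines up with the header.
def parse_bjobs_rows(output)
  lines = output.lines.map(&:strip).reject(&:empty?)
  lines.drop(1).map do |line| # drop(1) skips the header row
    id, user, stat, queue, from_host, exec_host, name, *submit = line.split
    {
      id: id, user: user, status: stat, queue: queue,
      from_host: from_host, exec_host: exec_host,
      name: name, submit_time: submit.join(" ")
    }
  end
end

rows = parse_bjobs_rows(BJOBS_OUTPUT)
puts rows.size                            # 5
puts rows.map { |r| r[:id] }.uniq.inspect # ["554997"]
```

Any caller that expects exactly one row per job ID gets five here, all carrying the same ID and differing only in the index at the end of JOB_NAME.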

ericfranz commented 7 years ago

This will probably cause issues if we try to create job arrays through My Jobs.

Currently, this is not an issue for Active Jobs:

[Screenshot: 2017-04-20 at 11:21:31 AM]

However, once we add the LSF "extended attributes" shown in the progressive disclosure pane in Active Jobs, calling adapter.info(jobid) will be a problem: if we want the details of job 9 in the job array but jobid is set to 554998 instead of 554998[9], we won't get the Info object for the right job in the array.

Two possible solutions:

  1. Job arrays become first-class citizens in the OodCore::Job::Adapter interface.
  2. When calling bjobs, we scan for job arrays, and if we detect one (i.e. the job name ends with [#]), we append [#] to the job ID. For a job array, instead of displaying the ID 554998 five times, it would display 554998[1], 554998[3], etc.:

    • [Screenshot: 2017-04-20 at 11:39:57 AM]
    • Of course, we would have to be careful that checking the status, deleting, etc. all work for job array tasks in every state in which they work for a normal job (this is not the case for Slurm).
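
The second option could be sketched roughly as follows. This is only an illustration under stated assumptions: array_task_id is a hypothetical helper, not part of ood_core, and the row for job 555001 named "plainjob" is made up to show the non-array case. It also assumes that a job name ending in [digits] always indicates an array task, which a regular job with an unusual name could violate.

```ruby
# Matches a trailing array index such as "[9]" at the end of a job name.
ARRAY_INDEX = /\[(\d+)\]\z/

# Hypothetical helper (not part of ood_core): return "id[index]" when the
# job name ends with an array index, otherwise return the id unchanged.
def array_task_id(job_id, job_name)
  match = ARRAY_INDEX.match(job_name)
  match ? "#{job_id}[#{match[1]}]" : job_id
end

# [job id, job name] pairs as they might come out of a parsed bjobs listing.
# The last pair is an invented non-array job for comparison.
rows = [
  ["554998", "*oarray[1]"],
  ["554998", "*oarray[3]"],
  ["555001", "plainjob"]
]

ids = rows.map { |id, name| array_task_id(id, name) }
puts ids.inspect # ["554998[1]", "554998[3]", "555001"]
```

With IDs rewritten this way, a later adapter.info call would receive 554998[3] rather than 554998. Whether status, delete, and similar operations accept that form for array tasks in every state still has to be verified against LSF.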