OSC / ood-activejobs

[MOVED] Active Jobs provides details of scheduled jobs on an HPC cluster.
https://osc.github.io/Open-OnDemand/
MIT License

For Slurm only display Reason #94

Closed nickjer closed 7 years ago

nickjer commented 7 years ago

On Slurm only, a typical squeue call gives:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1325265 ember-gpu 2dbe-fre u0818159 PD       0:00     10 (Resources)
           1384708 facelli-e CUIDocIn u0499179 PD       0:00      1 (Resources)
           1386234     ember non90-py u6010484 PD       0:00      4 (Priority)
           1386233     ember non89-py u6010484 PD       0:00      4 (Priority)
           1386232     ember non88-py u6010484 PD       0:00      4 (Priority)
           1386231     ember non87-py u6010484 PD       0:00      4 (Resources)
           1386188     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386189     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386194     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386174     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386176     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386179     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386180     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386184     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386170     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1386169     ember g09-modu u1021275 PD       0:00      1 (Priority)
           1387022     ember SmSCO+_M u0029299 PD       0:00      1 (Priority)
           1387023     ember SSmCO+_M u0029299 PD       0:00      1 (Priority)
           1387086     ember SmSCO+_M u0029299 PD       0:00      1 (Priority)
           1385828    usu-em inv_1024 u6007591  R   10:45:14      5 em[150-154]
           1387044     ember    EM277 u6006065  R      24:12      4 em[085,105,122,124]
           1384671 facelli-e CUIDocIn u0499179  R    6:41:22      1 em395
           1384155     ember rng16.sl u1016705  R   21:49:32      1 em078
           1383890     ember OGdCO_8A u0985813  R 1-04:25:43      1 em144
           1386168     ember g09-modu u1021275  R      38:16      1 em134
           1386230     ember non86-py u6010484  R      59:33      4 em[021,119-121]
           1384978     ember rng16.sl u1016705  R   14:13:14      1 em083
           1384616     ember    ring5 u0648283  R   18:58:46      5 em[107-108,110-112]
           1386163     ember g09-modu u1021275  R    3:38:39      1 em103
           1386164     ember g09-modu u1021275  R    3:38:39      1 em118
           1384758     ember 13.slurm u6004335  R   18:16:14      1 em086
           1384759     ember 15.slurm u6004335  R   18:16:14      1 em087
           1384760     ember 16.slurm u6004335  R   18:16:14      1 em088

Note that the REASON is given when a job is pending (queued). So this should be a second-class (or very near first-class) citizen when displaying extended attributes about a job.

You can read more about the REASON codes here:

https://slurm.schedmd.com/squeue.html#lbAF
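
A minimal Ruby sketch of how the reason could be pulled out of the last column of the output above. It assumes the eight default columns shown and a job name without spaces. `split_nodelist_reason` and `parse_squeue_row` are hypothetical helper names, not part of ood-activejobs.

```ruby
# Hypothetical helper: split squeue's NODELIST(REASON) field.
# A value wrapped in parentheses is treated as a pending reason;
# anything else is treated as the allocated node list.
def split_nodelist_reason(field)
  value = field.to_s.strip
  if (match = value.match(/\A\((.*)\)\z/))
    { reason: match[1], nodelist: nil }
  else
    { reason: nil, nodelist: value.empty? ? nil : value }
  end
end

# Hypothetical helper: parse one data row of the default squeue output.
# Assumes eight whitespace-separated columns; the limit of 8 keeps any
# spaces inside the final NODELIST(REASON) field intact.
def parse_squeue_row(line)
  id, partition, name, user, state, time, nodes, last = line.split(" ", 8)
  {
    id: id, partition: partition, name: name, user: user,
    state: state, time: time, nodes: nodes.to_i
  }.merge(split_nodelist_reason(last))
end

pending = parse_squeue_row("  1386234     ember non90-py u6010484 PD       0:00      4 (Priority)")
running = parse_squeue_row("  1385828    usu-em inv_1024 u6007591  R   10:45:14      5 em[150-154]")
puts pending[:reason].inspect
puts running[:nodelist].inspect
```

For the pending job the reason is populated and the node list is nil; for the running job it is the other way around, so a view could show the reason only when it is present.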

brianmcmichael commented 7 years ago

We need to come up with a default view for all HPC apps, or bite the bullet and create separate views for each type of adapter output.

I'm in favor of a uniform display across all installations of this app, for maintainability's sake.

ericfranz commented 7 years ago

I have a proposal. We are already essentially "normalizing" both default and native attributes to named attributes. We can change the design of the objects we are normalizing to, so instead of an object with fixed attributes (Jobstatusdata) we normalize to an object that is an attribute list, or a list of attribute lists. This would allow the views to vary per adapter without adding additional work for each adapter.

I'll share code explaining my proposal.
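
For illustration only, and not the code referred to above: a minimal Ruby sketch of what normalizing to an attribute list might look like. `Attribute`, `slurm_attributes`, `qstat_attributes`, and `render_attributes` are hypothetical names, not part of ood-activejobs, and the input hashes are assumed to be already-parsed job data.

```ruby
# Hypothetical value object: one displayable name/value pair for a job.
Attribute = Struct.new(:name, :value)

# Hypothetical normalizer for Slurm job data.
# Adds a "Reason" attribute only when the job has one.
def slurm_attributes(info)
  attrs = [
    Attribute.new("Job Id", info[:id]),
    Attribute.new("User", info[:user]),
    Attribute.new("State", info[:state])
  ]
  attrs << Attribute.new("Reason", info[:reason]) if info[:reason]
  attrs
end

# Hypothetical normalizer for a qstat-based adapter, which has no reason field.
def qstat_attributes(info)
  [
    Attribute.new("Job Id", info[:id]),
    Attribute.new("User", info[:user]),
    Attribute.new("State", info[:state])
  ]
end

# Generic rendering: prints whatever attributes it is given,
# with no knowledge of which adapter produced them.
def render_attributes(attrs)
  attrs.map { |a| "#{a.name}: #{a.value}" }.join("\n")
end

puts render_attributes(
  slurm_attributes({ id: "1386234", user: "u6010484", state: "PD", reason: "Priority" })
)
```

In this sketch the view code stays the same for every adapter; only the list each normalizer returns differs.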

brianmcmichael commented 7 years ago

I'm skeptical. It's easier to normalize and test Ruby code than it is to troubleshoot all of this on the frontend in JavaScript.

brianmcmichael commented 7 years ago

Do we need to mimic the exact output of each unique adapter's stat command?

activejobs is already an abstraction of the job information, and the existing view is itself an abstraction of what qstat provides. In this case, "reason" seems like additional data that we could display in the extended view. Maybe I'm not familiar enough with Slurm, but the essential point of the message is to tell a user that the job is queued or queued_held. IMHO, a uniform view across installations is preferable to adding all sorts of logic to display or hide options, especially since we're going to need to test this app in many different environments.

I think we should talk about design and planning for maintainability of this feature before I start coding this in.

brianmcmichael commented 7 years ago

It was decided that reason did not need to be added to the initial view.

It has been added to the extended view: https://github.com/OSC/ood-activejobs/commit/0f6caf5a04d7fca04965d7f78163540c76c514de