flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

Suggestion: sort flux resource list based on unique queues #6297

Open kkier opened 1 month ago

kkier commented 1 month ago

Current design of flux resource list could be described as “we will list nodes only once and we will therefore list queues multiple times in different combinations.” In the degenerate case, three queues could appear in seven different combinations, each in three states (free/allocated/down), for 21 lines of output. We also end up with arbitrarily long QUEUE columns, which has been discussed elsewhere:

     STATE QUEUE      NNODES   NCORES    NGPUS NODELIST
      free pdebug,pal      1      111       11 tuolumne[blah]
      free pbatch,pal     11      111      111 tuolumne[blah]
 allocated pdebug,pal     11      111      111 tuolumne[blah]
 allocated pbatch,pal     11      111      111 tuolumne[blah]
 allocated pbatch,pal     11      111      111 tuolumne[blah]
      down pbatch,pal     11      111      111 tuolumne[blah]
      down pbatch,pal     11      111      111 tuolumne[blah]
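
The 21-line worst case comes from counting every non-empty combination of three queues and multiplying by the three states. A minimal sketch of that arithmetic, where the queue names are only illustrative:

```python
from itertools import combinations

queues = ["pdebug", "pbatch", "pall"]  # hypothetical three-queue system
states = ["free", "allocated", "down"]

# Every non-empty subset of queues is a possible value of the QUEUE column.
queue_sets = [
    combo
    for size in range(1, len(queues) + 1)
    for combo in combinations(queues, size)
]

# Each queue combination can appear once per state.
max_lines = len(queue_sets) * len(states)
print(len(queue_sets), max_lines)  # prints "7 21"
```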

My idea: make the queue the secondary sort key, and accept listing nodes multiple times, e.g.:

     STATE QUEUE      NNODES   NCORES    NGPUS NODELIST
      free pdebug          1      111       11 tuolumne[blah]
      free pbatch         11      111       11 tuolumne[blah]
      free random         11      111       11 tuolumne[blah]
      free pall           11      111       11 tuolumne[blah]
 allocated pdebug         11      111      111 tuolumne[blah]
 allocated pbatch         11      111      111 tuolumne[blah]
 allocated random         11      111      111 tuolumne[blah]
 allocated pall           11      111      111 tuolumne[blah]
      down pdebug         11      111      111 tuolumne[blah]
      down pbatch         11      111      111 tuolumne[blah]
      down random         11      111      111 tuolumne[blah]
      down pall           11      111      111 tuolumne[blah]
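
The regrouping could be sketched roughly as follows. This is plain Python over made-up data, not the flux-core implementation: `resource_sets`, `rows_per_queue`, and the node names are all hypothetical stand-ins for whatever the command works with internally.

```python
from collections import defaultdict

# Hypothetical stand-in for the rows of the current output: one entry per
# (state, set of queues), each with a list of node names.
resource_sets = [
    {"state": "free", "queues": ["pdebug", "pall"], "nodes": ["n1"]},
    {"state": "free", "queues": ["pbatch", "pall"], "nodes": ["n2", "n3"]},
    {"state": "allocated", "queues": ["pbatch", "pall"], "nodes": ["n4"]},
]


def rows_per_queue(resource_sets):
    """Return {(state, queue): sorted node names}.

    A node is listed once for every queue it belongs to.
    """
    grouped = defaultdict(set)
    for rset in resource_sets:
        for queue in rset["queues"]:
            grouped[(rset["state"], queue)].update(rset["nodes"])
    return {key: sorted(nodes) for key, nodes in grouped.items()}


# State is the primary sort key, queue name the secondary one.
state_order = {"free": 0, "allocated": 1, "down": 2}
rows = rows_per_queue(resource_sets)
for (state, queue), nodes in sorted(
    rows.items(), key=lambda item: (state_order[item[0][0]], item[0][1])
):
    print(f"{state:>10} {queue:<8} {len(nodes):>6} {','.join(nodes)}")
```

The trade-off is the one described above: the node counts no longer sum to the size of the cluster, because overlapping queues count the same node more than once.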

A further extension: this makes it very natural to filter when people specify a queue. Instead of:

     STATE QUEUE      NNODES   NCORES    NGPUS NODELIST
      free pbatch,pal     11      111      111 tuolumne[blah]
      free pbatch,pal     11      111      111 tuolumne[blah]
 allocated pdebug,pal     11      111      111 tuolumne[blah]
 allocated pbatch,pal     11      111      111 tuolumne[blah]
 allocated pbatch,pal     11      111      111 tuolumne[blah]
      down pbatch,pal     11      111      111 tuolumne[blah]
      down pbatch,pal     11      111      111 tuolumne[blah]

...you would get:

     STATE QUEUE      NNODES   NCORES    NGPUS NODELIST
      free pall           11      111      111 tuolumne[blah]
 allocated pall           11      111      111 tuolumne[blah]
      down pall           11      111      111 tuolumne[blah]
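
A rough sketch of that filter, again over hypothetical data rather than the real command internals (`rows_for_queue` and the sample sets are assumptions for illustration):

```python
# Hypothetical resource sets, each tagged with every queue it belongs to.
resource_sets = [
    {"state": "free", "queues": ["pdebug", "pall"], "nodes": ["n1"]},
    {"state": "free", "queues": ["pbatch", "pall"], "nodes": ["n2", "n3"]},
    {"state": "allocated", "queues": ["pbatch", "pall"], "nodes": ["n4"]},
    {"state": "down", "queues": ["pbatch", "pall"], "nodes": ["n5"]},
]


def rows_for_queue(resource_sets, queue, states=("free", "allocated", "down")):
    """Return one (state, queue, nodes) row per state for a single queue."""
    per_state = {state: set() for state in states}
    for rset in resource_sets:
        # Keep only the sets that belong to the requested queue.
        if queue in rset["queues"]:
            per_state[rset["state"]].update(rset["nodes"])
    return [(state, queue, sorted(per_state[state])) for state in states]


for state, queue, nodes in rows_for_queue(resource_sets, "pall"):
    print(f"{state:>10} {queue:<8} {len(nodes):>6} {','.join(nodes)}")
```

Because the QUEUE column then holds only the requested queue, the output collapses to at most one line per state.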

If we can also hide lines with 0 nodes, we’re further cleaning up the output. In a system with all nodes in one state you'd just see something like:

     STATE QUEUE      NNODES   NCORES    NGPUS NODELIST
      free pall           11       11      111 tuolumne[blah]

grondo commented 1 month ago

I'm going to look into this and figure out how much work it might be.

> Current design of flux resource list could be described as “we will list nodes only once and we will therefore list queues multiple times in different combinations.”

flux resource list is listing "resource sets" in each state. It isn't that it is only listing nodes once, but that a node can only be in one state at a time. Resource sets are assigned properties which map them to a queue, not the other way around, so this is what the current output is reflecting. (i.e. it is not a tool that lists resources per-queue, since organizing resources into queues is optional in flux, but a tool that lists resource sets in their various states, along with other data)

I gather this is only an issue if queues overlap. Obviously, it is a bit annoying if every cluster is going to have a pall queue that overlaps with everything. One interim solution might be to suppress the display of this kind of "everything" queue. Is the pall queue currently defined using an "all" property? If so, one solution might be to just remove the requires from the pall queue, which would have the side effect of hiding it in flux resource list output (we should test that though)

Another idea would be to suppress queues that are currently disabled in flux resource list output. Not sure if that would help here.

> If we can also hide lines with 0 nodes, we’re further cleaning up the output.

What would flux resource list -s free display when there are no free nodes? Should flux resource list -no '{state} {nnodes}' also suppress lines with 0 nodes? It seems in this case the user might explicitly want to see free 0?
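
One possible answer to these questions, sketched in plain Python: hide zero-node lines by default, but keep them when the user asked for specific states. This is only an illustration of that policy, not something decided in this thread; `visible_rows` and its arguments are hypothetical and do not model the actual `-s` or `-o` option handling.

```python
def visible_rows(rows, requested_states=None):
    """Filter (state, nnodes) rows.

    With no explicit request, rows with 0 nodes are hidden.
    With an explicit request, exactly the requested states are shown,
    including any that currently have 0 nodes.
    """
    if requested_states:
        wanted = set(requested_states)
        return [(state, n) for state, n in rows if state in wanted]
    return [(state, n) for state, n in rows if n > 0]


rows = [("free", 0), ("allocated", 12), ("down", 0)]
print(visible_rows(rows))            # default view: only non-empty states
print(visible_rows(rows, ["free"]))  # explicit request keeps the zero line
```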

grondo commented 1 month ago

@kkier - in some ways this request is contrary to your request in #6275 (the expandable width format field has already been implemented by the way)

Has your thinking changed and you'd like to split identical lines and show each queue separately, or do you still want to keep #6275 open?

kkier commented 1 month ago

> @kkier - in some ways this request is contrary to your request in #6275 (the expandable width format field has already been implemented by the way)
>
> Has your thinking changed and you'd like to split identical lines and show each queue separately, or do you still want to keep #6275 open?

This is a good point and I think this idea ends up being better overall than #6275 in a few ways. Thank you for reminding me.

grondo commented 1 month ago

Ok. However, implementing this suggestion is going to end up being a rewrite of the flux resource list command. Some of the other suggestions here could be done first, though, if that sounds good:

kkier commented 1 month ago

If those are lower-hanging fruit, for sure. None of this is an immediate need, obviously, just things to make the UI a little easier to use and maybe make more sense for people new to the paradigm.