flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0

support `flux module stats` for fluxion resource module #1166

grondo closed this issue 1 month ago

grondo commented 1 month ago

Supporting the flux module stats RPC in sched-fluxion-resource may be a cheap way to expose some of the match stats via an existing mechanism.

In today's meeting it was noted that there is already a "match stats" structure. All that would be needed here is to convert these stats to JSON and add them as the response payload. The RPC would have the topic sched-fluxion-resource.stats-get, and the stats would be cleared by the RPC sched-fluxion-resource.stats-clear.
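
The conversion described above could look roughly like the sketch below. It is written in Python for brevity rather than in the module's C++, and MatchStats, record, to_payload, stats_get, and stats_clear are hypothetical names used only for illustration. The JSON keys (njobs, min-match, max-match, avg-match) follow the output the resource module already reports later in this thread.

```python
import json
from dataclasses import dataclass


@dataclass
class MatchStats:
    """Hypothetical stand-in for the module's "match stats" structure."""
    njobs: int = 0
    min_match: float = 0.0
    max_match: float = 0.0
    total_match: float = 0.0

    def record(self, elapsed: float) -> None:
        """Fold one match duration into the running stats."""
        if self.njobs == 0 or elapsed < self.min_match:
            self.min_match = elapsed
        if elapsed > self.max_match:
            self.max_match = elapsed
        self.total_match += elapsed
        self.njobs += 1

    def to_payload(self) -> dict:
        """Build the dict that would be serialized as the stats-get response."""
        avg = self.total_match / self.njobs if self.njobs else 0.0
        return {
            "njobs": self.njobs,
            "min-match": self.min_match,
            "max-match": self.max_match,
            "avg-match": avg,
        }

    def clear(self) -> None:
        """Reset all counters, as a stats-clear request would."""
        self.njobs = 0
        self.min_match = 0.0
        self.max_match = 0.0
        self.total_match = 0.0


def stats_get(stats: MatchStats) -> str:
    """Assumed handler body for <module>.stats-get: respond with JSON."""
    return json.dumps(stats.to_payload())


def stats_clear(stats: MatchStats) -> None:
    """Assumed handler body for <module>.stats-clear: reset and send no payload."""
    stats.clear()
```

Keeping the serialization in one method means stats-get and any other consumer of the structure report the same fields.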

garlick commented 1 month ago

Couple of examples:

https://github.com/flux-framework/flux-core/blob/master/src/modules/content-sqlite/content-sqlite.c#L562

That one doesn't implement the stats-clear RPC. How about this one:

https://github.com/flux-framework/flux-core/blob/master/src/modules/kvs/kvs.c#L2319

In both of those cases, the handlers are registered in an htab struct using flux_msg_handler_addvec() which I see you're using in resource_match.cpp.

You can query the stats of any module using flux module stats, e.g.

$ flux module stats job-manager | jq
{
  "journal": {
    "listeners": 1
  },
  "active_jobs": 0,
  "inactive_jobs": 0,
  "max_jobid": 0
}

If the module doesn't explicitly register handlers for those topic strings, then a default handler answers, e.g.

$ flux module stats sched-simple | jq
{
  "tx": {
    "request": 11,
    "response": 1,
    "event": 0,
    "control": 0
  },
  "rx": {
    "request": 3,
    "response": 6,
    "event": 0,
    "control": 0
  }
}
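
The fallback described above can be pictured as a topic lookup with a default. The following is a toy Python model of that behavior, not flux-core's implementation: ModuleEndpoint, register, and request are hypothetical names, and the default payload only mimics the shape of the message-counter output shown above.

```python
from typing import Callable, Dict


class ModuleEndpoint:
    """Toy model of a module's request dispatch (hypothetical, for illustration)."""

    def __init__(self, name: str):
        self.name = name
        self._handlers: Dict[str, Callable[[], dict]] = {}

    def register(self, method: str, handler: Callable[[], dict]) -> None:
        """Register a handler for the topic '<name>.<method>'."""
        self._handlers[f"{self.name}.{method}"] = handler

    def _default_stats(self) -> dict:
        """Stand-in for the default answer: per-direction message counters."""
        zero = {"request": 0, "response": 0, "event": 0, "control": 0}
        return {"tx": dict(zero), "rx": dict(zero)}

    def request(self, topic: str) -> dict:
        """Use the module's own handler if present, else the default for stats-get."""
        handler = self._handlers.get(topic)
        if handler is not None:
            return handler()
        if topic == f"{self.name}.stats-get":
            return self._default_stats()
        raise LookupError(f"no handler for {topic}")
```

In this model, a module that registers its own stats-get handler replaces the counter output entirely, which matches the difference between the job-manager and sched-simple outputs above.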

milroy commented 1 month ago

It turns out there's already a callback for stats in the resource module: https://github.com/flux-framework/flux-sched/blob/c8e03f8416b3d7803c6108de5ea8f15292728568/resource/modules/resource_match.cpp#L2138

Which returns useful information:

$ flux python -c "import flux; print(flux.Flux().rpc('sched-fluxion-resource.stat').get())"
{'V': 1212417, 'E': 2424832, 'by_rank': {'[0-16383]': 74}, 'load-time': 4.984470336, 'njobs': 100, 'min-match': 0.006610583, 'max-match': 0.015389209, 'avg-match': 0.00807740582}

I should be able to adapt and extend the existing callback and implement sched-fluxion-resource.stats-clear pretty easily.

milroy commented 1 month ago

> I should be able to adapt and extend the existing callback

In fact, I just need to change the RPC topic from sched-fluxion-resource.stat to sched-fluxion-resource.stats-get and it works:

$ flux module stats sched-fluxion-resource
{
 "V": 1212417,
 "E": 2424832,
 "by_rank": {
  "[0-16383]": 74
 },
 "load-time": 4.6429865860000001,
 "njobs": 0,
 "min-match": 0.0,
 "max-match": 0.0,
 "avg-match": 0.0
}
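
For scripts, the same query can be made from the Python bindings with the rpc(...).get() pattern used earlier in this thread. The sketch below is a minimal wrapper under a few assumptions: module_stats and clear_module_stats are hypothetical helper names, the handle is passed in so the functions can be exercised without a running instance, and the stats-clear topic only works if the module actually implements it.

```python
def module_stats(handle, module_name: str) -> dict:
    """Return the payload of '<module_name>.stats-get' as a dict.

    `handle` is expected to behave like flux.Flux(): handle.rpc(topic)
    returns an object whose get() yields the decoded response payload.
    """
    return handle.rpc(f"{module_name}.stats-get").get()


def clear_module_stats(handle, module_name: str) -> None:
    """Send '<module_name>.stats-clear' and wait for the response.

    Assumes the module registers a stats-clear handler.
    """
    handle.rpc(f"{module_name}.stats-clear").get()


# Usage inside a running Flux instance (requires the flux Python bindings):
#
#   import flux
#   h = flux.Flux()
#   print(module_stats(h, "sched-fluxion-resource"))
```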