flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0

reduce log verbosity #1185

Closed (garlick closed 3 weeks ago)

garlick commented 3 weeks ago

Problem: Fluxion logs are a bit verbose, reducing the overall signal-to-noise ratio of flux logs on a big system.

The big contributors are probably:

Alloc and free are logged for every job:

[Apr22 15:08] sched-fluxion-qmanager[0]: alloc success (queue=compute id=364412878138114048)
[Apr22 15:14] sched-fluxion-qmanager[0]: free succeeded (queue=compute id=364412878138114048)

Resource status changes are logged:

[ +34.863811] sched-fluxion-resource[0]: resource status changed (rankset=[852] status=DOWN)
[Apr22 13:20] sched-fluxion-resource[0]: resource status changed (rankset=[103-104] status=UP)

Less prolific, but perhaps not all that helpful:

[Apr22 10:01] sched-fluxion-qmanager[0]: alloc canceled (id=361273921237942272 queue=compute)
[Apr22 13:40] sched-fluxion-qmanager[0]: alloc denied due to type="unsatisfiable" (id=364324453267537920 queue=compute)

grondo commented 3 weeks ago

Note that the resource status changes were somewhat helpful in tracking down recent bugs.

The following messages would rarely be logged now that the flux resource list RPC is handled by the core resource module, but it doesn't seem useful to log the handling of any RPC, since the client can easily detect success or failure itself:

[Apr22 15:29] sched-fluxion-resource[0]: status_request_cb: status succeeded
[  +0.000108] sched-fluxion-qmanager[0]: status_request_cb: resource-status succeeded
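
To check which messages actually dominate before deciding what to quiet, a rough tally by module and message type may help. The following is a minimal sketch, not part of Fluxion: it assumes log lines have the "[timestamp] module[rank]: message" shape shown in this issue, and the names LINE_RE, message_kind and tally are hypothetical helpers introduced only for illustration. The sample input is just the eight lines quoted above, so the counts say nothing about a real system.

```python
import re
from collections import Counter

# Sample lines copied from this issue. On a real system, feed in the
# actual log text instead (how it is collected is left to the reader).
SAMPLE = """\
[Apr22 15:08] sched-fluxion-qmanager[0]: alloc success (queue=compute id=364412878138114048)
[Apr22 15:14] sched-fluxion-qmanager[0]: free succeeded (queue=compute id=364412878138114048)
[ +34.863811] sched-fluxion-resource[0]: resource status changed (rankset=[852] status=DOWN)
[Apr22 13:20] sched-fluxion-resource[0]: resource status changed (rankset=[103-104] status=UP)
[Apr22 10:01] sched-fluxion-qmanager[0]: alloc canceled (id=361273921237942272 queue=compute)
[Apr22 13:40] sched-fluxion-qmanager[0]: alloc denied due to type="unsatisfiable" (id=364324453267537920 queue=compute)
[Apr22 15:29] sched-fluxion-resource[0]: status_request_cb: status succeeded
[  +0.000108] sched-fluxion-qmanager[0]: status_request_cb: resource-status succeeded
"""

# Assumed line shape: "[timestamp] module[rank]: message"
LINE_RE = re.compile(r"^\[[^\]]*\]\s+(?P<module>[\w-]+)\[\d+\]:\s+(?P<msg>.*)$")


def message_kind(msg):
    """Drop a trailing "(key=value ...)" group so messages that differ
    only in job id, queue, or rankset are counted together."""
    return re.sub(r"\s*\(.*\)$", "", msg).strip()


def tally(lines):
    """Count log lines per (module, message kind); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group("module"), message_kind(m.group("msg")))] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent message kinds first.
    for (module, kind), n in tally(SAMPLE.splitlines()).most_common():
        print(f"{n:6d}  {module}: {kind}")
```

Run against a day of logs from a large system, the top few rows would show whether the per-job alloc and free messages really are the main contributors.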