garlick opened this issue 2 years ago (status: open).
> Maybe we could have flux top watch for certain kinds of activity in the job manager journal instead?
Is the journal accessible by guests?
I thought you had an idea for a multi-response RPC for job-list which would only reply on updates. That might be a bit challenging to implement, though.
> One challenge for flux top as it is currently implemented is that it only queries job-list after job state change events are published.
Would it be so bad to just query job-list every N seconds for now until a better solution is implemented?
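For illustration only, the "watch the journal" and "reply only on updates" ideas share one core step: drop journal traffic that a monitoring tool does not care about and emit something only when a relevant job changed. The sketch below assumes each journal response is a batch of events shaped like `{"id": <jobid>, "entry": {"name": <event name>}}` and that the interesting event names are the ones in `INTERESTING_EVENTS`. Both the entry layout and the event set are assumptions to check against the job manager, and `updates()` is a hypothetical helper, not an existing Flux API.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Assumption: job events a tool like flux top would want to react to.
# The real journal entry layout and event names should be verified.
INTERESTING_EVENTS: Set[str] = {"alloc", "start", "finish", "exception", "memo"}


def updates(batches: Iterable[List[Dict]]) -> Iterator[List[int]]:
    """Yield the sorted IDs of jobs needing a refresh, one list per journal
    batch, skipping batches that contain no interesting events."""
    for batch in batches:
        ids = {ev["id"] for ev in batch if ev["entry"]["name"] in INTERESTING_EVENTS}
        if ids:  # only "reply" when something relevant actually changed
            yield sorted(ids)


if __name__ == "__main__":
    sample = [
        [{"id": 1, "entry": {"name": "submit"}}],
        [{"id": 1, "entry": {"name": "start"}}, {"id": 2, "entry": {"name": "memo"}}],
        [{"id": 3, "entry": {"name": "priority"}}],
        [{"id": 2, "entry": {"name": "finish"}}],
    ]
    for ids in updates(sample):
        print("refresh", ids)
```

With the sample input, only the second and fourth batches produce output (`[1, 2]` and `[2]`). A streaming job-list RPC could apply the same filter on the server side, so that clients receive responses only when something changes.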
Good point about journal permission!
I edited my description to include the job-list idea while you were writing your comment. Sorry about that.
> Would it be so bad to just query job-list every N seconds for now until a better solution is implemented?
Yeah that would probably be fine for a first cut.
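A minimal sketch of the "query every N seconds" first cut, assuming the job-list request is wrapped in a `query()` callable and the display update in a `render()` callable. Both are hypothetical stand-ins (a real version would issue the job-list request through Flux), and the 2-second default for N is an arbitrary choice. The loop redraws only when the result differs from the previous poll.

```python
import time
from typing import Callable, Optional


def poll(query: Callable[[], object], render: Callable[[object], None],
         interval: float = 2.0, iterations: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call query() every `interval` seconds and call render() only when the
    result changed since the last poll. Runs forever if `iterations` is None.
    Returns the number of renders performed."""
    last: object = object()  # sentinel that never equals a real result
    renders = 0
    done = 0
    while iterations is None or done < iterations:
        result = query()
        if result != last:  # skip redraws when nothing changed
            render(result)
            renders += 1
            last = result
        done += 1
        if iterations is None or done < iterations:
            sleep(interval)  # no sleep after the final iteration
    return renders
```

Injecting `sleep` keeps the loop testable. Polling at a fixed interval trades some latency and extra requests for simplicity, which seems acceptable until an update-driven mechanism exists.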
A use case for flux top brought up by the AHA Moles team was monitoring job ensembles for CPU utilization, as an aid to tuning machine learning jobs. This could be collected at the shell plugin level, with the rank 0 shell periodically posting an aggregate number as a job memo, which could then be accessed by flux jobs and flux top. The sample interval could default to some long period like a minute, and be tunable via a shell option.

One challenge for flux top as it is currently implemented is that it only queries job-list after job state change events are published. Maybe we could have flux top watch for certain kinds of activity in the job manager journal instead? Or maybe job-list could provide a specialized streaming RPC for job monitoring tools.
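The CPU utilization memo idea can be sketched as a small rank 0 helper, shown here in Python for illustration (a real implementation would live in a shell plugin). The sketch assumes that each shell rank reports a utilization percentage, that the "aggregate number" is the mean across ranks, and that posting goes through a `post_memo(key, value)` callback standing in for the job memo mechanism. `CpuMemoPoster`, `post_memo`, and the `cpu_util` key are all hypothetical names. Posting is rate-limited to one memo per `interval` seconds, defaulting to a minute as suggested above.

```python
from typing import Callable, Dict, Optional


class CpuMemoPoster:
    """Hypothetical rank 0 helper: aggregate per-rank CPU utilization and
    post it as a job memo at most once per `interval` seconds."""

    def __init__(self, post_memo: Callable[[str, float], None],
                 interval: float = 60.0, key: str = "cpu_util"):
        self.post_memo = post_memo  # stand-in for the real job memo call
        self.interval = interval    # would be tunable via a shell option
        self.key = key              # hypothetical memo key name
        self.samples: Dict[int, float] = {}
        self.last_post: Optional[float] = None

    def add_sample(self, rank: int, percent: float) -> None:
        # Keep only the most recent sample reported by each shell rank.
        self.samples[rank] = percent

    def maybe_post(self, now: float) -> bool:
        """Post the mean utilization if the interval has elapsed.
        Returns True if a memo was posted."""
        if not self.samples:
            return False
        if self.last_post is not None and now - self.last_post < self.interval:
            return False
        mean = sum(self.samples.values()) / len(self.samples)
        self.post_memo(self.key, round(mean, 1))
        self.last_post = now
        return True
```

A caller would pass something like `time.monotonic()` as `now` from a periodic timer. Because the memo is rate-limited, tools such as flux jobs or flux top only see one new value per interval, which keeps memo traffic low for large ensembles.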