grondo closed this issue 4 months ago
Yep, I think you found it. Try this:
diff --git a/src/modules/sched-simple/sched.c b/src/modules/sched-simple/sched.c
index 0f08af251..614206d42 100644
--- a/src/modules/sched-simple/sched.c
+++ b/src/modules/sched-simple/sched.c
@@ -292,9 +292,6 @@ static void annotate_reason_pending (struct simple_sched *ss)
{
int jobs_ahead = 0;
- if (!flux_module_debug_test (ss->h, DEBUG_ANNOTATE_REASON_PENDING, false))
- return;
-
struct jobreq *job = zlistx_first (ss->queue);
while (job) {
if (schedutil_alloc_respond_annotate_pack (ss->util_ctx,
@@ -326,17 +323,17 @@ static void check_cb (flux_reactor_t *r, flux_watcher_t *w,
int revents, void *arg)
{
struct simple_sched *ss = arg;
- flux_watcher_stop (ss->idle);
+ //flux_watcher_stop (ss->idle);
/* See if we can fulfill alloc for a pending job
* If current head of queue can't be allocated, stop the prep
* watcher, i.e. block. O/w, retry on next loop.
*/
if (try_alloc (ss->h, ss) < 0 && errno == ENOSPC) {
- annotate_reason_pending (ss);
- flux_watcher_stop (ss->prep);
- flux_watcher_stop (ss->check);
+ //flux_watcher_stop (ss->prep);
+ //flux_watcher_stop (ss->check);
}
+ annotate_reason_pending (ss);
}
static int try_free (flux_t *h, struct simple_sched *ss, json_t *R)
then run:
flux start
flux resource drain 0
flux submit hostname
and watch the RSS.
Oh just saw in the other bug that you made more progress. Sorry for the noise!
No, this is good, thanks!
I think we're fairly confident this is the source of the RSS growth in the rank 0 broker on elcapi. Since the broker was actually OOM killed twice in one day on that system, this is a pretty high-priority fix. @milroy or @trws, if you have a band-aid patch, post it here and I or @jameshcorbett can patch the RPM and we can get this fix onto the system ASAP.
Note that there is still something to fix in flux-core as noted in flux-framework/flux-core#6114. We should not be storing all annotation updates in memory in perpetuity.
This patch should prevent the extra annotation updates:
diff --git a/qmanager/modules/qmanager_callbacks.cpp b/qmanager/modules/qmanager_callbacks.cpp
index 45bcbf0a..b3823fd3 100644
--- a/qmanager/modules/qmanager_callbacks.cpp
+++ b/qmanager/modules/qmanager_callbacks.cpp
@@ -97,6 +97,7 @@ int qmanager_cb_t::post_sched_loop (
// if old_at == at, then no reason to send this annotation again.
if (job->schedule.at == job->schedule.old_at)
continue;
+ job->schedule.old_at = job->schedule.at;
if (schedutil_alloc_respond_annotate_pack (schedutil,
job->msg,
"{ s:{s:f} }",
(Edited to remove unnecessary assignment.) @trws is more familiar with this code, though.
Thanks, once we get an ACK from @trws we can patch the current RPM and issue a new release (-2). @jameshcorbett, if you're not available to do this I can take care of it.
Also, the credit for the above patch goes to @trws since he recognized the fix and proposed the change.
Another idea would be to just comment out the annotation for now (the annotations aren't critical, and this would work around the RSS growth until a proper fix is available).
> Thanks, once we get @trws ACK we can patch the current RPM and issue a new release (-2). @jameshcorbett, if you're not available to do this I can take care of it.
I'm around and available to do it!
> Another idea would be to just comment out the annotation for now
I just ran that through the testsuite and it will also work.
Update: the testsuite fails the annotation test when the annotation response is commented out, but I don't think that will affect any necessary functionality. It's probably safer to just use the patch above.
As noted in flux-framework/flux-core#6114, we recently discovered that a possible contributing factor to rank 0 broker RSS growth is a massive volume of annotation updates from Fluxion for jobs. After speaking with @trws and @milroy earlier today, it appears they discovered a possible bug in qmanager that can send unnecessary annotation updates for all jobs on each check callback (i.e., on each event loop iteration).