flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

broker: need more useful progress indication when starting a large instance #5880

Open garlick opened 2 months ago

garlick commented 2 months ago

Problem: need better feedback to users when brokers are slow to start up in a big instance (like a large flux alloc).

If some brokers never enter the PMI barrier, there is no feedback at all. To reproduce, run:

$ flux start -s 64 --test-start-mode=leader
[wait forever]
^Cflux-broker: simple: barrier: operation failed
flux-broker: bootstrap failed

If a node completes PMI bootstrap but then fails to wire up, messages like this appear periodically (every 10s in the run below, which sets broker.quorum-timeout=10s):

$ flux start -s 64 -o,-Sbroker.quorum-timeout=10s
Apr 10 19:33:12.898135 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:22.899014 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:32.900086 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:42.901166 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:52.901839 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:02.902736 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:12.903074 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:22.903295 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:32.903458 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:42.903722 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:52.904452 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:35:02.905143 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:35:03.278572 broker.err[0]: quorum reached

To reproduce that, I applied the following patch:

diff --git a/src/broker/broker.c b/src/broker/broker.c
index 971b48732..53515fc54 100644
--- a/src/broker/broker.c
+++ b/src/broker/broker.c
@@ -404,6 +404,9 @@ int main (int argc, char *argv[])
                   flux_reactor_now (ctx.reactor) - ctx.starttime);
     }

+    if (ctx.rank == 63)
+        sleep (120);
+
     // Setup profiling
     setup_profiling (argv[0], ctx.rank);

garlick commented 2 months ago

When this came up before I was playing with a broker --progress option (on my broker_progress branch which has one commit, 36ff1e2a4285636641294b85d46c6b1091d4ba7c).

It prints:

flux-broker: waiting for remaining brokers to join: 63 of 64

with the numbers rewritten in place. That works in addition to the "quorum delayed" message described above, which periodically calls out the missing hostnames.
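
For illustration, here is a minimal standalone sketch of how a progress line like this can be rewritten in place. It is not the code from the broker_progress branch; the function names progress_format and progress_update are hypothetical, and the sketch assumes the joined count only grows, so each new line is at least as wide as the one it overwrites.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format the progress line into buf.
 * Returns the snprintf() result, i.e. the length the full line needs.
 */
int progress_format (char *buf, size_t size, int joined, int total)
{
    return snprintf (buf,
                     size,
                     "flux-broker: waiting for remaining brokers to join: "
                     "%d of %d",
                     joined,
                     total);
}

/* Hypothetical helper: redraw the progress line on stream.
 * On a terminal, a leading carriage return moves the cursor back to
 * column 0 so the new text overwrites the old.  When the stream is not
 * a terminal (e.g. redirected to a file), emit one line per update.
 */
void progress_update (FILE *stream, int joined, int total)
{
    char buf[128];

    progress_format (buf, sizeof (buf), joined, total);
    if (isatty (fileno (stream)))
        fprintf (stream, "\r%s", buf);
    else
        fprintf (stream, "%s\n", buf);
    fflush (stream);
}
```

A caller would invoke progress_update (stderr, joined, total) each time another broker joins, and print a final newline once the count reaches the total so later output starts on a fresh line.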

I stalled out on this before because I wasn't really sure how to integrate this into the overall system.

grondo commented 2 months ago

One idea is to expose the quorum progress via an RPC, so that responsibility for indicating progress can be handled by flux job attach (or any other interested tool). The tool can open a handle to the instance as soon as the uri attribute is posted to the eventlog, then monitor progress. This is how flux alloc --bg currently works, but it only monitors state-machine.wait.