Problem: @grondo investigated high load on a rank 0 broker on a test cluster. A flux overlay trace revealed a steady stream of:
[ +8.042307] tx * c disconnect 0 [0]
[ +8.042326] tx * c disconnect 0 [0]
[ +8.042524] tx * c disconnect 0 [0]
[ +8.042551] tx * c disconnect 0 [0]
[ +8.042578] tx * c disconnect 0 [0]
[ +8.042761] tx * c disconnect 0 [0]
[ +8.042788] tx * c disconnect 0 [0]
[ +8.043487] tx * c disconnect 0 [0]
[ +8.043515] tx * c disconnect 0 [0]
[ +8.043542] tx * c disconnect 0 [0]
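To put a number on "steady stream", the relative timestamps in the trace can be differenced. The following is a minimal Python sketch, assuming the bracketed value is seconds since the trace started (an assumption about the output format). `message_rate` is a hypothetical helper written for this note, not part of flux.

```python
import re

# Matches the leading relative timestamp of a trace line, e.g. "[ +8.042307]".
# Assumption: the value is in seconds.
TS_RE = re.compile(r"^\[\s*\+(\d+\.\d+)\]")


def message_rate(lines):
    """Return (count, span_seconds, messages_per_second) for trace lines."""
    stamps = []
    for line in lines:
        m = TS_RE.match(line.strip())
        if m:
            stamps.append(float(m.group(1)))
    if len(stamps) < 2:
        return len(stamps), 0.0, 0.0
    span = stamps[-1] - stamps[0]
    # N timestamps bound N-1 intervals.
    rate = (len(stamps) - 1) / span if span > 0 else 0.0
    return len(stamps), span, rate


TRACE = """\
[ +8.042307] tx * c disconnect 0 [0]
[ +8.042326] tx * c disconnect 0 [0]
[ +8.042524] tx * c disconnect 0 [0]
[ +8.042551] tx * c disconnect 0 [0]
[ +8.042578] tx * c disconnect 0 [0]
[ +8.042761] tx * c disconnect 0 [0]
[ +8.042788] tx * c disconnect 0 [0]
[ +8.043487] tx * c disconnect 0 [0]
[ +8.043515] tx * c disconnect 0 [0]
[ +8.043542] tx * c disconnect 0 [0]
"""

count, span, rate = message_rate(TRACE.splitlines())
print(f"{count} messages in {span * 1e3:.3f} ms, about {rate:.0f} msg/s")
```

On the ten lines above, the span is a little over 1.2 ms, which works out to several thousand disconnect messages per second if the seconds assumption holds.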
A stack trace of the spinning broker revealed:
(gdb) where
#0  __GI___libc_write (nbytes=8, buf=0xffffeda873e0, fd=<optimized out>)
    at ../sysdeps/unix/sysv/linux/write.c:26
#1  __GI___libc_write (fd=<optimized out>, buf=0xffffeda873e0, nbytes=8)
    at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x0000ffff895ae8dc in ?? () from /lib/aarch64-linux-gnu/libzmq.so.5
#3  0x0000ffff895a56ec in ?? () from /lib/aarch64-linux-gnu/libzmq.so.5
#4  0x0000ffff895aba8c in ?? () from /lib/aarch64-linux-gnu/libzmq.so.5
#5  0x0000ffff895b5730 in ?? () from /lib/aarch64-linux-gnu/libzmq.so.5
#6  0x0000ffff895cb3f4 in zmq_send () from /lib/aarch64-linux-gnu/libzmq.so.5
#7  0x0000aaaac5ac7008 in zmqutil_msg_send_ex (sock=0xaaaad2d42a80,
    msg=0xaaaad2dfbe60, nonblock=<optimized out>)
    at ../common/libzmqutil/msg_zsock.c:52
#8  0x0000aaaac5ab5fe8 in overlay_sendmsg_child (ov=0xaaaad2d39180,
    msg=0xaaaad2dfbe60) at ./src/broker/overlay.c:805
#9  0x0000aaaac5ae6ad8 in overlay_control_child.constprop.0 (
    ov=0xaaaad2d39180,
    uuid=0xaaaad2e1dfc0 "f982f794-27d4-464b-88f0-f41976ffdf24", status=0,
    type=CONTROL_DISCONNECT) at ./src/broker/overlay.c:568
#10 0x0000aaaac5ab77e8 in child_cb (r=<optimized out>, w=<optimized out>,
    revents=<optimized out>, arg=0xaaaad2d39180) at ./src/broker/overlay.c:1041
#11 0x0000aaaac5ac54a8 in check_cb (loop=0xffff896b24d8 <default_loop_struct>,
    w=0xaaaad2d43f08, revents=<optimized out>)
    at ../common/libzmqutil/ev_zmq.c:79
#12 0x0000ffff89676504 in ev_invoke_pending (
    loop=0xffff896b24d8 <default_loop_struct>) at libev/ev.c:3770
#13 0x0000ffff8964f044 in ev_run (flags=0, loop=<optimized out>)
    at libev/ev.c:4190
#14 ev_run (flags=0, loop=<optimized out>) at libev/ev.c:4021
#15 flux_reactor_run (r=0xaaaad2d30f10, flags=flags@entry=0)
    at libflux/reactor.c:124
#16 0x0000aaaac5aadb08 in main (argc=<optimized out>, argv=<optimized out>)
    at ./src/broker/broker.c:529
There was a downrev (older version) broker in the system.
That broker's logs were filled with
Stopping the downrev broker made the high load stop.
Restarting the broker did not make the high load return.