Closed garlick closed 11 months ago
Problem: The first test fails with the following output and backtrace, which seem to indicate a use-after-free on message in notify_shell_cb().
$ ./t0006-notify.t -v
expecting success:
	run_timeout 30 flux run \
		${NOTIFY} --status=69 2>warn.err &&
	grep event-status=69 warn.err

not ok 1 - 1n1p event notify triggers warning on stderr
#
#		run_timeout 30 flux run \
#			${NOTIFY} --status=69 2>warn.err &&
#		grep event-status=69 warn.err
#
expecting success:
	run_timeout 30 flux run \
		${NOTIFY} --status=69 --message="lorem ipsum" 2>message.err &&
	grep "lorem ipsum" message.err

0.235s: flux-shell[0]: WARN: pmix: notify source=f2swcE4f.0 event-status=69 lorem ipsum
ok 2 - 1n1p event notify with message works
# failed 1 among 2 test(s)
1..2
Oct 03 18:09:41.842845 broker.err[0]: rc2.0: sh ./t0006-notify.t --verbose Exited (rc=1) 2.9s
flux-start: 0 (pid 2273874) exited with rc=1
$ cat trash*/warn.err
flux-job: task(s) Segmentation fault
$
gdb backtrace snippet:
#6  0x000000558935f508 in flux_shell_log (component=0x20033a4498 "pmix", level=4, file=0x20033a4488 "notify.c", line=86, fmt=<optimized out>) at log.c:201
        buf = "notify source=f2Hjxo1H.0 event-status=69 \000\000\000\000\000\000\000\030\354\301\356\177\000\000\000\000\353\301\356\177\000\000\000\000\300\060\000 \000\000\000\360%\016\241U\000\000\000`)\027\241U\000\000\000\060\353\301\356\177\000\000\000\250\363\065\211U\000\000\000\000 ;\211U\000\000\000\350,;\211U", '\000' <repeats 11 times>, "\030\354\301\356\177\000\000\000\000>:\003 \000\000\000h\000\000\000\000\000\000\000\000\353\301\356\177\000\000\000\000\300\060\000 \000\000\000\001", '\000' <repeats 15 times>, "\020>:\003 \000\000\000\000\036m\217\354B\375\240"...
        ap = {__stack = 0x7feec1fae0, __gr_top = 0x7feec1fae0, __vr_top = 0x7feec1fac0, __gr_offs = -24, __vr_offs = -128}
#7  0x00000020033a213c in notify_shell_cb () from /nfshome/garlick/proj/flux-pmix/src/shell/plugins/.libs/pmix.so
No symbol table info available.
#8  0x000000200339ff24 in interthread_recv () from /nfshome/garlick/proj/flux-pmix/src/shell/plugins/.libs/pmix.so
No symbol table info available.
#9  0x0000002000337814 in ev_invoke_pending (loop=0x200038d440 <default_loop_struct>) at ev.c:3770
        p = <optimized out>
Note this is on the working branch for #90
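
For illustration only, here is a minimal sketch of the general pattern that avoids this class of bug when a string is handed from one thread's callback to another: the receiver gets its own copy, and a missing message is turned into an empty string before it reaches a "%s" conversion. This is not the flux-pmix code. The names struct notify_event, notify_event_create(), notify_event_destroy(), and notify_event_text() are hypothetical, and the backtrace above does not confirm that this is the actual cause. Note that the failing test is the one run without --message.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event record passed from a producer thread to a consumer.
 * Not taken from flux-pmix. */
struct notify_event {
    int status;
    char *message;  /* owned copy, or NULL when no message was supplied */
};

/* Deep-copy the message so the consumer never reads memory that the
 * producer may free after the handoff. */
struct notify_event *notify_event_create (int status, const char *message)
{
    struct notify_event *ev = calloc (1, sizeof (*ev));
    if (!ev)
        return NULL;
    ev->status = status;
    if (message && !(ev->message = strdup (message))) {
        free (ev);
        return NULL;
    }
    return ev;
}

/* Release the event and its owned message copy. */
void notify_event_destroy (struct notify_event *ev)
{
    if (ev) {
        free (ev->message);
        free (ev);
    }
}

/* Return text that is safe to pass to "%s", even with no message. */
const char *notify_event_text (const struct notify_event *ev)
{
    assert (ev != NULL);
    return ev->message ? ev->message : "";
}
```

With this shape, the producer can free or reuse its buffer as soon as notify_event_create() returns, and the consumer calls notify_event_destroy() once it has logged the event.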