flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0
167 stars 50 forks source link

t1105-proxy.t fails with extra subscription messages #2522

Closed SteVwonder closed 5 years ago

SteVwonder commented 5 years ago

Another non-deterministic failure. I believe the race here is that if the second flux event sub -c1 hb is slow to run, then the count of the subscribe logs is 2 rather than the expected 1. I wonder if this could be tweaked to subscribe to an event like foobar that is then published via flux event pub as part of the test. Kinda hard to ensure that the sub has been registered before sending the pub though.

test log:

not ok 5 - flux-proxy manages event redistribution
FAIL: t1105-proxy.t 5 - flux-proxy manages event redistribution
#       
#               flux proxy $TEST_URI \
#                 "flux event sub -c1 hb& flux event sub -c1 hb& wait;wait" &&
#               FLUX_URI=$TEST_URI flux dmesg | sed -e "s/[^ ]* //" >event.out &&
#               test $(egrep "connector-local.*debug\[0\]: subscribe hb" event.out|wc -l) -eq 1 &&
#               test $(egrep "proxy.*debug\[0\]: subscribe hb" event.out|wc -l) -eq 1 &&
#               test $(egrep "connector-local.*debug\[0\]: unsubscribe hb" event.out|wc -l) -eq 1 &&
#               test $(egrep "proxy.*debug\[0\]: unsubscribe hb" event.out|wc -l) -eq 1
#       
ok 6 - flux-proxy permits dynamic service registration

event.out content in trash-directory:

broker.debug[0]: insmod connector-local
broker.info[0]: wireup: 4/4 (complete) 0.0s
broker.info[0]: Run level 1 starting
broker.debug[0]: insmod barrier
broker.debug[0]: insmod content-sqlite
broker.debug[0]: content backing store: enabled content-sqlite
broker.debug[0]: insmod kvs
broker.debug[0]: insmod kvs-watch
broker.debug[0]: insmod job-info
broker.debug[0]: insmod aggregator
broker.debug[0]: insmod cron
cron.info[0]: synchronizing cron tasks to event hb
aggregator.debug[0]: push: resource.hwloc.reload:0-14120: count=1 fwd_count=4 total=4
aggregator.debug[0]: push: resource.hwloc.reload:0-14120: count=3 fwd_count=4 total=4
aggregator.debug[0]: push: resource.hwloc.reload:0-14120: count=4 fwd_count=4 total=4
aggregator.debug[0]: sink: resource.hwloc.reload:0-14120: count=4 total=4
broker.debug[0]: insmod userdb
broker.debug[0]: insmod job-ingest
broker.debug[0]: insmod job-manager
job-manager.debug[0]: restart_from_kvs: added 0 jobs
broker.debug[0]: insmod job-exec
broker.debug[0]: insmod sched-simple
sched-simple.debug[0]: service_register
job-manager.debug[0]: scheduler: hello
job-manager.debug[0]: scheduler: ready single
sched-simple.debug[0]: ready: 32 of 32 cores: rank[0-3]/core[0-7]
broker.info[0]: rc1: running /home/sherbein/Repositories/flux-framework/flux-core/etc/rc1.d/01-enclosing-instance
broker.info[0]: Run level 1 Exited (rc=0) 12.3s
broker.info[0]: Run level 2 starting
broker.info[0]: runlevel 2 (300.0s) timer started
connector-local.debug[0]: subscribe hb
proxy.debug[0]: subscribe hb
connector-local.debug[0]: unsubscribe hb
proxy.debug[0]: unsubscribe hb
connector-local.debug[0]: subscribe hb
proxy.debug[0]: subscribe hb
connector-local.debug[0]: unsubscribe hb
proxy.debug[0]: unsubscribe hb
grondo commented 5 years ago

Kinda hard to ensure that the sub has been registered before sending the pub though.

This is what t/scripts/event-trace.lua is meant to help with. It doesn't run its COMMAND until the subscribe is guaranteed to be active. (If it is still working properly, pretty old script)