Closed trws closed 1 week ago
Before merging you may want to expand the PR title here since the title is typically what goes in the merge commit and is then propagated to the release notes.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 74.0%. Comparing base (
36994ee
) to head (7c4ac8a
).:exclamation: Current head 7c4ac8a differs from pull request most recent head e070327
Please upload reports for the commit e070327 to get more accurate results.
This is stacked on top of #1203, and is a performance enhancement. For large numbers of constraint jobs especially out_edges was becoming a bottleneck. Using this test script I saw a drop from ~7 minutes 30 seconds to 90 seconds on my laptop and a very significant shift in the weights in the perf output.
Test script
```bash #!/usr/bin/env bash # test_description=' ' . `dirname $0`/sharness.sh export TEST_UNDER_FLUX_QUORUM=1 export TEST_UNDER_FLUX_START_MODE=leader rpc() { flux python -c \ "import flux, json; print(flux.Flux().rpc(\"$1\").get_str())" } test_under_flux 16384 system test_expect_success 'unload sched-simple' ' flux module remove -f sched-simple ' test_expect_success 'update configuration' ' flux config load <<-'EOF' [[resource.config]] hosts = "fake[0-16383]" cores = "0-63" gpus = "0-3" [[resource.config]] hosts = "fake[0-9999]" properties = ["compute"] [[resource.config]] hosts = "fake[10000-16000]" properties = ["test"] [[resource.config]] hosts = "fake[16001-16383]" properties = ["debug"] [sched-fluxion-qmanager] queue-policy = "easy" [sched-fluxion-resource] match-policy = "firstnodex" prune-filters = "ALL:core,ALL:gpu,cluster:node,rack:node" match-format = "rv1_nosched" EOF ' test_expect_success 'reload resource with monitor-force-up' ' flux module reload -f resource noverify monitor-force-up ' test_expect_success 'drain a few nodes' ' flux resource drain 1-1000 test with drained nodes ' test_expect_success 'load fluxion modules' ' flux module load sched-fluxion-resource && flux module load sched-fluxion-qmanager ' test_expect_success 'wait for fluxion to be ready' ' time flux python -c \ "import flux, json; print(flux.Flux().rpc(\"sched.resource-status\").get_str())" ' test_expect_success 'create a set of 10 inactive jobs' ' flux submit --cc=1-1000 --quiet \ -N 1 --exclusive \ --requires="host:fake[1005]" \ --progress --jps \ --setattr=exec.test.run_duration=0.01s \ hostname ' test_expect_success 'create a set of 2 running jobs' ' time flux submit --progress --jps --quiet --cc=1-2 --wait-event=start -N1 \ --requires=compute \ --setattr=exec.test.run_duration=5m hostname ' test_expect_success 'get match stats' ' flux jobs -Ano "{id} {duration}" && \ flux resource undrain 1-1000 && \ flux jobs -Ano "{id} {duration}" && \ sleep 10 && \ flux dmesg && \ flux jobs -a && \ rpc sched-fluxion-resource.stats-get | jq ' test_expect_success 'unload fluxion' ' flux module remove sched-fluxion-qmanager && flux module remove sched-fluxion-resource && flux module load sched-simple ' test_done ```