flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0
84 stars 39 forks source link

avoid copying of std::map in out_edges #1206

Closed trws closed 1 week ago

trws commented 2 weeks ago

This is stacked on top of #1203, and is a performance enhancement. For large numbers of constraint jobs especially out_edges was becoming a bottleneck. Using this test script I saw a drop from ~7 minutes 30 seconds to 90 seconds on my laptop and a very significant shift in the weights in the perf output.

Test script ```bash #!/usr/bin/env bash # test_description=' ' . `dirname $0`/sharness.sh export TEST_UNDER_FLUX_QUORUM=1 export TEST_UNDER_FLUX_START_MODE=leader rpc() { flux python -c \ "import flux, json; print(flux.Flux().rpc(\"$1\").get_str())" } test_under_flux 16384 system test_expect_success 'unload sched-simple' ' flux module remove -f sched-simple ' test_expect_success 'update configuration' ' flux config load <<-'EOF' [[resource.config]] hosts = "fake[0-16383]" cores = "0-63" gpus = "0-3" [[resource.config]] hosts = "fake[0-9999]" properties = ["compute"] [[resource.config]] hosts = "fake[10000-16000]" properties = ["test"] [[resource.config]] hosts = "fake[16001-16383]" properties = ["debug"] [sched-fluxion-qmanager] queue-policy = "easy" [sched-fluxion-resource] match-policy = "firstnodex" prune-filters = "ALL:core,ALL:gpu,cluster:node,rack:node" match-format = "rv1_nosched" EOF ' test_expect_success 'reload resource with monitor-force-up' ' flux module reload -f resource noverify monitor-force-up ' test_expect_success 'drain a few nodes' ' flux resource drain 1-1000 test with drained nodes ' test_expect_success 'load fluxion modules' ' flux module load sched-fluxion-resource && flux module load sched-fluxion-qmanager ' test_expect_success 'wait for fluxion to be ready' ' time flux python -c \ "import flux, json; print(flux.Flux().rpc(\"sched.resource-status\").get_str())" ' test_expect_success 'create a set of 10 inactive jobs' ' flux submit --cc=1-1000 --quiet \ -N 1 --exclusive \ --requires="host:fake[1005]" \ --progress --jps \ --setattr=exec.test.run_duration=0.01s \ hostname ' test_expect_success 'create a set of 2 running jobs' ' time flux submit --progress --jps --quiet --cc=1-2 --wait-event=start -N1 \ --requires=compute \ --setattr=exec.test.run_duration=5m hostname ' test_expect_success 'get match stats' ' flux jobs -Ano "{id} {duration}" && \ flux resource undrain 1-1000 && \ flux jobs -Ano "{id} {duration}" && \ sleep 10 && \ flux dmesg && \ flux jobs -a && \ rpc sched-fluxion-resource.stats-get | jq ' test_expect_success 'unload fluxion' ' flux module remove sched-fluxion-qmanager && flux module remove sched-fluxion-resource && flux module load sched-simple ' test_done ```
grondo commented 1 week ago

Before merging you may want to expand the PR title here since the title is typically what goes in the merge commit and is then propagated to the release notes.

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 74.0%. Comparing base (36994ee) to head (7c4ac8a).

:exclamation: Current head 7c4ac8a differs from pull request most recent head e070327

Please upload reports for the commit e070327 to get more accurate results.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1206 +/- ## ======================================== - Coverage 74.0% 74.0% -0.1% ======================================== Files 102 102 Lines 14611 14608 -3 ======================================== - Hits 10822 10818 -4 - Misses 3789 3790 +1 ``` | [Files](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1206?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework) | Coverage Δ | | |---|---|---| | [resource/schema/resource\_graph.hpp](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1206?src=pr&el=tree&filepath=resource%2Fschema%2Fresource_graph.hpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework#diff-cmVzb3VyY2Uvc2NoZW1hL3Jlc291cmNlX2dyYXBoLmhwcA==) | `57.1% <100.0%> (-4.2%)` | :arrow_down: | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1206/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework)