Improve match performance with node constraints when feasibility is enabled

milroy commented 1 month ago

Matching jobspecs with node constraints is currently implemented in such a way that the recursive dom_* call occurs before constraint checking.

This PR moves the node constraint check into the prune function to prevent unnecessary recursive dom_* calls, speeding up feasibility checks.
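To illustrate why the ordering matters, here is a minimal Python sketch on a toy tree. It is not the flux-sched implementation: `Vertex`, `match_check_late`, and `match_prune_early` are hypothetical names, and the traversal is far simpler than the real dom_* recursion. It only shows that checking a host constraint before descending avoids visiting the subtrees of excluded nodes while returning the same match.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical stand-in for a resource-graph vertex."""
    kind: str                      # "cluster", "node", or "core"
    name: str
    children: list = field(default_factory=list)


def build_cluster(num_nodes, cores_per_node):
    """Build a toy cluster -> node -> core tree."""
    nodes = [
        Vertex("node", f"test{i}",
               [Vertex("core", f"core{c}") for c in range(cores_per_node)])
        for i in range(1, num_nodes + 1)
    ]
    return Vertex("cluster", "cluster0", nodes)


def satisfies(vertex, allowed_hosts):
    """Only node vertices are subject to the host constraint in this model."""
    return vertex.kind != "node" or vertex.name in allowed_hosts


def match_check_late(vertex, allowed_hosts, visits):
    """Recurse into children first, then check the constraint."""
    visits[0] += 1
    matched = []
    for child in vertex.children:
        matched += match_check_late(child, allowed_hosts, visits)
    if not satisfies(vertex, allowed_hosts):
        return []                  # the subtree work above is discarded
    return matched + ([vertex.name] if vertex.kind == "node" else [])


def match_prune_early(vertex, allowed_hosts, visits):
    """Check the constraint first and skip the subtree if it fails."""
    visits[0] += 1
    if not satisfies(vertex, allowed_hosts):
        return []                  # pruned before any recursion
    matched = []
    for child in vertex.children:
        matched += match_prune_early(child, allowed_hosts, visits)
    return matched + ([vertex.name] if vertex.kind == "node" else [])


cluster = build_cluster(num_nodes=8, cores_per_node=4)
allowed = {"test7", "test8"}
late_visits, early_visits = [0], [0]
late = match_check_late(cluster, allowed, late_visits)
early = match_prune_early(cluster, allowed, early_visits)
print(late == early, late_visits[0], early_visits[0])  # True 41 17
```

Both orderings select the same nodes. The early check visits 17 vertices instead of 41 here, and the gap grows with the number of excluded nodes and the size of each node's subtree.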

Below are comparative performance tests run on a laptop. The following configuration is used to test without feasibility checking:

flux config load config.toml
flux module load resource noverify monitor-force-up
flux module load sched-fluxion-resource
flux module load sched-fluxion-qmanager

and the following are the contents of config.toml:

[resource]
noverify = true
norestrict = true

[[resource.config]]
hosts = "test[1-16384]"
cores = "0-63"
gpus = "0-8"

[sched-fluxion-qmanager]
# easy backfill
queue-policy = "easy"

[sched-fluxion-resource]
match-policy = "firstnodex"
match-format = "rv1_nosched"
prune-filters = "ALL:core,ALL:gpu,cluster:node,rack:node"
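For a sense of scale, the sketch below counts the resources this configuration declares. The parsing helpers `idset_size` and `hostlist_size` are hypothetical and handle only the simple forms used here. The count covers only the nodes, cores, and GPUs written in the config, not any additional vertices the scheduler may create internally.

```python
import re


def idset_size(idset):
    """Count the IDs in a simple idset such as "0-63" or "0-3,8"."""
    total = 0
    for part in idset.split(","):
        lo, _, hi = part.partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total


def hostlist_size(hosts):
    """Count hosts in a single-bracket hostlist such as "test[1-16384]"."""
    m = re.fullmatch(r"[^\[\]]+\[([0-9,\-]+)\]", hosts)
    return idset_size(m.group(1)) if m else 1


# Values copied from the [[resource.config]] table above.
nodes = hostlist_size("test[1-16384]")
cores_per_node = idset_size("0-63")
gpus_per_node = idset_size("0-8")

total_cores = nodes * cores_per_node
total_gpus = nodes * gpus_per_node
print(nodes, cores_per_node, gpus_per_node, total_cores, total_gpus)
```

With over a million declared cores, any recursion into node subtrees that are later rejected by a constraint is a plausible source of measurable cost.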

Test results WITHOUT the change in this PR:

Feasibility enabled:

time for i in {1..100}; do flux submit -N 16 -n 64 --requires="hosts:test[16001-16016]" hostname; done
real    0m27.414s

Feasibility disabled:

time for i in {1..100}; do flux submit -N 16 -n 64 --requires="hosts:test[16001-16016]" hostname; done
real    0m7.360s

Feasibility enabled:

time for i in {1..100}; do flux submit -N 16 -n 64 hostname; done
real    0m8.184s

Feasibility disabled:

time for i in {1..100}; do flux submit -N 16 -n 64 hostname; done
real    0m7.136s

Test results WITH the change in this PR:

Feasibility enabled:

time for i in {1..100}; do flux submit -N 16 -n 64 --requires="hosts:test[16001-16016]" hostname; done
real    0m10.710s

Feasibility disabled:

time for i in {1..100}; do flux submit -N 16 -n 64 --requires="hosts:test[16001-16016]" hostname; done
real    0m7.344s

Feasibility enabled:

time for i in {1..100}; do flux submit -N 16 -n 64 hostname; done
real    0m8.155s

Feasibility disabled:

time for i in {1..100}; do flux submit -N 16 -n 64 hostname; done
real    0m7.231s

grondo commented 1 month ago

Nice!

Just a few quick comments about testing job submission and scheduling:

time for i in {1..100}; do flux submit -N 16 -n 64 hostname; done
real  0m8.184s

Submitting jobs with flux submit back-to-back like this may not fully exercise the system, because the time to submit a single job is bounded by the time to run flux submit, not necessarily by the actual submission RPC. A better test of the impact of these changes might be a throughput test that measures how long the scheduler takes to schedule all the submitted jobs. To do that, use the --cc option. For example, on a size=1 instance on quartz:

$ time for i in {1..100}; do flux submit -N 16 -n 64 hostname; done
real    0m34.698s
$ time flux submit --cc=1-100 -N16 -n 64 hostname
real    0m0.847s

You can see that submitting multiple jobs at once with --cc is roughly 40 times faster than flux submit in a loop (34.7 s versus 0.85 s). This is not only because we don't have to pay the startup penalty of Python for each job (the loop above averages about 0.35 s per submission), but also because the submission RPCs are sent asynchronously.

If you add --wait-event=alloc, the command waits until all jobs have been scheduled, so you can measure the time the scheduler takes to schedule all jobs. You could also add --setattr=exec.test.run_duration=10s to bypass the execution system so that jobs actually stay in RUN state instead of getting an exception, since the fake resources don't really exist.

Using these together might give us a better idea of the full impact of these changes, i.e. I imagine things will look much better...
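As a sketch of how the options mentioned above could be combined, the hypothetical helper below assembles one command line from them. It uses only the flags named in this thread, does not run flux, and just prints the command for inspection. The function name `build_throughput_argv` and its parameters are assumptions made for illustration.

```python
import shlex


def build_throughput_argv(njobs, nnodes, ntasks, requires=None,
                          run_duration="10s"):
    """Assemble a flux submit command for a scheduling throughput test."""
    argv = [
        "flux", "submit",
        f"--cc=1-{njobs}",                                   # submit njobs copies
        "--wait-event=alloc",                                # wait for scheduling
        f"--setattr=exec.test.run_duration={run_duration}",  # mock execution
        "-N", str(nnodes),
        "-n", str(ntasks),
    ]
    if requires is not None:
        argv.append(f"--requires={requires}")                # node constraint
    argv.append("hostname")
    return argv


argv = build_throughput_argv(100, 16, 64, requires="hosts:test[16001-16016]")
# Print a shell-quoted command that can be pasted after `time`.
print("time " + shlex.join(argv))
```

Building the argument list separately keeps the constrained and unconstrained variants identical except for the --requires option, which makes before/after timings easier to compare.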

milroy commented 1 month ago

> Just a few quick comments about testing job submission and scheduling:

@grondo I now realize you've explained that to me before but I didn't think of it when I was running the initial test. Thanks for the additional details so I understand better!

I reran the tests above with the same setup using your suggestions, and the improvements from the change in this PR are uniform and better than I reported above. Note that I ran 1000 jobs with feasibility disabled because the tests ran so fast:

Test results WITHOUT the change in this PR:

Feasibility enabled (100 jobs):

time flux submit --cc=1-100 -N16 -n 64 --requires="hosts:test[16001-16016]" hostname
real    0m33.823s

Feasibility disabled (1000 jobs):

time flux submit --cc=1-1000 -N16 -n 64 --requires="hosts:test[16001-16016]" hostname
real    0m1.382s

Feasibility enabled (100 jobs):

time flux submit --cc=1-100 -N16 -n 64 hostname
real    0m1.896s

Feasibility disabled (1000 jobs):

```
time flux submit --cc=1-1000 -N16 -n 64 hostname
real    0m0.395s
```

Test results WITH the change in this PR:

Feasibility enabled (100 jobs):

time flux submit --cc=1-100 -N16 -n 64 --requires="hosts:test[16001-16016]" hostname
real    0m8.807s

Feasibility disabled (1000 jobs):

time flux submit --cc=1-1000 -N16 -n 64 --requires="hosts:test[16001-16016]" hostname
real    0m1.372s

Feasibility enabled (100 jobs):

time flux submit --cc=1-100 -N16 -n 64 hostname
real    0m1.873s

Feasibility disabled (1000 jobs):

time flux submit --cc=1-1000 -N16 -n 64 hostname
real    0m0.379s

That's a throughput improvement of almost 4x (33.8 s down to 8.8 s) for 100 node-constrained jobs with feasibility checking. The other three cases are essentially unchanged.
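The ratios can be checked directly from the --cc timings reported in this comment. The snippet below is only arithmetic on those numbers. The dictionary layout and key names are illustrative.

```python
# (feasibility, constraint, jobs) -> (seconds without the PR, seconds with it)
timings = {
    ("enabled", "hosts constraint", 100): (33.823, 8.807),
    ("disabled", "hosts constraint", 1000): (1.382, 1.372),
    ("enabled", "no constraint", 100): (1.896, 1.873),
    ("disabled", "no constraint", 1000): (0.395, 0.379),
}

# Speedup is the old wall time divided by the new wall time.
speedups = {case: before / after for case, (before, after) in timings.items()}

for (feasibility, constraint, jobs), ratio in speedups.items():
    print(f"feasibility {feasibility}, {constraint}, {jobs} jobs: {ratio:.2f}x")
```

Only the node-constrained case with feasibility enabled changes materially, at about 3.84x. The other three ratios stay within a few percent of 1.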

milroy commented 1 month ago

Thanks for the reviews! Setting MWP.

codecov[bot] commented 1 month ago

Codecov Report

Merging #1162 (b3fbafe) into master (6e6576d) will increase coverage by 0.0%. The diff coverage is 100.0%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master   #1162   +/-   ##
======================================
  Coverage    73.9%   73.9%
======================================
  Files         103     103
  Lines       14395   14396    +1
======================================
+ Hits        10643   10644    +1
  Misses       3752    3752
```

| [Files](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1162?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework) | Coverage Δ | |
|---|---|---|
| [resource/traversers/dfu\_impl.cpp](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1162?src=pr&el=tree&filepath=resource%2Ftraversers%2Fdfu_impl.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework#diff-cmVzb3VyY2UvdHJhdmVyc2Vycy9kZnVfaW1wbC5jcHA=) | `83.0% <100.0%> (+<0.1%)` | :arrow_up: |
