anderbubble closed 9 years ago
Actually, the all-to-all rack test is the only one that tests the core switches, so it is pretty important.
Slurm can be topology-aware, and this may be useful when figuring out which nodes are on which TOR IB switches.
Pete
I neglected to mention that removing the rack test would assume that we've first added an explicit multi-TOR alltoall test. Such a test would exercise the core switches more consistently, and wouldn't depend on inferring node layout and network topology from the node name.
As for topo-aware slurm, I've already created a slurm topology.conf file, and I'm using that to schedule tests now (instead of the json that was in curc-bench before). We can hand that file to slurm to make it topo-aware, but there has been concern that it might carry a performance hit with no practical benefit in our presumed full-bisection-bandwidth network. If/when we oversubscribe the network for future clusters, topology-aware scheduling would likely become more desirable.
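For illustration only, here is a rough sketch of how a topology.conf could drive node selection for a multi-TOR alltoall test. This is not the curc-bench code: the switch names, node names, and both helper functions are made up, and the parser assumes plain comma-separated node lists (real files usually use hostlist ranges like `node[01-16]`, which this does not expand). It also assumes a simple two-level layout where any traffic between nodes on different TOR switches has to cross a core switch.

```python
# Hypothetical topology.conf contents; switch and node names are invented.
EXAMPLE_TOPOLOGY = """\
# leaf (TOR) switches
SwitchName=tor01 Nodes=node0101,node0102,node0103
SwitchName=tor02 Nodes=node0201,node0202,node0203
SwitchName=tor03 Nodes=node0301,node0302
# core switch connecting the leaves
SwitchName=core01 Switches=tor01,tor02,tor03
"""


def parse_leaf_switches(text):
    """Return {switch_name: [node, ...]} for every line that lists Nodes=."""
    leaves = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = dict(token.split("=", 1) for token in line.split())
        if "SwitchName" in fields and "Nodes" in fields:
            # Simplification: no hostlist range expansion, commas only.
            leaves[fields["SwitchName"]] = fields["Nodes"].split(",")
    return leaves


def multi_tor_node_set(leaves, per_switch=2):
    """Take up to per_switch nodes from each leaf switch, in switch-name order."""
    selected = []
    for name in sorted(leaves):
        selected.extend(leaves[name][:per_switch])
    return selected


leaves = parse_leaf_switches(EXAMPLE_TOPOLOGY)
nodes = multi_tor_node_set(leaves, per_switch=2)
print(",".join(nodes))
```

The resulting node list spans every TOR switch, so it does not rely on node names encoding the rack. Whether two nodes per switch is enough to stress the core links is a separate question.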
There's no slurm facility to track node layout without abusing features or similar. We should leave it in place for now (but see #72), because there are environmental reasons to want to test a rack at a time (rack 01, for example, is not cooled as well as the others). We should also add a test that more explicitly tests the core switches (#71).
There isn't one. I'm inclined to ignore the issue by just removing alltoall-rack, since the rack test is basically meaningless anyway.