Does Slurm have a layout file or similar that we can use to assign nodes to racks, rather than inferring rack membership from the naming scheme?

anderbubble commented 9 years ago

There isn't one. I'm inclined to ignore the issue by just removing alltoall-rack, since the rack test is basically meaningless anyway.

pruprecht commented 9 years ago

Actually, the all-to-all rack test is the only one that tests the core switches, so it is pretty important.

Slurm can be topology-aware, and this may be useful when figuring out which nodes are on which TOR IB switches.

Pete

anderbubble commented 9 years ago

I neglected to mention that removing the rack test would assume that we've first added an explicit multi-TOR alltoall test. Such a test would exercise the core switches more consistently, and wouldn't depend on inferring node layout and network topology from the node name.

As for topo-aware Slurm, I've already created a Slurm topology.conf file, and I'm using that to schedule tests now (instead of the JSON that was in curc-bench before). We can hand that file to Slurm to make it topo-aware, but there has been concern that it might carry a performance hit with no practical benefit in our presumed full-bisection-bandwidth network. If/when we oversubscribe the network for future clusters, topology-aware scheduling would likely become more desirable.
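For reference, here is a rough sketch of how a topology.conf could be read to get a switch-to-nodes mapping for scheduling tests. It is not the code curc-bench uses; the function names `expand_hostlist` and `parse_topology`, the switch names, and the node names are all made up for illustration. It only handles the simple `SwitchName=... Nodes=...` form with a single bracketed range per host expression; for anything more complex, `scontrol show hostnames` is the safer way to expand a hostlist.

```python
import re
from typing import Dict, List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple hostlist such as 'node[01-03,07]' or 'a1,a2'.

    Assumption: at most one bracketed group per host expression.
    """
    hosts: List[str] = []
    # Split on commas that are not inside a [...] group.
    for part in re.split(r",(?![^\[]*\])", expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if not m:
            hosts.append(part)
            continue
        prefix, ranges, suffix = m.groups()
        for rng in ranges.split(","):
            lo, _, hi = rng.partition("-")
            hi = hi or lo
            width = len(lo)  # preserve zero padding, e.g. 01 -> 01
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


def parse_topology(text: str) -> Dict[str, List[str]]:
    """Map each switch that lists Nodes= to its expanded node names."""
    switch_nodes: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = {
            key.lower(): value
            for key, value in (
                tok.split("=", 1) for tok in line.split() if "=" in tok
            )
        }
        name = fields.get("switchname")
        if name and "nodes" in fields:
            switch_nodes[name] = expand_hostlist(fields["nodes"])
    return switch_nodes


# Hypothetical topology.conf contents; switch and node names are invented.
EXAMPLE = """
SwitchName=tor01 Nodes=node[0101-0103]
SwitchName=tor02 Nodes=node[0201-0202]
SwitchName=core Switches=tor[01-02]
"""

topology = parse_topology(EXAMPLE)
```

With a mapping like this, a rack-level or multi-TOR test can pick nodes by switch rather than by guessing from the node name. Lines that only list `Switches=` (the core switch above) are skipped, since they carry no direct node membership.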

anderbubble commented 9 years ago

There's no Slurm facility to track node layout without abusing features or similar. We should leave the rack test in place for now (but see #72), because there are environmental reasons to want to test a rack at a time (rack 01, for example, is not cooled as well as the others). We should also make a test that more explicitly tests the core switches (#71).

ResearchComputing / curc-bench #45