camsas / firmament

The Firmament cluster scheduling platform
Apache License 2.0
415 stars 79 forks source link

Workaround required for cs2 to work with sparse node IDs #26

Open ms705 opened 9 years ago

ms705 commented 9 years ago

When using the cs2 solver, the maximum node ID in the flow graph output sent to the solver can be no greater than the number of nodes in the flow graph, or the solver fails. In other words, the p N E DIMACS line at the start of the output must contain the maximum node ID used as N.

The workaround fix deployed in ee31f8091097f06828db89e68027541d7f6acf55 is to return the maximum ID in use instead of the true number of nodes, which leads to cs2 implicitly assuming disconnected nodes for the unused IDs. However, this isn't ideal: as part of the workaround, FlowGraph::NumNodes() returns a number greater than the actual number of nodes in the graph. Moreover, the implied disconnected nodes may slow the solver down (although I doubt there is much impact in practice).

Note that this isn't a problem when node IDs are reused: the number of nodes in the flow graph and the maximum ID are in agreement in this case.

Possible comprehensive solutions:

  1. Re-map sparse node IDs into a continuous space before sending them to cs2, and back again (yuck!).
  2. Continue with the disconnected implicit node approach, but re-engineer things so that FlowGraph::NumNodes() returns the correct number again.

See patch in CL 237749.