flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0
Subgraph detach capability #554

Open milroy opened 4 years ago

milroy commented 4 years ago

For elastic scaling of jobs and the dynamic removal of resources for cloud integration, flux-sched needs the capability to detach resources. This functionality can be paired with issue #552 to first deallocate resources from a job and then detach them from the resource graph. The proposed capability will take advantage of and extend the JGF reader currently in place.

To add subgraph detach, several modifications must be made. First, resource/utilities/command.cpp needs cmd_detach, detach, and detach_run. The cmd_detach command will take two required arguments, jgf_file and the bool remove, and one optional argument, jobid.
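As a rough illustration of the argument layout described above, here is a minimal, self-contained sketch of parsing `detach <jgf_file> <remove> [jobid]`. The names `detach_args_t` and `parse_detach_args` are hypothetical, the `true`/`false` spelling of the flag is an assumption, and the code does not use the actual command-handler signature in flux-sched. The meaning of `remove` is not yet defined in this issue, so the sketch only records its value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the arguments of the proposed detach command.
struct detach_args_t {
    std::string jgf_file;          // path to the JGF file describing the subgraph
    bool remove = false;           // required flag; semantics still under discussion
    std::optional<int64_t> jobid;  // optional job ID
};

// Parse "detach <jgf_file> <true|false> [jobid]".
// args[0] is assumed to be the command name itself.
detach_args_t parse_detach_args (const std::vector<std::string> &args)
{
    // The caller dispatches on args[0], so it must exist.
    assert (!args.empty ());
    if (args.size () != 3 && args.size () != 4)
        throw std::invalid_argument (
            "usage: detach <jgf_file> <true|false> [jobid]");

    detach_args_t out;
    out.jgf_file = args[1];
    if (args[2] == "true")
        out.remove = true;
    else if (args[2] == "false")
        out.remove = false;
    else
        throw std::invalid_argument ("remove must be 'true' or 'false'");

    if (args.size () == 4)  // std::stoll throws on non-numeric input
        out.jobid = static_cast<int64_t> (std::stoll (args[3]));
    return out;
}
```

Keeping jobid as a `std::optional` makes it easy to drop later if the argument turns out to be unnecessary.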

Note that detach will need to support cases where subgraphs contain resources spanning multiple allocated jobs and available resources. This functionality is beyond the scope of this issue.

Deallocation will be handled in issue #552; actual subgraph removal will be handled with a new method akin to dfu_impl_t::update (https://github.com/flux-framework/flux-sched/blob/f77983698a62aee4fc9dd0bc7ac2e392492d154d/resource/traversers/dfu_impl_update.cpp#L490), except that the new method will need to call a new JGF reader method that removes the subgraph by traversing the JGF file.

milroy commented 4 years ago

@dongahn: I think my proposal is similar to our discussion at the end of December. Do you see any immediate pitfalls?

dongahn commented 4 years ago

Sounds like a good start.

> The two required arguments will be jgf_file and the bool remove. The optional argument will be jobid.

What are the semantics of the second argument and of the optional third argument you propose here?

> Note that detach will need to support cases where subgraphs contain resources spanning multiple allocated jobs and available resources. This functionality is beyond the scope of this issue.

What would be the main problem with supporting a subgraph that spans multiple allocated jobs? Is it the lack of support for partial deallocation?

dongahn commented 4 years ago

Just so that I understand the semantics of your graph detach proposal: is the proposal to remove the target subgraph from the resource graph completely? Or is it more like "shadowing the subgraph", such that it is effectively detached but the vertices and edges are still there in the resource graph?

milroy commented 4 years ago

> > The two required arguments will be jgf_file and the bool remove. The optional argument will be jobid.

> What are the semantics of the second argument and of the optional third argument you propose here?

I am in the process of reworking the semantics; I'm leaning toward dropping the jobid argument.

> Is the proposal to remove the target subgraph from the resource graph completely? Or is it more like "shadowing the subgraph" such that it is effectively detached but the vertices and edges are still there in the resource graph?

This will depend on the scenario. In the case where a child is commanded to give back a subgraph of its resource graph, the child will need to remove the target subgraph completely. The parent will only need to deallocate the resources via an overloaded dfu_impl_t::remove or similar.

In the case where a scheduler is giving up resources (to a parent, or to an external RJMS) it will need to deallocate the resources and remove them completely.

milroy commented 4 years ago

As a follow-up to my conversation with @dongahn this afternoon, I'd like to record the pros and cons of the "shadow" vs "remove" detach approaches.

For the sake of clarity, the shadow approach would annotate vertices and edges with "ghost" or a similar descriptor that makes them ineligible for traversal. This introduces additional complexity in traversals (since the vertices and edges must be checked for the ghost descriptor) and in accumulators (like the pruning filter). However, it does bypass the time complexity of removing vertices and edges from the resource_graph. The time savings depend on the containers used for the VertexList and OutEdgeList, as described in Boost's "Using adjacency_list" documentation. Note that issue #534 is related to this discussion. This approach also avoids the time needed to add vertices and edges back if the resources become available again.
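To make the shadow idea concrete, here is a minimal sketch on a toy adjacency list rather than Fluxion's Boost-based resource_graph. `toy_graph_t`, `set_ghost`, and `count_visible` are hypothetical names, only vertices carry the ghost flag (edges are skipped implicitly through their target vertex), and the graph is assumed to be a tree such as a containment hierarchy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the resource graph; not Fluxion's actual types.
struct toy_graph_t {
    std::vector<std::vector<std::size_t>> out_edges;  // adjacency list
    std::vector<bool> ghost;                           // hypothetical shadow flag

    explicit toy_graph_t (std::size_t n) : out_edges (n), ghost (n, false) {}
    void add_edge (std::size_t u, std::size_t v)
    {
        out_edges.at (u).push_back (v);
    }
};

// Mark (value == true) or unmark (value == false) the subtree rooted at root.
// No vertex or edge is erased, so re-attaching is just another flag flip.
void set_ghost (toy_graph_t &g, std::size_t root, bool value)
{
    assert (root < g.ghost.size ());
    g.ghost[root] = value;
    for (std::size_t v : g.out_edges[root])
        set_ghost (g, v, value);
}

// Depth-first count of vertices reachable from root, skipping ghosts.
// The extra ghost check is the per-visit cost the shadow approach adds.
std::size_t count_visible (const toy_graph_t &g, std::size_t root)
{
    assert (root < g.ghost.size ());
    if (g.ghost[root])
        return 0;
    std::size_t n = 1;
    for (std::size_t v : g.out_edges[root])
        n += count_visible (g, v);
    return n;
}
```

Any aggregate kept above a ghosted subtree (for example, counts used by a pruning filter) would also need adjusting, which this sketch does not attempt.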

Aside from removal (and implicit re-addition) time savings, one compelling reason to use the shadow approach is if the ghost resources are known to become available by a specific time in the future. If that is the case, the resources can be included in reservation decisions. I'm not sure how useful this is in practice, though.

Detach via removal (i.e. deletion of vertices and edges) has the disadvantage of incurring the varying time complexity discussed above. The advantages are that the resource_graph doesn't occupy more memory than necessary, and that the implementation appears to be more straightforward (especially for traversals).
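For comparison, a minimal sketch of detach via removal on a toy graph keyed by stable vertex IDs. `toy_graph_t` and `remove_subtree` are hypothetical and stand in for neither the JGF reader nor the Boost adjacency_list that Fluxion uses; the point is only that removal has to erase both the vertices and every inbound edge that refers to them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using vertex_id_t = int;
// Toy graph: vertex ID -> set of out-neighbor IDs (not Fluxion's types).
using toy_graph_t = std::map<vertex_id_t, std::set<vertex_id_t>>;

// Erase root and everything reachable from it.
// Returns the number of vertices removed (0 if root is not in the graph).
std::size_t remove_subtree (toy_graph_t &g, vertex_id_t root)
{
    // Pass 1: collect the vertices to delete with an iterative DFS.
    std::set<vertex_id_t> doomed;
    std::vector<vertex_id_t> stack{root};
    while (!stack.empty ()) {
        vertex_id_t u = stack.back ();
        stack.pop_back ();
        auto it = g.find (u);
        if (it == g.end () || !doomed.insert (u).second)
            continue;  // unknown vertex or already collected
        for (vertex_id_t v : it->second)
            stack.push_back (v);
    }

    // Pass 2: erase the vertices together with their out-edges.
    for (vertex_id_t u : doomed)
        g.erase (u);

    // Pass 3: erase inbound edges from the surviving vertices.
    // This scan over the remaining graph is part of the removal cost.
    for (auto &kv : g)
        for (vertex_id_t u : doomed)
            kv.second.erase (u);

    assert (g.count (root) == 0);
    return doomed.size ();
}
```

After removal, traversals need no extra checks, but re-attaching the resources means rebuilding the vertices and edges from scratch.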

Note that the two topologies are effectively equivalent for allocation/deallocation (but not necessarily for reservation, as described above) as ghost vertices and edges are not traversable.