christopherwharrop / rocoto

Rocoto Workflow Management System
Apache License 2.0

Need capability to launch tasks inside of other running batch jobs that serve as resource pools #22

Closed (samtrahan closed 6 years ago)

samtrahan commented 6 years ago

We request a feature that enables Rocoto to submit jobs (usually large ones) that, once started, can serve as a resource pool for scheduling a workflow's tasks. Rocoto will then "submit" tasks to those resource pools instead of the batch system. The "jobs" that run the tasks will not be "jobs" in the batch system sense; they will (most likely) be processes started and monitored by Rocoto within the large resource pool jobs.

The request for this capability is motivated by scheduling difficulties that arise on oversubscribed systems, where queue wait times are very long and workflows with large numbers of tasks cannot make timely progress. The obvious workaround, consolidating small tasks into larger ones, will not work because it eliminates the ability to restart failed tasks at a fine-grained level, which raises the cost of rerunning failed tasks beyond acceptable limits.

samtrahan commented 6 years ago

Someone on our team is putting together a proof of concept of a primitive version of this. I've merged the feature/nobatchsystem and feature/principle-of-least-surprise branches into one branch, feature/gaea-test-mk3. He is going to manually submit Moab batch jobs, each of which runs only part of the workflow via rocotorun -c YYYYMMDDHHMM in a NOBatchSystem (scheduler="no") workflow.

This proof of concept will let us know how well the approach works. I'm hoping to get back to Rocoto development later this week. Once I do, I'll create the automated version of this, which will combine MOABBatchSystem and NOBatchSystem into one umbrella BatchSystem class.

christopherwharrop commented 6 years ago

I would like Rocoto to create and manage the resource pool jobs (submission, tracking, etc.) automatically, so that users don't need to know anything about them. Users should choose 1) one of the supported schedulers (it needs to work with all of them) and 2) whether or not to use resource pools for scheduling tasks. Rocoto should do the rest on its own.

This can get very complicated, which is one reason it hasn't been implemented. Rocoto must figure out how large the resource pool jobs need to be, and how much walltime they need, by analyzing the workflow. It also needs to manage failures of the resource pool jobs as well as the corresponding failures of the jobs associated with workflow tasks. Additionally, the user will need to be given some kind of ID for each job running inside the resource pool. That ID should include a reference to the batch system job ID of the resource pool job so that users have a way to report issues to systems staff.
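The ID requirement above can be illustrated with a minimal Python sketch. It is not Rocoto code: the function names and the `<pool job ID>/<slot>` format are hypothetical, chosen only to show how a task ID could carry the batch system job ID of its resource pool job.

```python
def make_pool_task_id(pool_job_id: str, slot: int) -> str:
    """Build a task ID that embeds the batch job ID of its resource pool job.

    The "<pool job ID>/<slot>" layout is an assumption for illustration,
    not an existing Rocoto convention.
    """
    return f"{pool_job_id}/{slot}"


def pool_job_id_of(task_id: str) -> str:
    """Recover the batch job ID of the resource pool job from a task ID."""
    # Split on the last "/" so the slot number is always the trailing part.
    return task_id.rsplit("/", 1)[0]


task_id = make_pool_task_id("123456.moab", 3)
print(task_id)                  # ID shown to the user
print(pool_job_id_of(task_id))  # ID the user can report to systems staff
```

With an ID of this shape, a user who sees a failed task can read off the pool job's batch ID directly and pass it to systems staff.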

samtrahan commented 6 years ago

Chris,

I disagree. I don't think a system that automatically decides such resources can be written, nor would it be the most useful approach from an end-user standpoint. A user knows what resources they need; otherwise they would not know how to provide <nodes> tags in the first place. You need to give them control over the way in which the "resource pools" are specified and used. The combination of a selective rocotorun and a multi-scheduler class is a trivial way to implement this, and provides all the power of a resource pool implementation.

Plus, by making this a multi-scheduler feature, rather than restricting it to resource pools, you give the user power to use it for more purposes, such as submitting jobs in two different batch systems within the same cluster. All of it is handled automatically within Rocoto, just by checking which batch system is requested.

christopherwharrop commented 6 years ago

You are conflating a number of things and that is making this discussion very difficult. You are also blending in implementation details with the description of the issues, which makes it even more difficult. I need to ask you to please stop doing both of those things so that we can have a productive conversation about bugs you are reporting and features you are requesting.

A new feature to enable running multiple tasks inside of larger batch jobs has nothing to do with a feature that enables running tasks outside of a batch system. Those are completely different features that stand (or not) alone on their own merits. If the implementations of those two features have something in common that is fine, and it should be considered, but that is an implementation detail. We haven't even hashed out an agreed-upon specification for these requested features yet.

I support the idea of Rocoto using batch jobs, managed automatically by Rocoto, as pools of resources in which to schedule and manage workflow tasks. I am going to reject the request to allow users to specify their own pools for now, however. The reason is that it exposes the user to (and makes him/her responsible for) unnecessary complexity with no compelling benefit. It introduces even more ways in which a user can screw something up. There is enough information in the task specifications for Rocoto to construct the pools automatically, and asking the user to reason about resource pools for their workflow is inappropriate. I want to confine the users' responsibilities to defining the structure of their workflow and specifying resources for individual tasks.

Another possible solution to this problem is to not use resource pools, but to have Rocoto do automatic task clustering. This is a common technique found in other workflow management systems, where smaller tasks are clustered together and submitted as a single job. This is done automatically without user knowledge or input, and can significantly increase the throughput of a workflow by reducing queue wait times, since fewer batch jobs are required. I support adding a task clustering feature as well, so long as it is done automatically.
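As a rough illustration of automatic task clustering, the Python sketch below groups ready tasks into batch jobs using a first-fit-decreasing heuristic. Everything here is an assumption made for the example: the `Task` and `Cluster` types, the `max_nodes` limit, and the choice to run a cluster's tasks side by side so that the cluster needs the sum of their nodes and the longest of their walltimes. It does not describe how Rocoto or any other workflow manager actually clusters tasks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    nodes: int          # nodes requested by the task
    walltime_min: int   # walltime requested by the task, in minutes


@dataclass
class Cluster:
    tasks: List[Task] = field(default_factory=list)

    @property
    def nodes(self) -> int:
        # Tasks run side by side, so node requests add up.
        return sum(t.nodes for t in self.tasks)

    @property
    def walltime_min(self) -> int:
        # The cluster job must outlive its longest task.
        return max(t.walltime_min for t in self.tasks)


def cluster_tasks(tasks: List[Task], max_nodes: int) -> List[Cluster]:
    """Group tasks into clusters of at most max_nodes (first-fit decreasing)."""
    clusters: List[Cluster] = []
    for task in sorted(tasks, key=lambda t: t.nodes, reverse=True):
        if task.nodes > max_nodes:
            raise ValueError(f"{task.name} needs more than {max_nodes} nodes")
        for cluster in clusters:
            if cluster.nodes + task.nodes <= max_nodes:
                cluster.tasks.append(task)
                break
        else:
            # No existing cluster has room, so start a new one.
            clusters.append(Cluster([task]))
    return clusters


ready = [Task("a", 2, 30), Task("b", 1, 10), Task("c", 1, 45), Task("d", 3, 20)]
for job in cluster_tasks(ready, max_nodes=4):
    print([t.name for t in job.tasks], job.nodes, job.walltime_min)
```

Here four tasks become two batch jobs instead of four. A real implementation would still need to record each task's exit status separately so that a single failed task can be rerun without repeating the whole cluster.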

In reference to your "multi-scheduler" comment, that would not presently be an asset as there is no place to use it.

samtrahan commented 6 years ago

I am conflating too many different issues in this Issue. I'm going to close this and open a new one for each issue, starting with your proposal of task clustering. I have thought about that extensively, and I think it is the best way to go. I have thought of a viable solution that can be implemented in a reasonable amount of time.