SCALE-MS / scale-ms

SCALE-MS design and development
GNU Lesser General Public License v2.1

Understand bootstrapping scenarios #43

Closed eirrgang closed 4 years ago

eirrgang commented 4 years ago

RP has substantial infrastructure to bootstrap the environment of the agent, compute units, and (now) task_overlay workers, including installing a Python venv in the execution environment.

hooks and feedback

Various components (agent, tasks, ...) have configuration hooks that allow arbitrary shell scripting to be injected into Bourne shell and Python bootstrapping scripts. (See, for instance, https://github.com/radical-cybertools/radical.pilot/blob/devel/docs/architecture/bootstrapping.md#pilot-bootstrapping) Generally, the intended behavior is an idempotent environment initialization.

What feedback mechanisms are available / possible to provide information back towards the client (or forwards towards the task) with respect to the initial state, progress, and final state during the bootstrapping procedure, or to return information from the filesystem, environment, or bootstrap script command results?

venv bootstrapping

Is the venv generated programmatically by a simple call to a Radical script pre-installed in the execution environment? Is a frozen venv packaged by the client and sent to a remote shell environment to unpack? Something in between? Are there client-side hooks to influence its contents?

Task overlay environment

If this venv is the active environment of the python interpreter that runs the master and worker python scripts, we need to make application support code available in that environment. What is the current (or proposed) mechanism for this?

General provisioning: file placement

Task descriptions can specify data that must be staged for the task. Staging is fairly flexible, supporting versatile data sources and transfer methods, with (optimizable) awareness of declared shared data.

A Pilot description identifies resources that must be compatible with a task description in order for the Pilot to receive the task for handling.

Can this include filesystem details, such as installed software or available data sets, or something with which an agent environment could be provisioned (presumably resulting in resources being added to the active pilot description)?

Is more hierarchy or scoping available for data staging directives? I.e. can the pilot description / agent / unit manager specify data staging requirements? Is an RP client able to use the SAGA API for additional resource management, or does RP necessarily isolate its clients from lower level interfaces?

Resource re-use

In failure recovery modes, to what degree can RP attempt to reexecute tasks in the same worker, unit, agent, or pilot?

Can any filesystem artifacts of a Session be used to more quickly bootstrap or recover a previous session, such as when the client interpreter is interrupted or a Session fails after completing some tasks? Is there a clean way to detach from and reattach to a Session between Python interpreter invocations?

peterkasson commented 4 years ago

Sounds interesting. If I understand correctly, there are three questions here:

1. How does this work?
2. If we're using this to stage dependencies, how do we work out staged dependencies vs. system dependencies? Maybe the answer is to stage everything at the beginning and have worker logic decide whether to use the staged or any detected system capabilities. This would be a bit of a performance hit in that there's extra staging work, but it could simplify life. The obvious issue is that then we don't want really short-lived pilot environments.
3. Connected to this, can we either checkpoint and reuse environments from incomplete commands, or cache the staged dependencies above?

Does this capture the issues?

andre-merzky commented 4 years ago

The issue covers a lot of ground from an RCT perspective - I'll chop this initial reply in a couple of pieces.

A general remark: the bootstrapping procedures have, for the most part, served us well, as they (a) reduce the involvement of the end user in preparing the target resource for use with RCT, and (b) provide us with additional hooks and knobs to adjust in our own experiments. Having said that, I must admit that our bootstrapping process is fairly complex, involved, and opaque, and adds a certain management overhead for our group. Personally, the main advantage for me is that the system can reflect on the target resource the exact RCT software stack used on the client side - that is invaluable while developing and testing. Note that the bootstrapper can be configured to use a static, user-installed virtualenv and RCT deployment, which makes it more stable but removes the above advantages.

eirrgang commented 4 years ago

Does this capture the issues?

My questions are a little more comprehensive than that. My areas of inquiry are as follows:

Our project is introducing new kinds of workflow data dependencies, new kinds of system / execution environment dependencies, and new kinds of workflow metadata. These may or may not be provided through existing facilities, and also may or may not be natural extensions of obvious existing roles.

I thought that instead of opening a dozen issues to try to design each of these aspects of the new framework, I would try to see if I could get a clearer picture by publicly picking the brains of @mturilli and @andre-merzky regarding the launch sequence of current RP sessions. I don't know if the issue tracking system is the best forum for active discussion, but it should provide an effective place to record the progress of conversations on Slack, teleconferences, etcetera, until notable details are recorded in the wiki or other documents.

andre-merzky commented 4 years ago

hooks and feedback

This is a weak point: the bootstrapper results are only available in log files on the target resource. If the bootstrapper fails, we have not yet established a fast and reliable channel back to the client machine (the MongoDB connection is established at a later stage). We are working on improving this, but right now the pilot entering the FAILED state is the only client-side indication that something went wrong.

andre-merzky commented 4 years ago

venv bootstrapping

It's a mix of all of these: the bootstrapper is fairly configurable and, depending on the settings in the resource config, can either use a pre-existing venv including an installed RCT stack; use a pre-existing venv to which it adds a private RCT installation (in a different location); create a fresh venv on the fly; purge an old venv and recreate it; refresh an existing venv; etc. The RCT stack is either pre-installed, installed from PyPI or from a git branch, or staged to the target machine (to mirror the client side).

The default mode is: create a venv if it doesn't exist, otherwise use it, and install the same RCT stack as used on the client side in a private space. Note that venvs are always created fresh for a new RP release (to simplify dependency management and to allow mixing RCT versions running on the same target resource).
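A minimal sketch of that default policy (the function and path layout are hypothetical illustrations; the real bootstrapper is a Bourne shell script with many more modes than shown here):

```python
import os

def choose_venv_action(venv_base: str, rp_release: str, purge: bool = False) -> str:
    """Decide how to bootstrap the agent venv, mirroring the default policy
    described above: create the venv if it does not exist, otherwise reuse
    it; always use a release-specific venv so RCT versions can coexist;
    optionally purge and recreate on request.
    (Hypothetical helper -- the real bootstrapper is a shell script.)
    """
    # venvs are keyed by release, so a new RP release gets a fresh venv
    versioned_path = os.path.join(venv_base, f've.{rp_release}')
    if purge:
        return f'recreate {versioned_path}'
    if os.path.isdir(versioned_path):
        return f'reuse {versioned_path}'
    return f'create {versioned_path}'
```

The release-keyed path is what lets several RCT versions share one target resource without stepping on each other's dependencies.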

andre-merzky commented 4 years ago

Task overlay environment

The RP pilot agent does not necessarily run in the same venv as the master/worker - that's just the default setup. In our COVID work, for example, we use a venv for the pilot and a conda env for the master and worker. The only condition is that the master and worker environments have a suitable version of RP installed, so that the master and worker base classes can be used and we don't have any protocol conflicts when communicating with the agent. The master/worker venv is currently not created by bootstrapping (unless it reuses the pilot agent venv).
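The compatibility condition here - agent and master/worker environments carrying protocol-compatible RP versions - could be sketched as a simple check (the major.minor matching policy below is an illustrative assumption, not RP's actual rule):

```python
def protocol_compatible(agent_rp_version: str, worker_rp_version: str) -> bool:
    """Hypothetical check: treat two RP installations as protocol-compatible
    when their major.minor version components match, so the agent env and
    the master/worker env (venv, conda, ...) may otherwise differ freely.
    """
    key = lambda v: tuple(v.split('.')[:2])
    return key(agent_rp_version) == key(worker_rp_version)
```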

andre-merzky commented 4 years ago

General provisioning: file placement

Can this include filesystem details, such as installed software or available data sets, or something with which an agent environment could be provisioned (presumably resulting in resources being added to the active pilot description)?

Is more hierarchy or scoping available for data staging directives? I.e. can the pilot description / agent / unit manager specify data staging requirements? Is an RP client able to use the SAGA API for additional resource management, or does RP necessarily isolate its clients from lower level interfaces?

I am not sure I am parsing these questions correctly - do you happen to have an illustrative use case for these cases?

Tasks (and the master and workers are just tasks in this context, albeit special ones) can define a set of pre_exec commands, which can, say, do a module load to provide some specific software, or a conda activate, etc. (although there are scalability limitations with that approach). The environment of the pilot agent can only be changed within limits, and not from the API. The workload is supposed to be isolated from the pilot env, so there should be little need to change the pilot env.
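The pre_exec mechanism described above can be sketched with a plain dict standing in for an RP task description (field names follow the RP convention, but the rendering helper and the module/env names are hypothetical):

```python
# Dict stand-in for an RP task description; pre_exec commands run in the
# task's shell before the executable launches, e.g. to load site software
# or activate a conda environment.
task = {
    'executable': 'gmx_mpi',                     # hypothetical executable
    'arguments' : ['mdrun', '-deffnm', 'run'],
    'pre_exec'  : [
        'module load gromacs',                   # hypothetical module
        'conda activate my_env',                 # hypothetical env
    ],
}

def render_launch_script(td: dict) -> str:
    """Render the shell snippet a launcher might emit for this task:
    pre_exec lines first, then the executable invocation."""
    lines = list(td.get('pre_exec', []))
    lines.append(' '.join([td['executable']] + td.get('arguments', [])))
    return '\n'.join(lines)
```

This also makes the scalability caveat concrete: every task pays the cost of its pre_exec commands at launch time.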

hierarchy: yes, but I am not sure I am thinking of the same as you are. The pilot can define staging directives which will end up in the pilot sandbox, and the tasks can then link data from the pilot sandbox into the task sandboxes, allowing data reuse. We also provide directives to stage to a session sandbox (to share data between pilots in the same session), and to a resource sandbox (to share data between different sessions on the same resource).
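The sandbox hierarchy described above (resource, session, pilot, task) can be sketched as a small resolver; the paths and the `scope:///` URL shapes below are illustrative assumptions, not RP's exact schemes:

```python
# Sketch of the sandbox hierarchy used for shared staging: a resource
# sandbox shared between sessions, a session sandbox shared between pilots
# in one session, a pilot sandbox shared between tasks, and per-task
# sandboxes.  Paths and scheme names are illustrative only.
SANDBOXES = {
    'resource': '/scratch/user/radical.sandbox',
    'session' : '/scratch/user/radical.sandbox/session.0000',
    'pilot'   : '/scratch/user/radical.sandbox/session.0000/pilot.0000',
}

def resolve(url: str, task_sandbox: str) -> str:
    """Expand a 'scope:///path' staging URL into an absolute path,
    so e.g. a task directive can link data out of the pilot sandbox."""
    scheme, _, path = url.partition(':///')
    if scheme == 'task':
        return f'{task_sandbox}/{path}'
    return f'{SANDBOXES[scheme]}/{path}'
```

Linking from a wider scope into a task sandbox is what enables the data reuse mentioned above: the pilot stages once, and each task resolves against the shared location.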

Is an RP client able to use the SAGA API for additional resource management, or does RP necessarily isolate its clients from lower level interfaces?

The client is free to create SAGA filesystem handles out of band and to copy/move data around - the RP API willingly provides the endpoint URLs for the various sandboxes. The API does not expose internally used SAGA handles, though (although we considered doing so at some point).

andre-merzky commented 4 years ago

Resource re-use

In failure recovery modes, to what degree can RP attempt to reexecute tasks in the same worker, unit, agent, or pilot?

RP itself does not attempt to recover or rerun failed tasks - its state model would not allow that right now. EnTK adds that capability on top of RP, though, so we know it's possible to implement. Data from the previous sandbox can be reused (as long as you are willing to handle partial results etc., obviously).
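Retry-above-RP, as EnTK provides it, can be sketched roughly like this (the `submit` callable and the string states are hypothetical stand-ins; EnTK actually tracks this through its own pipeline/stage state model):

```python
def run_with_retries(submit, task, max_attempts=3):
    """Resubmit a failed task up to max_attempts times, as a layer above
    RP would (RP itself never re-runs a task).  `submit` is a hypothetical
    callable that runs the task once and returns its final state, e.g.
    'DONE' or 'FAILED'.  Returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if submit(task) == 'DONE':
            return attempt
    raise RuntimeError(f'task failed after {max_attempts} attempts')
```

Whether each retry reuses the previous sandbox (accepting partial results) or starts clean is a policy decision the retrying layer has to make; RP only exposes the sandbox locations.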

Can any filesystem artifacts of a Session be used to more quickly bootstrap or recover a previous session, such as when the client interpreter is interrupted or a Session fails after completing some tasks?

Yes, the session location is exposed and can be referenced for file staging directives (which could include linking of a complete sandbox or storage space, for example).

Is there a clean way to detach from and reattach to a Session between Python interpreter invocations?

Not yet - planned, but it will take a long time before that becomes available.

eirrgang commented 4 years ago

I am not sure I am parsing these questions correctly

You parsed correctly. Thank you!

eirrgang commented 4 years ago

This discussion appears to be finished for now. Further distillation to the wiki can take place under #41 without requiring this issue to remain open.