flux-framework / rfc

Flux RFC project
https://flux-framework.readthedocs.io/projects/flux-rfc/

Define how Flux programs obtain assigned resource set 'R' from enclosing instance #110

Open grondo opened 6 years ago

grondo commented 6 years ago

(Opening this issue at @dongahn's request. I'm not actually sure if this belongs in the RFC repo or with flux-core. Opening here in case the result of this discussion is either a new RFC or an extension to RFC 16.)

One of the main use cases for R from #109 is that R will be the low-level format for configuration input to an instance. Additionally, the job shell will need to use R along with the jobspec to determine which tasks should be invoked locally and on what resources.

Therefore it is important that we have a standard, documented method or plan of how exactly programs will fetch R from the enclosing instance. Since R is in the main kvs namespace, multi-user programs will not be able to fetch R directly, so my first proposal would be that a job.fetch_resources or similar service be added to the job management module. I'm open to other ideas though, and realize this approach could be a bit naive at this point.
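To make the proposal concrete, a service like this might look roughly as follows. This is a minimal sketch in Python; the handler name, the dict-based KVS stand-in, and the error codes are all hypothetical, not actual flux-core interfaces:

```python
# Hypothetical sketch of a job.fetch_resources handler. A dict stands
# in for the main KVS namespace; the real service would live in the
# job management module and authenticate the requesting user.

def fetch_resources(kvs, jobid, userid):
    """Return R for jobid, but only to the job's owner."""
    job = kvs.get(f"jobs/{jobid}")
    if job is None:
        return {"errnum": 2, "error": "unknown job"}
    if job["userid"] != userid:
        return {"errnum": 1, "error": "permission denied"}
    # R lives in the main namespace, out of a guest's reach, so the
    # service copies it into the reply payload on the guest's behalf.
    return {"R": job["R"]}

kvs = {"jobs/42": {"userid": 1000, "R": {"version": 1, "execution": {}}}}
reply = fetch_resources(kvs, 42, 1000)
```

The point of routing through a service rather than the KVS directly is that the service can enforce ownership checks the guest cannot bypass.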

Another thing to think about is how resources are configured for instances that do not happen to be nested under another Flux instance. For example, flux launched as a job under a different resource manager might generate R from its distributed hwloc data. The system version of Flux will generate R from a configuration file, etc. It would be nice if the "generation" of R was abstracted to minimize code duplication and increase flexibility.
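One way to abstract the generation of R is a set of interchangeable generators that all emit the same structure. A speculative sketch (none of these function names exist in flux-core, and the field names only loosely follow the R_lite idea from #109):

```python
# Speculative sketch: three interchangeable ways to produce R, all
# yielding the same structure, so consumers never care which backend
# generated it.

def R_from_parent(parent_R):
    # Nested instance: R is handed down by the enclosing instance.
    return parent_R

def R_from_hwloc(hwloc_ranks):
    # Launched under a foreign resource manager: derive R from
    # distributed hwloc data, here given as (rank, corelist) pairs.
    return {"version": 1, "execution": {"R_lite": [
        {"rank": rank, "children": {"core": cores}}
        for rank, cores in hwloc_ranks
    ]}}

def R_from_config(config):
    # System instance: R comes from a configuration file.
    return {"version": 1, "execution": {"R_lite": config["resources"]}}
```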

dongahn commented 6 years ago

Thanks @grondo for starting this discussion thread. These are all relevant discussion points. As a precursor, though, one thing I think should be specified in an RFC (e.g., RFC 3?) is a common mechanism and/or abstraction by which a nested instance can use a service like this provided by the enclosing instance. We have been using this for hierarchical launching, but I'm not sure we ever formalized it.

grondo commented 6 years ago

Thanks, good point @dongahn. I seem to remember another discussion thread on this, perhaps in an issue in flux-core? I think the general idea is that any program can connect to its parent via a Flux handle using FLUX_URI. An instance running as a program can do the same thing, but the complexity here is that the instance may thus be required to manage multiple Flux handles.

I think some of this discussion is in flux-framework/flux-core#1151.

If you want, we could open a separate issue to document how a sub-instance uses services from a parent.

dongahn commented 6 years ago

so my first proposal would be that a job.fetch_resources or similar service be added to the job management module.

As long as the parent-child communication protocol is documented or formalized, job.fetch_resources seems sufficient. As @grondo said in a hallway meeting, this would not be enough if/when we go with dynamic scheduling, where we need two-way communication, but that is far beyond the S4 horizon. @SteVwonder should chime in, though, as he wants to plan his hierarchical module development along with our S4 plan.

garlick commented 6 years ago

Since R is in the main kvs namespace, multi-user programs will not be able to fetch R directly, so my first proposal would be that a job.fetch_resources or similar service be added to the job management module.

(Redirected from #112) Alternate idea (not sure if it has merit): link a copy of R (or a broader snapshot) from the main KVS namespace for the job to the guest namespace?

grondo commented 6 years ago

(Redirected from #112) Alternate idea (not sure if it has merit): link a copy of R (or a broader snapshot) from the main KVS namespace for the job to the guest namespace?

Yeah, that is a good proposal. Perhaps any data from the main KVS namespace that might be needed by any program could be linked into the guest namespace? This is more flexible than creating a new RPC per bit of required data.
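The cheapness of such a link can be illustrated with plain Python references standing in for KVS entries (the key names here are made up):

```python
# Sketch: a link from the guest namespace to the main namespace is
# just another reference to the same content, not a copy of the data.

main_ns = {"resource.R": {"version": 1, "execution": {}}}
guest_ns = {}

# "linking" R into the guest namespace: one reference, zero copying
guest_ns["R"] = main_ns["resource.R"]
```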

dongahn commented 6 years ago

This is more flexible than creating a new RPC per bit of required data.

I agree. What other data would need to be linked, though?

A bit of a tangent: at one time we talked about a space where a child stores some of its own (provenance?) data, which then gets reaped by the parent at the end of the run. Does that sort of data also belong to this operation, even though the direction is reversed?

A specific use case I have in mind: the UQ folks may want to register a filter on stdout/stderr that harvests small but specific markers. They may want to put only these markers into the KVS, where they can be stored along with the job record. This way the marker info becomes available as part of a job query, and depending on what they see there, they can decide on further actions, e.g., resubmit and re-execute.

garlick commented 6 years ago

A bit of a tangent: at one time we talked about a space where a child stores some of its own (provenance?) data, which then gets reaped by the parent at the end of the run. Does that sort of data also belong to this operation, even though the direction is reversed?

That's kind of what I was proposing here (in #112):

In fact maybe we should define a simple mechanism to a) during child bootstrap, load any initial contents of guest namespace into child KVS, and b) store final contents of guest instance KVS to guest namespace upon instance shutdown. (Probably filtered or with configurable levels of persistence of course)

grondo commented 6 years ago

In fact maybe we should define a simple mechanism to a) during child bootstrap, load any initial contents of guest namespace into child KVS, and b) store final contents of guest instance KVS to guest namespace upon instance shutdown. (Probably filtered or with configurable levels of persistence of course)

That is a great idea. There are two levels here: 1. persistence of guest kvs namespace for all programs, and 2. (configurable) propagation of data from child instance kvs to parent's guest kvs namespace (taking advantage of its persistence).

It seems like the child instance has access to both its own kvs and the parent instance's guest namespace, so it makes the most sense to push data to the guest namespace as needed?
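The load-at-bootstrap / store-at-shutdown mechanism quoted above could be sketched like this, with dicts standing in for the parent's guest namespace and the child's KVS, and the `persist` filter purely hypothetical:

```python
# Sketch of the proposed lifecycle. A real implementation would walk
# KVS directories; flat dicts stand in for both stores here.

def child_bootstrap(guest_ns):
    # (a) seed the child instance's KVS from the guest namespace
    return dict(guest_ns)

def child_shutdown(child_kvs, guest_ns, persist=lambda key: True):
    # (b) push the (filtered) final contents back to the guest
    # namespace, taking advantage of its persistence
    guest_ns.update({k: v for k, v in child_kvs.items() if persist(k)})
```

A `persist` predicate like this is one possible way to get the "configurable levels of persistence" mentioned in the quoted proposal.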

dongahn commented 6 years ago

That is a great idea.

I agree.

It seems like the child instance has access to both its own kvs and the parent instance's guest namespace, so it makes the most sense to push data to the guest namespace as needed?

Push vs. pull models were something we talked about a whole lot when we were pursuing dynamic scheduling. Any reason not to support both push and pull?

dongahn commented 6 years ago

Again a tangent: what is our stated strategy for stdout/stderr for S4? Is it still KVS?

grondo commented 6 years ago

Again a tangent: what is our stated strategy for stdout/stderr for S4? Is it still KVS?

So far I think yes. This is the most efficient way to get to the capabilities required for S4. It can always be improved after with different job shells or job shell options if there is a perceived need.

Push vs. pull models were something we talked about a whole lot when we were pursuing dynamic scheduling. Any reason not to support both push and pull?

I think both methods can be pursued eventually. However, the child instance already has access to both its own KVS and its guest KVS namespace in the parent, so this seems like the natural first avenue to tackle. For the parent to pull data from the child, it would have to connect to the child instance's KVS API, and in the multi-user case it won't have the authority to do so, so some extra code would have to be developed to support that... do we have time for that in the near term?

dongahn commented 6 years ago

So far I think yes. This is the most efficient way to get to the capabilities required for S4. It can always be improved after with different job shells or job shell options if there is a perceived need.

Good to know. Relatedly, if we want to filter stdout/stderr to harvest interesting markers to track, do we expect to filter the outputs directly at the KVS level, or will there be a hook with which we can register a filter directly into the stdout/stderr stream handler so that we can detect those markers on the fly?

For the parent to pull data from the child, it would have to connect to the child instance's KVS API, and in the multi-user case it won't have the authority to do so, so some extra code would have to be developed to support that... do we have time for that in the near term?

Yeah I kind of thought "security" would be why you suggested push. Thank you for the clarification. It makes sense to me, and in terms of near-term use cases, push is well aligned with what I want to pursue.

grondo commented 6 years ago

Good to know. Relatedly, if we want to filter stdout/stderr to harvest interesting markers to track, do we expect to filter the outputs directly at the KVS level, or will there be a hook with which we can register a filter directly into the stdout/stderr stream handler so that we can detect those markers on the fly?

Do you need to filter or watch stdout/err? Watching KVS stdio streams would be easy and could be done via a script launched along with the job, or perhaps a job shell plugin(?) (e.g. this makes development of a Flux "io-watchdog" almost trivial). Filtering could also be done via job shell plugins perhaps, but would possibly be more involved.

We could design for both.
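As a toy illustration of the io-watchdog idea, here is the watchdog reduced to its core check, operating on (timestamp, line) pairs rather than a live KVS watch (everything here is illustrative, not a flux API):

```python
# Toy io-watchdog: given timestamped output lines, report whether the
# job ever went silent for longer than `timeout` seconds between
# lines. A real version would hang this check off a KVS watch on the
# job's stdio streams, as suggested above.

def went_silent(events, timeout):
    prev = None
    for timestamp, _line in events:
        if prev is not None and timestamp - prev > timeout:
            return True
        prev = timestamp
    return False
```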

garlick commented 6 years ago

It seems like the child instance has access to both its own kvs and the parent instance's guest namespace, so it makes the most sense to push data to the guest namespace as needed?

I'm feeling like I didn't think through my original comment with regard to preserving KVS state of child in guest namespace. We have an issue open on preserving the entire KVS state in a file, that becomes like a checkpoint for the instance. I wouldn't want to propose that we store application checkpoints in the guest namespace :-)

The original topic of this issue pertains to transmitting R to the child instance. Should we say that we'll do that via the guest namespace and add something to that effect to RFC 16? It doesn't even need to be specific for a flux launch - just put it there and if the application happens to be flux, it knows where to find it.

dongahn commented 6 years ago

(e.g. this makes development of a Flux "io-watchdog" almost trivial).

This will be an awesome feature! I have some killer use cases in the dev tool area.

Do you need to filter or watch stdout/err?

I think watch is a subset of the capabilities we will ultimately need. The UQ folks have a workflow in which they look through the output, and if certain markers are found they take an action (e.g., "error: soft error" --> resubmit; "error: hard failure: node not responding" --> NOOP). We can always go back to the old way and run "awk" on the stdout/stderr at the KVS level, but I was thinking that if there were a more scalable way to detect these markers on the fly, by allowing users to register filters, this could be a new capability.

The exact requirement will only be known when @SteVwonder finishes his current research investigation fully. So, treat this use case with a grain of salt for now.
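The marker-to-action mapping described above might boil down to something like this (the regexes and action names are invented for illustration, echoing the UQ example):

```python
import re

# Hypothetical user-registered filter: map output markers to workflow
# actions. The patterns and actions mirror the UQ example above.
MARKERS = [
    (re.compile(r"error: soft error"), "resubmit"),
    (re.compile(r"error: hard failure"), "noop"),
]

def scan(line):
    """Return the action for the first marker matching line, or None."""
    for pattern, action in MARKERS:
        if pattern.search(line):
            return action
    return None
```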

grondo commented 6 years ago

The original topic of this issue pertains to transmitting R to the child instance. Should we say that we'll do that via the guest namespace and add something to that effect to RFC 16?

Yeah, I think we should add something about this to RFC 16; it just seems like the right approach.

The only thing that bothers me a little is that the data has to be copied -- or will it be more like a copy-on-write snapshot? We can discuss more in an RFC 16 PR.

It doesn't even need to be specific for a flux launch - just put it there and if the application happens to be flux, it knows where to find it.

Even more important than flux launch -- the job shell may need to read R so that will be the first user of it I think...

garlick commented 6 years ago

The only thing that bothers me a little is that the data has to be copied -- or will it be more like a copy-on-write snapshot? We can discuss more in an RFC 16 PR.

Were you going to submit that or should I? The copy to the guest namespace would just be adding a metadata reference to the same content, so like a COW snapshot. Same cost for R or the entire directory...
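The COW-snapshot point can be illustrated with a miniature content-addressed store (the hashes and layout are illustrative, not the actual KVS object format):

```python
import hashlib
import json

# Miniature content-addressed store: values are stored by the hash of
# their content, and a "directory" is just a mapping of names to
# hashes. Snapshotting into another namespace copies one hash, never
# the underlying data; the cost is the same for R or a whole tree.

store = {}

def put(obj):
    blob = json.dumps(obj, sort_keys=True).encode()
    ref = hashlib.sha1(blob).hexdigest()
    store[ref] = blob
    return ref

R_ref = put({"version": 1})
main_root = put({"resource.R": R_ref})   # main namespace root
guest_root = main_root                   # the "snapshot": one reference
```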

dongahn commented 6 years ago

The copy to the guest namespace would just be adding a metadata reference to the same content, so like a COW snapshot. Same cost for R or the entire directory...

Like it.

grondo commented 6 years ago

Were you going to submit that or should I? The copy to the guest namespace would just be adding a metadata reference to the same content, so like a COW snapshot. Same cost for R or the entire directory...

Cool.

I'll propose something if you haven't started already. You can correct me if I get the language wrong.

garlick commented 6 years ago

Go ahead đź‘Ť

grondo commented 6 years ago

#114 opened.