dongahn opened 7 years ago
Reference prior discussion captured in Issue 1151
Thanks @lipari!
An instance needs to pick up (at least) R from its enclosing instance. RFC 16 (kvs job schema) sets up a guest KVS namespace for each job. Placing data in the guest namespace is one (scalable) way to transmit data to the child.
In fact, maybe we should define a simple mechanism to a) load any initial contents of the guest namespace into the child KVS during child bootstrap, and b) store the final contents of the child instance's KVS back to the guest namespace upon instance shutdown (probably filtered, or with configurable levels of persistence, of course).
This could be something the job shell orchestrates for the child, or something the instance does on its own by directly communicating with the parent using the flux API.
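To make the guest-namespace idea concrete, here is a rough Python sketch using the flux-core Python KVS bindings. The key paths and the contents of R below are purely illustrative, not the actual RFC 16 layout:

```python
# Rough sketch (key paths and payload are hypothetical): the parent writes R
# into the job's guest KVS directory; the child reads it from its own KVS
# after the guest contents have been loaded during bootstrap.
import flux
import flux.kvs

# Parent side: place a made-up resource set under the job's guest directory.
parent = flux.Flux()                       # connects via $FLUX_URI
guest_key = "lwj.0.0.1.guest.R"            # hypothetical key path
flux.kvs.put(parent, guest_key, {"nodes": ["node0", "node1"]})
flux.kvs.commit(parent)

# Child side: once the guest contents are imported into the child KVS,
# the child reads R from a key in its own namespace.
child = flux.Flux()
R = flux.kvs.get(child, "resource.R")      # hypothetical child-side key
print(R)
```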
I realize you're getting at how ensembles of instances co-schedule a workload, but we should definitely get the basics down first IMHO.
That discussion is being tracked in #110. It seems you are proposing something different than @grondo's: allowing a sub-instance to fetch its R using job.fetch_resources. Perhaps we should move this discussion to #110?
This ticket is intended to track our discussion of the best way to document how a child instance uses the parent's services, which may go beyond the ability to obtain its R.
If you want, we can tackle the basics through #110 first and circle back to this more general capability discussion.
Oops, sorry! Well, for this discussion, I did want to make the point that communication between instances should make the most of the general mechanisms being proposed, especially given that direct communication will be limited if the child is not running as the same user as the parent. Didn't mean to get off topic.
A sub-instance is just a program, and we already know how a program running in an instance uses that instance's services (connect to FLUX_URI). Therefore, I'm struggling with what is meant to be discussed in this issue. Is the question more generally about how multi-user programs access services of the parent?
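For reference, that basic pattern looks roughly like this with the Python bindings (the attribute query is just an arbitrary example of using the instance's services):

```python
import flux

# flux.Flux() with no arguments opens the instance named by $FLUX_URI,
# i.e., the instance this program is running in.
h = flux.Flux()
print("connected to broker rank", h.attr_get("rank"))
```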
@grondo: I guess my main point is documentation. For developers like @SteVwonder who want to design a hierarchical capability that requires parent-child communication, I was wondering whether pointing them to a closed issue (Issue 1151) is sufficient. If not, where is the proper place to document this? As nesting is one of the fundamental new capabilities of Flux, ideally documenting this somewhere in the RFCs seems to make sense. But I'm struggling to find the right RFCs.
Good point, thanks! I'm not sure either, maybe this could start as a wiki article and grow into a new RFC "Examples of Parent/Child Communication Strategies for Nested Flux Instances"?
Seems like the right approach. I will talk to @SteVwonder in case he wants to draft this, since he actually has hands-on experience with this topic.
Sure. I can begin writing up a wiki article on what cross-instance services I currently use as well as expand on some of the ideas expressed here. I'll ping you all for feedback once I have a draft.
Thanks @SteVwonder!
Here is my first attempt at the "Parent/Child Communication Strategies for Nested Flux Instances" wiki entry.
All feedback and/or direct revisions are welcome.
Specifically, it would be great if @garlick, @grondo, or @chu11 could take a look at the What Is Generally Possible section and double check that I don't misrepresent the capabilities of Flux. More specifically, I want to ensure that these are accurate representations of Flux's current capabilities/limitations:
Children can send requests to and receive responses from parent-provided services, and subscribe to events and keepalives on parent overlays (a rough sketch of this follows below).
Parent instances cannot send requests to or receive responses from child-registered services and cannot subscribe to events or keepalives on child overlays.
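A rough sketch of the first point, assuming the child has somehow obtained the parent's URI (the URI and the sched.resources topic below are placeholders, not real services):

```python
import flux

# Placeholder URI for the parent instance; in practice the child would have
# to obtain this some other way (e.g., saved from its launch environment).
PARENT_URI = "local:///tmp/flux-parent/local"

parent = flux.Flux(PARENT_URI)

# "sched.resources" is a made-up topic standing in for a parent-provided
# service; the payload is illustrative only.
resp = parent.rpc("sched.resources", {"jobid": 42}).get()
print(resp)
```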
@SteVwonder, thanks so much for the work on the writeup.
Some very minor comments -- probably more like questions at this point:
Second, child instances must be co-located with parent instances (i.e., run on the same node) in order to connect into the parent instance's overlay using the local connector.
The ssh connector is also available to connect to a remote broker (however, to be useful, passwordless ssh would probably need to be set up, so not sure if it is worth mentioning).
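For example, roughly (the host and socket path are placeholders, and this assumes passwordless ssh to the remote node):

```python
import flux

# ssh://[user@]host/path-to-remote-local-socket (placeholders below)
remote = flux.Flux("ssh://node42/tmp/flux-12345/local")
print("remote broker rank:", remote.attr_get("rank"))
```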
Flux supports limited push-communication via the guest KVS namespace outlined in RFC 16
I think the groundwork was laid for dynamic service registration (#1189), but I can't remember where we left the ability for a generic service registration via an external (local connector) API handle.
Another approach could be to fake parent-push by having the child send a registration message to a new scheduler service which would then use streaming RPC replies (to borrow @garlick's term) to push information to the child until it disconnected/deregistered. This is kind of kludgy and may not be any better than the KVS method you describe.
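A very rough sketch of what the child side of that register-and-stream pattern might look like (the service name, payloads, and parent URI are all made up; this only illustrates the streaming-RPC flow):

```python
import flux
import flux.constants

# Placeholder URI; see the local/ssh connector discussion above.
parent = flux.Flux("ssh://parent-host/tmp/flux-12345/local")

def on_update(rpc):
    # Each streamed response from the (hypothetical) parent service arrives
    # here; reset() re-arms the future for the next one.
    print("update from parent:", rpc.get())
    rpc.reset()

# "mysched.register" is a made-up topic for the hypothetical scheduler service.
rpc = parent.rpc("mysched.register", {"child": "nested-instance-1"},
                 flags=flux.constants.FLUX_RPC_STREAMING)
rpc.then(on_update)
parent.reactor_run()
```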
@grondo: thanks for the feedback.
The ssh connector is also available to connect to a remote broker (however, to be useful passwordless ssh would probably need to be set up, so not sure if it is worth mentioning)
I think the ssh connector is at least worth mentioning since I already hint at the future tcp connector. Besides requiring password-less ssh, do you know of any major theoretical differences between the current ssh connector and the future tcp connector? Presumably the tcp connector will have lower overhead?
Do you think it is worth mentioning the limit on the number of external connectors that any given broker can support, which I ran into in https://github.com/flux-framework/flux-core/issues/1151? Is this even still a limitation?
Another approach could be to fake parent-push by having the child send a registration message to a new scheduler service which would then use streaming RPC replies.
This is a nice idea. I'm doing a similar thing in the full example, except it requires a new request from the child after each reply. I like the idea of leveraging streaming RPC replies. I'll have to give that a shot sometime (and then document it).
Besides requiring password-less ssh, do you know of any major theoretical differences between the current ssh connector and the future tcp connector? Presumably the tcp connector will have a lower overhead?
Perhaps @garlick could chime in on this one. The tcp connector would certainly have less overhead and latency since IIRC the ssh connector hops through ssh and a local connector.
I don't remember the source of the limitation on connectors to a single broker... I would be surprised if there was a limit, but perhaps again @garlick can comment.
Here's an issue I opened recently on a tcp connector: flux-framework/flux-core#1281
It has fewer moving parts than the ssh connector (no ssh + sshd processes in the loop for example), so it should be more efficient.
There is no limit on broker connections that we enforce. System limits come into play at a certain point (max per-process open files for example).
Do we have an open issue for tracking generic service registration via an external (local connector) API handle? I can't seem to find one. I can reference the PR that @grondo mentioned (https://github.com/flux-framework/flux-core/pull/1189), but referencing an issue would be nice too.
In the first picture, "Nested Instance #2" has two brokers that are not under the control of the "Root Instance". How can that happen under the current Flux model? Presumably broker 6 of the root instance somehow launched brokers 6, 7, and 8 of Instance #2. But how did it run something on resources that it doesn't own?
Shouldn't the numbering be different for the subinstances as well? Both Nested Instance #1 and #2 should have brokers numbered 0, 1, 2, not use numbers from the parent, right?
@morrone thanks for the feedback.
How can that happen under the current Flux model? Presumably broker 6 of the root instance somehow launched brokers 6, 7, and 8 of Instance #2. But how did it run something on resources that it doesn't own?
I thought a while back there was talk about tying together multiple system Flux instances into a compute facility instance, without requiring the compute facility instance to run on every node of every cluster. Instead, the facility instance would just run on the login nodes of the clusters. I can't seem to find that discussion on GitHub, so maybe I'm just fabricating this idea. Does anyone else remember this discussion? Regardless, you bring up a good point. This isn't the normal use-case of nested Flux instances. I'll modify the example so that nested instance #2 completely overlaps with the root instance. I'll point out the use of local vs. ssh connectors using the sibling nested instances (i.e., if nested instance #1 wants to message nested instance #2, it will need to use the ssh connector).
Shouldn't the numbering be different for the subinstances as well? Both Nested Instance #1 and #2 should have brokers numbered 0, 1, 2, not use numbers from the parent, right?
Another good point. I wasn't intending those to be the ranks, but I can see how that is the intuitive interpretation. I will update the figure accordingly.
I thought a while back there was talk about tying together multiple system Flux instances into a compute facility instance, without requiring the compute facility instance to run on every node of every cluster.
Yes, there have been discussions of many possible approaches, but we haven't made any solid plans or design decisions yet. It is probably too soon to guess how we'll do it at this point.
Forking off from #110. We need to document how a sub-instance uses services from its parent.
Since nesting is so fundamental to Flux, I can see us formalizing this in a new RFC and creating cross-references to other related RFCs. But if we don't have a whole lot to say about this at this point, I can see us spreading this across some existing RFCs along with man page augmentation. Thoughts?