Closed by garlick 6 months ago
Yes, I had been thinking something very similar to this. It might even be required to allow new processes to enter the same job container.
BTW, when we had discussed this same idea before, it was yet another discussion that led to "flux broker == flux shell". I wonder if that idea requires another look before we keep getting ideas to add broker features to the shell.
These things worry me about that idea:
- broker is structured fundamentally to be the primary message router for an instance
- broker code presumes it runs as the instance owner and supports multi-user, while shell presumes it runs as guest. Dual role might get confusing?
- broker doesn't presume it has access to enclosing instance services, unlike shell (`--standalone` notwithstanding).
- broker is old and would look different if we wrote it now, yet it is stable and we are successfully building lots of things on top of it
In short, I think the broker would need surgery to make it serve both roles, and in doing so, we'd likely find ourselves wanting to modernize it as well. We might also create something that is harder to maintain because of the different roles it must run in.
I think it may be more expedient in our current situation to factor out areas where we have duplicate code rather than try to develop one executable that works in both contexts. Happily libsubprocess is one place where such code is already abstracted into a library.
Sorry, I meant more broadly that we should consider designing the shell more like a broker than just a process executor. In this case I was considering whether the shell should be able to route messages to the correct shell rank.
It was more of a general thought, not a specific call to use the existing broker code (sorry about that).
On libsubprocess, its design was specifically meant so that it could be used by the shell to launch tasks dynamically (not called a shell at that time though). So no surprise, I definitely agree with this idea.
> It might be a little tricky to map the jobid + shell rank to the broker rank that the shell has registered its `job-<id>` service on (I think `R` would need to be parsed by `flux exec`).
For debugger support we need to build the job process table that maps hostnames to pids and taskids (#2163, flux-framework/rfc#187). Maybe we could generalize this and include broker ranks as well (or perhaps more generically, service address). In the rare case a tool or user wants to use the shell exec service, it would first request generation of the mpir proctable. (Just another idea. It might be more generically useful to have an R parsing library)
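As a rough illustration of the shell rank to broker rank mapping discussed above, here is a minimal standalone sketch. It assumes the job's broker ranks have already been extracted from R as an idset string such as `2-4,7`, and that shell ranks are assigned in the same order as those broker ranks. Both are assumptions for illustration, and `idset_nth()` is a hypothetical helper, not an existing flux-core function.

```c
#include <stdlib.h>

/* Hypothetical helper: return the n-th (0-based) member of an idset
 * string such as "2-4,7", or -1 if n is out of range or the string is
 * malformed.  Assumes ranges are ascending and do not overlap.
 * Under the stated assumption that shell ranks follow the order of
 * broker ranks in R, idset_nth (ranks, shell_rank) is the broker rank. */
int idset_nth (const char *s, int n)
{
    const char *p = s;

    if (n < 0)
        return -1;
    while (*p) {
        char *end;
        long lo = strtol (p, &end, 10);
        long hi = lo;

        if (end == p || lo < 0)
            return -1;              /* expected a non-negative number */
        p = end;
        if (*p == '-') {            /* range "lo-hi" */
            p++;
            hi = strtol (p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            p = end;
        }
        if (n <= hi - lo)
            return (int)(lo + n);   /* n falls inside this range */
        n -= (int)(hi - lo + 1);    /* skip past this range */
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;              /* unexpected character */
    }
    return -1;                      /* fewer than n+1 members */
}
```

With the example string `2-4,7`, shell rank 3 would map to broker rank 7 under these assumptions. A real implementation would also have to fetch and parse R itself, which is the part that a shared R parsing library or the proctable would cover.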
Over the weekend I coded up a proof-of-concept implementation of a shell "exec" plugin using a subprocess server. It was actually fairly straightforward. Main issues were:
- libsubprocess hardcodes the "remote" subprocess service prefix name to `cmb.rexec`. I ended up adding an optional command option to set an alternate service endpoint that would be used by `flux_rexec(3)`, e.g. `flux_cmd_setopt (cmd, "service", "shell-123456.rexec")`.
- `flux_subprocess_server_start(3)` registers message handlers internally with `rolemask == 0`, so I believe only FLUX_ROLE_OWNER messages could be handled by the subprocess server, defeating the goal here. We could allow the flags to be set on initialization by the caller, but unfortunately `FLUX_ROLE_USER` lets in all users. What I ended up doing in the proof of concept was setting `FLUX_ROLE_USER` by default, but changing all message handlers to reject any message without FLUX_ROLE_OWNER or a userid matching the current user.
- The shell guest exec plugin may want to add to the environment of spawned processes, but this isn't currently possible. Instead the environment would be set by the `env` key of the request. I haven't addressed this one yet, but it would seem a callback that could be registered with the subprocess server would work here (and the hard-coded `local_uri` member of the server struct could then be dropped).
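To make the last bullet more concrete, here is a standalone sketch of what an environment callback could look like. Everything in it (`struct env`, `env_hook_f`, `prepare_env()`, the `EXAMPLE_JOB_ID` variable) is hypothetical and invented for illustration. None of it is libsubprocess API, and the real server would operate on the `env` key of the request rather than on a fixed-size table.

```c
#include <stdio.h>
#include <string.h>

#define ENV_MAX 32
#define ENV_LEN 128

/* Toy environment: a small table of "NAME=value" strings. */
struct env {
    char vars[ENV_MAX][ENV_LEN];
    int count;
};

/* Hypothetical hook the embedding program (e.g. the shell) registers so
 * it can amend the requested environment before a process is spawned. */
typedef int (*env_hook_f) (struct env *env, void *arg);

/* Set or replace NAME in env.  Returns 0 on success, -1 if the table is
 * full or the entry would not fit. */
int env_set (struct env *env, const char *name, const char *value)
{
    char buf[ENV_LEN];
    size_t nlen = strlen (name);
    int i;
    int n = snprintf (buf, sizeof (buf), "%s=%s", name, value);

    if (n < 0 || n >= (int)sizeof (buf))
        return -1;
    for (i = 0; i < env->count; i++) {
        if (strncmp (env->vars[i], name, nlen) == 0
            && env->vars[i][nlen] == '=')
            break;                  /* replace existing entry */
    }
    if (i == ENV_MAX)
        return -1;
    strcpy (env->vars[i], buf);
    if (i == env->count)
        env->count++;
    return 0;
}

/* Return the value of NAME, or NULL if it is not set. */
const char *env_get (const struct env *env, const char *name)
{
    size_t nlen = strlen (name);
    int i;

    for (i = 0; i < env->count; i++) {
        if (strncmp (env->vars[i], name, nlen) == 0
            && env->vars[i][nlen] == '=')
            return env->vars[i] + nlen + 1;
    }
    return NULL;
}

/* What the server side might do just before spawning: start from the
 * request's environment and let the registered hook, if any, amend it. */
int prepare_env (struct env *request_env, env_hook_f hook, void *arg)
{
    return hook ? hook (request_env, arg) : 0;
}

/* Example hook: inject one made-up variable; arg is the value to use. */
int add_example_vars (struct env *env, void *arg)
{
    return env_set (env, "EXAMPLE_JOB_ID", (const char *)arg);
}
```

The point of the design is that the server keeps no knowledge of what the embedder wants to add. A value like the hard-coded `local_uri` would become just one more variable set from the hook.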
As an aside to the 2nd bullet, it occurred to me it would be very convenient if there was some kind of rolemask like `FLUX_ROLE_USER_ONLY` which only allowed messages from the current uid and FLUX_ROLE_OWNER. This would be useful for user-registered services as in the shell (and might prevent accidental security holes). (Essentially we already have this in the shell as implemented in `flux_shell_service_register(3)`, however it is not possible to use this function call for the subprocess server, which registers message handlers internally.)
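The check described above (accept a request only if it carries the owner role or comes from the same userid the service runs as) can be sketched in standalone C. `ROLE_OWNER`, `ROLE_USER`, `struct cred` and `request_allowed()` are stand-ins made up for this sketch. The real role bits and message credential accessors live in flux-core, and this is not the code from the proof of concept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in role bits for illustration only. */
#define ROLE_OWNER 0x1u
#define ROLE_USER  0x2u

/* Stand-in for the credentials carried by a request message. */
struct cred {
    uint32_t rolemask;
    uint32_t userid;
};

/* Allow the request if the sender has the owner role, or if the sender
 * is the same user the service is running as.  Everything else is
 * rejected, even though the handler itself admits ROLE_USER messages. */
bool request_allowed (struct cred msg, uint32_t service_userid)
{
    if (msg.rolemask & ROLE_OWNER)
        return true;
    return msg.userid == service_userid;
}
```

Each message handler would run a check like this first and respond with a permission error on failure. A dedicated rolemask along the lines of the `FLUX_ROLE_USER_ONLY` idea would move the check into message dispatch so individual handlers could not forget it.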
Fixed by 99aa6be1eac361934a1af4da514667e55638bdeb
Could the flux-shell call `flux_subprocess_server_start()` and thereby offer the job owner a way to launch arbitrary tasks alongside their job, and inside any container set up by the IMP, for debugging, monitoring, etc?
Maybe the `flux exec` front end could then be modified to optionally accept a jobid, and then interpret the rank idset as shell ranks? Then you could do stuff like
If eventually we had pty support in `flux exec` you could do something like
It might be a little tricky to map the jobid + shell rank to the broker rank that the shell has registered its `job-<id>` service on (I think `R` would need to be parsed by `flux exec`). Other than that, if the "subprocess server" is ready to be embedded in the shell, it seems like this feature would mostly reuse existing work...