idea: update RFC5/Module Extension Protocol definition to allow for non-DSO modules

grondo commented 8 years ago

In an attempt to support Lua and/or LuaJIT modules natively in the broker, it became apparent that the module definition as implemented and expressed in RFC5 needs an update to support any kind of non-DSO style modules, i.e. interpreted language modules and/or modules as separate processes.

There are (at least) two ways to support interpreted/separate process modules, and we should decide which makes the most sense before drafting a new version of RFC5. This issue is meant to be a forum for that discussion, which will hopefully be quick. Description of the two approaches follows:

Interpreted modules, e.g. a module written in Lua or Python, can be loaded via specific comms modules using the existing interfaces (i.e. similar to @trws's old pymod). The usage would require the actual module be listed in the module command line, with extra module options appended thereon. For example flux module load pymod /path/to/module.py or flux module load luamod /path/to/module.lua.
- Advantages:
- more easily implemented
- less code changes
- less RFC changes
- keep one simple module loading interface in broker
- Disadvantages
- module-as-process not easily supported (without extra comms hop anway)
- less flexible
The module loading interface in the broker could be abstracted and extended into multiple module "loaders", which could be chosen at runtime when loading a module based on file extension (e.g. dso_loader for .so, lua loader for .lua, python loader for .py), or perhaps magic number as in the kernel's binfmt_misc. Setup of the zmq sockets and pthread for module comms and control could also be abstracted to allow for a module-as-process module loader in the future. The module loader code could itself be abstracted into a DSO which is loaded as needed, perhaps similar to the connector plugins scheme.
- Advantages:
  - script modules can be loaded directly with flux module load /path/to/script without naming the specific module loading module on the cmdline (e.g. luamod, pymod, etc.)
  - more flexibility, e.g. if more of module setup is pushed into module loader implementation
  - more easily support module-as-process without an indirect hop through local comms module
- Disadvantages
  - More work, extra management of module loader plugins
  - More RFC rework
  - Increase complexity of broker
  - Maybe break or make more difficult the recursive module "find"
In both cases, the mod_name and mod_service symbols required in Extension Module Symbols definition need to be either extended or removed. In case 1, the module name, e.g. "pymod" or "luamod" etc, will be different than the "module service name" (hopefully I'm using the correct term here), which will be provided by the script implementing the module. In case 2, a more generic definition of "Extension Module" as something beyond only a DSO will be required, and interpreted language scripts or separate processes cannot very well define these symbols.

If we were to implement scheme 2 above, I would probably suggest removal of all parts of the RFC that describe or depend on Extension Modules as DSOs specifically, and would instead define a specification of a "Module Loader Implementation" which would, in turn, be responsible for getting the module name and service (and provide any "main" function) from the module it will load, whatever form that module may be in.

trws commented 8 years ago

2 would be fantastic. In fact, it would be good even for compiled languages that produce so files but don't have an easy way to offer an externed C-compatible string constant (go for example, perhaps rust, swift etc).

garlick commented 8 years ago

Your second option seems very appealing. I favor that one.

I really like the idea of a loader as a DSO handled like connectors. I wonder if we could come up with something like a flux_popen() that returns a flux_t that could be incorporated into a Flux reactor loop? Nice testing possibilities there, and would really simplify the broker's code too.

grondo commented 8 years ago

I wonder if we could come up with something like a flux_popen() that returns a flux_t that could be incorporated into a Flux reactor loop?

That idea is very intriguing! However, I'm not sure I completely follow... do you mean that flux_popen() would be the interface the broker uses to load and run a "module" (and thus since it is an API interface could also be used to write test cases that exercise individual modules)? The one place I'm stuck is how a flux_t would be used by the broker to implement the broker-module interface, but that is probably because I'm not very familiar with the code down there yet.

If the broker communicated with modules using a flux handle similar to how API users and modules communicate with the broker, that seems like it could be a very powerful abstraction...

garlick commented 8 years ago

Yeah, I was thinking the broker could use it too. The broker just needs to be able to route messages down paths based on routing frames or service names. Right now it's a raw zeromq socket, but it could also be a flux_t.

This is a similar idea to the czmq "zactor" class if I recall - we might want to review how that works.

grondo commented 8 years ago

Ok, great! That seems like a very good idea. Extension Module Loaders will need to return more than just a reactor and zmq socket though (e.g. service, name, "main" function, "exit" funciton, etc) so I wonder if there will need to be an object returned that wraps the flux handle, or the reverse I guess -- this info could be captured within the handle somehow.... Otherwise, perhaps something like this interface might be what you are thinking?

flux_module_t * flux_module_load (const char *path);
void flux_module_unload (flux_module_t *m);
flux_t *flux_module_run (flux_module_t *m);

// Accessors (or flux_module_get (m, ITEM, &val)...)
const char *flux_module_type (flux_module_t *m);
const char *flux_module_name (flux_module_t *m);
const char *flux_module_service (flux_module_t *m);
const char *flux_module_digest (flux_module_t *m);
int flux_module_size (flux_module_t *m);

Something like flux_popen could be just flux_module_load(); flux_module_run() with flux_module_t registered with the handle with flux_aux_set(), but there are times when you might want to load a module without "starting" it (e.g. query its name).

This is a similar idea to the czmq "zactor" class if I recall - we might want to review how that works.

I looked at zactor and it seems to just pass a zmq pipe to a function running a thread, and implements send/recv on it. Not sure if that is helpful here...

garlick commented 8 years ago

Looks like a good start!

Might want a separate "stop" and unload function, since unloading a module consists of sending it a shutdown request, and allowing it to possibly send and receive some messages while it shuts down, then joining the thread (or wait for process I guess).

Yeah, zactor is pretty simple so not much help.

grondo commented 8 years ago

One thing I'm stumbling on is that the RFC covers both messaging actor modules and so-called "plugin extension modules", the latter of which are currently used in sched.

It would be nice to offer a loader abstraction for both messaging actors and plugin-style modules, however supporting plugin loading in a generic way might be more difficult since plugins typically have arbitrary name and number of entry points. Is it ok to make the description of Extension Module Loader specific to "actor" type extension modules, which will have defined entry points (as was the case before)?

The Module Management Protocol would still apply to both types of extension modules.

garlick commented 8 years ago

That seems OK to me. Maybe the plugin-style modules could have a different interface that accepts an array of symbol names and function pointers? But that doesn't need to happen now.

grondo commented 8 years ago

Maybe the plugin-style modules could have a different interface that accepts an array of symbol names and function pointers? But that doesn't need to happen now.

That could work, but but I think each plugin type would have to have its own set of loaders because, except in the DSO case, "symbols" don't really exist and would have to be mapped to pre-defined functions in the loader implementation to do the translation. Maybe that wouldn't be so bad if these plugin loaders could inherit from a "module loader" base type? I'll try to keep that in mind as we work on the module loader interface.

For plugins, it might also make more sense to just have plugin loading plugins (e.g. a lua.so plugin that loads a lua script plugin). I'm not sure if it is as critical to abstract the loader here since they only present a set of call-sites, they don't have to register a service, potentially open channels of communications, run in their own thread of control, etc.

garlick commented 8 years ago

Yeah, sounds like we should focus on the "actors" here.

trws commented 3 months ago

Ended up looking back at this after trying to work out how to set a module's name from inside the module for multi-loaded modules. AFAICT we support different names being set by flux module load but not by the module itself by any means, is that correct? Discussion also came up that having a helper for "whatever my module name is" for registering service names.

garlick commented 3 months ago

we support different names being set by flux module load but not by the module itself by any means, is that correct?

Yes

Discussion also came up that having a helper for "whatever my module name is" for registering service names.

We have

int flux_msg_handler_addvec_ex (flux_t *h,
                                const char *service_name,
                                const struct flux_msg_handler_spec tab[],
                                void *arg,
                                flux_msg_handler_t **hp[]);

The handler table then just registers the method names leaving off the service name. Example here.

RFC 5 explains how to determine the module name from within the module:

flux_aux_get (h, "flux::name")

flux-framework / rfc

idea: update RFC5/Module Extension Protocol definition to allow for non-DSO modules #68