Open grondo opened 5 years ago
Related: #1039
Related to #1039, flux service load would need a way to force an implementation even though it is not the highest priority provider, e.g. flux service load --force simple-sched sched.
Final note: one other idea is that all services currently loaded in rc1 could be split into service files. Each service script would flux service load the services it depends on. Then the "rc" script for an instance running the default scheduler would just be:
flux service load sched
And the flux service command would take care of loading services in the correct order. If your instance just needed kvs (e.g. in testing) you could potentially initialize with just flux service load kvs.
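A rough sketch of how a flux service command might work out the load order, assuming each service file declares the services it depends on. The dependency table below is hypothetical and only illustrates the idea; it is not the real rc1 contents.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: service -> services it depends on.
# These names and edges are assumptions made up for illustration.
SERVICE_DEPS = {
    "content": [],
    "kvs": ["content"],
    "job-manager": ["kvs"],
    "resource": ["kvs"],
    "sched": ["job-manager", "resource"],
}


def load_order(target, deps=SERVICE_DEPS):
    """Return target plus its transitive dependencies, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = deps[name]
            stack.extend(deps[name])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    print(load_order("sched"))
    print(load_order("kvs"))
```

With a table like this, asking for kvs alone pulls in only what kvs needs, which is the testing case mentioned above.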
It would be good to figure this one out sooner rather than later.
What about our TOML config? Could we potentially have TOML fragments for each module that express dependencies, default options, etc? Then flux-sched could provide one for its install prefix?
I like the idea of a higher level command to deal with modules + dependencies, sort of like modprobe(8). Good candidate for implementation in python IMHO.
But seems like wasted activity at startup, and also results in errors in rc3 (unless flux-sched reloaded sched-simple in its stop scripts, which seems silly)
For now, I can load sched-simple in manager's stop scripts to shut up the error message.
I don't know what the right solution would be here. But I can say it would be pretty important to make it easy to specialize our scheduling behaviors at different levels.
Ultimately, I can see that at the top level we will run the conservative policy on a pretty fine grained resource graph, but at a child instance level we run an HTC-oriented policy on a coarse grained graph, for instance.
Right now, if the scheduler configuration users want is different from what the rc scripts offer, they have to unload and reload qmanager/resource with different parameters (or use a kludge NOOP environment variable trick, which would be error prone with nesting).
If we can do this in such a way that users can effect this scheduling specialization without unloading/reloading or resorting to that kludge, that would be ideal.
Ideally, a newly loaded scheduler could "take over" an existing scheduler via some protocol, and we could leave sched-simple loaded. However, I think we stopped short of providing that support in dynamic registration.
A couple of recent developments would make this challenging nowadays for schedulers:
Schedulers now use resource.acquire to obtain and monitor resources. If two schedulers do that, then the first one wins and the second one fails. We'd have to go with a slightly different semantic for resource acquisition.
Ack! I wanted to say more but I just realized I'm late for an appointment!
A couple of recent developments would make this challenging nowadays for schedulers:
Good points. I think the idea of scheduler take-over was good at the time and would be convenient. But you are correct that design shouldn't be considered anymore.
@grondo said in #2946
How do you enforce order of sched module loading (if that happens to be required)?
Ah, well that was a dumb question, since the proposed config key is an array. Sorry not thinking clearly today I suppose.
Thinking about use cases, it would be nice if there was a way to encapsulate the scheduler choice into a single string, e.g.
$ flux start -o,-S sched=fluxion
instead of
$ flux start -o,-S,sched.modules=sched-fluxion-qmanager,sched-fluxion-resource
Which also gives the user the opportunity to cause modules to load in the wrong order, if order matters.
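A small sketch of what the single-string form could expand to internally. The registry below is hypothetical, and the module order shown for fluxion is an assumption for illustration only; the point is that the order is fixed in one place rather than typed by the user.

```python
# Hypothetical registry: scheduler name -> modules in the order they
# should be loaded. The fluxion ordering here is an assumption.
SCHEDULERS = {
    "simple": ["sched-simple"],
    "fluxion": ["sched-fluxion-resource", "sched-fluxion-qmanager"],
}


def modules_for(name, registry=SCHEDULERS):
    """Expand a scheduler name into its ordered module list."""
    try:
        return list(registry[name])
    except KeyError:
        choices = ", ".join(sorted(registry))
        raise ValueError(f"unknown scheduler {name!r} (choices: {choices})") from None


if __name__ == "__main__":
    for module in modules_for("fluxion"):
        print("would load:", module)
```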
Could each scheduler (or other replaceable service) provide a TOML config, with enough info to load itself, into a well known location under a specific name? Then with flux-start or flux-broker we add an option to select from these named configs?
In fact, since reading a TOML table will override the previous table, would it work to have a default
[sched]
modules = [ 'sched-simple' ]
Then if another sched config is selected, the default is overridden?
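To make the override idea concrete, here is a minimal sketch that treats each parsed config as a plain dict and lets a later table replace an earlier one wholesale. The FLUXION config contents are an assumption; only the [sched] default comes from the example above.

```python
# Parsed form of the default shown above.
DEFAULT = {"sched": {"modules": ["sched-simple"]}}

# Hypothetical config that a selected scheduler might provide.
FLUXION = {"sched": {"modules": ["sched-fluxion-resource", "sched-fluxion-qmanager"]}}


def layer(*configs):
    """Apply configs in order; a later top-level table replaces an earlier one."""
    merged = {}
    for cfg in configs:
        merged.update(cfg)
    return merged


if __name__ == "__main__":
    print(layer(DEFAULT)["sched"]["modules"])
    print(layer(DEFAULT, FLUXION)["sched"]["modules"])
```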
Selecting the scheduler by a single name and hiding the details of module loading seems good!
I'm not seeing how flux-sched (say) could install a TOML fragment someplace that gets pulled in conditionally. Have to ponder that for a bit I think. Could the TOML config reference a script provided by flux-sched? Then at least the script is conditionally invoked rather than being just another rc fragment that gets invoked unconditionally...
I'm not seeing how flux-sched (say) could install a TOML fragment someplace that gets pulled in conditionally. Have to ponder that for a bit I think. Could the TOML config reference a script provided by flux-sched?
An rc script, conditionally loaded by name via broker attribute (or some other flux-start option) would be even better. I had only referenced a config file since I was following the initial idea in #2946.
However, loading config fragments from the flux-start/broker command line may be very useful too, so it would be nice if we could support that as well. Especially if tables could be updated instead of overwritten.
For example, an advanced scheduler may have many tunable parameters. Once a workflow user has determined the right configuration for a scheduler, it would be nice if they could drop a TOML config in their homedir and reference that on the command line when starting an instance.
Or, a site could provide a few different "named" scheduler configurations which could be selected at runtime by a single string. (I guess this could also be accomplished by multiple rc scripts though.) This isn't just applicable to the scheduler config, either (I'm thinking content-store, job-archive, etc).
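For the "updated instead of overwritten" case, a recursive merge is one possible semantic. This is only a sketch over already-parsed tables; the policy and queue-depth keys are hypothetical tunables invented for the example.

```python
def merge_tables(base, override):
    """Return a new table where override updates base key by key.

    Nested tables are merged recursively; any other value in override
    replaces the value in base.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_tables(result[key], value)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical site-provided config and a user fragment from their homedir.
    site = {"sched": {"modules": ["sched-simple"], "policy": "fcfs"}}
    user = {"sched": {"policy": "easy", "queue-depth": 32}}
    print(merge_tables(site, user))
```

With this semantic a user fragment only needs to name the keys it changes, and the site-provided modules list survives.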
I don't remember exactly where TOML config is loaded by broker, but if it is loaded early, could the broker use the following steps to allow config fragments to be pulled in conditionally?
sysconfdir/flux/configs:~/.flux/configs
Users could also have ~/.flux/configs/default/*.toml files that are always loaded for their own instances.
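One way the lookup could work, sketched under my own assumptions: walk a colon-separated search path like the one above, and in each directory pick up default/*.toml followed by NAME/*.toml. The function name and the exact ordering are hypothetical, not a description of what the broker does.

```python
import glob
import os


def find_config_files(name, search_path):
    """List TOML fragments for a named config, in the order they would load.

    search_path is colon-separated, e.g. "sysconfdir/flux/configs:~/.flux/configs".
    For each directory, default/*.toml comes first, then NAME/*.toml.
    """
    subdirs = ["default"] if name == "default" else ["default", name]
    files = []
    for directory in search_path.split(":"):
        directory = os.path.expanduser(directory)
        for sub in subdirs:
            pattern = os.path.join(directory, sub, "*.toml")
            files.extend(sorted(glob.glob(pattern)))
    return files


if __name__ == "__main__":
    for path in find_config_files("fluxion", "/etc/flux/configs:~/.flux/configs"):
        print(path)
```

Because later directories come later in the list, a user's fragments would be applied after the site's, whatever merge semantic ends up being chosen.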
Apologies if the above extemporaneous description is ill conceived. I just wanted to throw an idea out there that described my high-level thoughts on the matter.
That seems like it solves a lot of problems! I like it!
One point (neither here nor there really): currently there is no default config unless a user sets --config-path or the FLUX_CONF_DIR env var. The systemd unit file sets --config-path=sysconfdir/flux/system/conf.d but the default is an empty config object. If we keep it that way, it just skips step 1 above and makes all config loading explicit, which seems OK to me.
A refinement might be to add support for something resembling an "include" directive so that configs could reference other named configs?
Edit: See also https://github.com/toml-lang/toml/issues/36
Currently, flux-sched has to flux module remove sched-simple before loading the qmanager since both modules have to register the sched service, engage job-manager in hello protocol, etc. This works fine, but seems like wasted activity startup, and also results in errors in rc3 (unless flux-sched reloaded sched-simple in its stop scripts, which seems silly).
Ideally, a newly loaded scheduler could "take over" an existing scheduler via some protocol, and we could leave sched-simple loaded. However, I think we stopped short of providing that support in dynamic registration.
Perhaps for now we could move the load of sched-simple to its own rc script in flux-core, named based on the provided service: sched, and then do something like alternatives to link to the current provider from rc1.d/sched? flux-sched would then update the /etc/flux/rc1.d/sched link to point to its alternative rc1 script?
Yeah, I agree, not the greatest approach...
Maybe we need a higher level service than loading single modules that can load "services" which are provided by name from scripts outside of the rc1.d/* directory. The flux service load (or whatever) command would load configuration from a /etc/flux/services/* directory. Each package that provides a named service drops a config entry into this directory and the last entry loaded wins (so 99-sched-fluxion would override 00-sched-simple for example).
Instead of calling flux module load sched-simple the flux-core rc1 script(s) would instead use flux service load sched and let the flux-service command handle calling the right script. Similarly, the service config could denote an rc3 script for each service provider which would be called from flux service remove/unload.
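A minimal sketch of the "last entry loaded wins" rule, assuming entries are named PRIORITY-SERVICE-PROVIDER as in the 99-sched-fluxion example and are processed in sorted order. The parsing is an assumption on my part; a real implementation would more likely read the service name from inside each config entry, since service names may themselves contain hyphens.

```python
def select_providers(entries):
    """Map each service to its winning provider.

    entries are names like "99-sched-fluxion" (assumed layout:
    PRIORITY-SERVICE-PROVIDER). They are processed in sorted order,
    so the last entry for a given service wins.
    """
    chosen = {}
    for entry in sorted(entries):
        _priority, service, provider = entry.split("-", 2)
        chosen[service] = provider
    return chosen


if __name__ == "__main__":
    print(select_providers(["99-sched-fluxion", "00-sched-simple"]))
```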