Open chu11 opened 3 years ago
If we want to shut down and restart brokers, we will probably need the "system" test personality eventually. I'm not sure it makes sense to assign fake resources for testing the systemd exec functionality, though (it may break any use of properties like cpuset and memory cgroups).
For now I think testing actual job recovery is premature. The infrastructure in job-exec is not ready for that kind of testing. I think doing a test to ensure that jobs run normally under the testsdexec implementation is a good start.
For testing libsdexec recovery functionality, I would still suggest a separate test program that launches "work" and, on restart, can optionally attach and wait for return codes from that "work". This would be a better low-level test that the libsdexec interfaces provide the functionality the job-exec module will eventually require.
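To make the launch / re-attach idea concrete, here is a rough sketch in Python. It is not libsdexec and not anything in flux-core: it drives the stock systemd-run and systemctl command-line tools as a stand-in, and every function name in it (launch_argv, show_argv, parse_show, work_state, reattach) is made up for illustration. How it interprets ExecMainCode and ExecMainStatus is also an assumption that should be checked against the systemd documentation before relying on it.

```python
import subprocess

# Hypothetical helpers; a stand-in for what a libsdexec test program might do.

def launch_argv(unit, command):
    """Build the argv that starts `command` as a transient user unit.

    --remain-after-exit keeps the unit around after the process exits,
    so its status can still be queried later.
    """
    return ["systemd-run", "--user", f"--unit={unit}",
            "--remain-after-exit", *command]

def show_argv(unit):
    """Build the argv that asks systemd for the unit's state."""
    return ["systemctl", "--user", "show", unit,
            "--property=ActiveState,SubState,ExecMainCode,ExecMainStatus"]

def parse_show(text):
    """Parse the Key=Value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props

def work_state(props):
    """Classify the unit as running, finished (with exit status), or unknown.

    Assumption: ExecMainCode stays 0 until the main process has exited,
    and ExecMainStatus then holds its exit status.
    """
    if props.get("SubState") == "running":
        return ("running", None)
    if props.get("ExecMainCode", "0") != "0":
        return ("finished", int(props.get("ExecMainStatus", "0")))
    return ("unknown", None)

def reattach(unit):
    """After a restart, query systemd for work launched before the restart."""
    out = subprocess.run(show_argv(unit), capture_output=True,
                         text=True, check=True).stdout
    return work_state(parse_show(out))
```

A test along these lines would run launch_argv() once, simulate the restart, and then call reattach() with the same unit name to check that the work is either still running or has a recorded exit status.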
While working on PR #3864, I wanted to write a simple test in which I would run jobs under the testsdexec job-exec module (launching jobs under systemd), kill the broker, and then make sure the jobs were still active under systemd (or, if they had finished, cached in systemd).
While this simple test may be doable under the current "system" personality (set up via test_under_flux), it may prove difficult longer term when we want to do more advanced tests, such as assigning "fake resources" to ranks.
I initially tried to do this by turning the "system" personality into a configuration via an environment variable and then just using the "job" personality. i.e.
however the system personality appears to also assign resources (see make_bootstrap_config() in flux-sharness.sh).
Just putting up this issue to track ideas on how to do this longer term.
Per offline discussion w/ @garlick, the "system" personality for test_under_flux is probably not the right approach. "Personalities" tend to be alternate ways of loading modules, while the "system" personality is more about how the brokers are launched.