flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

system test environment cannot set up fake resources #3891

Open: chu11 opened this issue 3 years ago

chu11 commented 3 years ago

While working on PR #3864, I wanted to write a simple test that would run jobs under the testsdexec implementation of the job-exec module (which launches jobs under systemd), kill the broker, and then verify that the jobs were still active under systemd (or, if they had finished, that their results were cached by systemd).

While this simple test may be doable with the current "system" personality (set up via test_under_flux), that personality may prove limiting in the longer term, when we want more advanced tests such as assigning "fake resources" to ranks.

I initially tried to do this by turning the "system" personality into a configuration option enabled via an environment variable and then using the "job" personality, i.e.:

export TEST_UNDER_SYSTEM_SETUP=y
test_under_flux 4 job

However, the system personality also appears to assign resources (see make_bootstrap_config() in flux-sharness.sh).

I'm opening this issue to track ideas on how to do this in the longer term.

Per an offline discussion with @garlick, making "system" a personality of test_under_flux is probably not the right approach. Personalities tend to select alternate sets of modules to load, whereas the "system" personality is really about how the brokers are launched.

grondo commented 3 years ago

If we want to shut down and restart brokers, we will probably need the system test personality eventually. However, I'm not sure it makes sense to assign fake resources when testing the systemd exec functionality (doing so may break any use of properties such as cpuset and memory cgroups).

For now, I think testing actual job recovery is premature; the infrastructure in job-exec is not ready for that kind of testing. A test ensuring that jobs run normally under the testsdexec implementation is a good start.

For testing libsdexec recovery functionality, I would still suggest a separate test program that launches "work" and, on restart, can optionally attach to that "work" and wait for its return codes. This would be better low-level testing that the libsdexec interfaces provide the functionality the job-exec module will eventually require.
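As a rough illustration of the launch-then-reattach pattern suggested for such a test program, here is a minimal plain-shell sketch. It is an assumption-laden stand-in, not libsdexec and not systemd: the function names `sdexec_launch` and `sdexec_attach` are hypothetical, and a detached wrapper subshell that records each job's exit status in a state directory plays the role systemd plays in keeping a finished unit's result. The key property it shows is that the "attach" step depends only on persisted state, never on anything held in memory by the process that launched the work.

```shell
#!/bin/sh
# Hypothetical sketch only: does NOT use libsdexec or systemd. A detached
# wrapper subshell stands in for systemd by recording each job's exit
# status in a state directory that outlives the launching process.

# sdexec_launch STATEDIR NAME CMD  (hypothetical name)
# Start CMD detached so it outlives the caller (which plays the broker).
# The exit status is written to a temp file and then renamed, so a reader
# never sees a partially written STATEDIR/NAME.rc.
sdexec_launch() {
    mkdir -p "$1" || return 1
    (
        sh -c "$3"
        echo $? > "$1/$2.rc.tmp"
        mv "$1/$2.rc.tmp" "$1/$2.rc"
    ) </dev/null >/dev/null 2>&1 &
}

# sdexec_attach STATEDIR NAME  (hypothetical name)
# Called after a simulated restart. Relies only on STATEDIR, not on any
# state held by the launching process: waits until the work has finished,
# then prints its exit status.
sdexec_attach() {
    while [ ! -f "$1/$2.rc" ]; do
        sleep 1
    done
    cat "$1/$2.rc"
}
```

A simulated "broker restart" can be expressed by launching from a subshell that exits immediately, then attaching from a separate context, e.g. `( sdexec_launch "$dir" job1 'sleep 2; exit 3' )` followed later by `sdexec_attach "$dir" job1`, which would print `3`. A real test program would replace the state directory with querying systemd for the unit's recorded result.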