flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

Better support for Cray ATP plugin and other CTI tools #5697

Status: Open. grondo opened this issue 5 months ago.

grondo commented 5 months ago

While answering questions from @ardangelo about job shell and jobtap plugins, it became apparent that the implementation is awkward compared with some other resource managers (RMs), because Flux does not have a frontend process associated with each job. To summarize, the ATP plugin has the following basic requirements:

Ideally, this could all be done within a job shell plugin, but that is currently not possible: flux filemap map, which is used to map and broadcast files, is only available on rank 0, and since not every job runs on rank 0, the shell cannot perform the file broadcast in general.

As a workaround, @ardangelo and I have come up with a solution that combines a jobtap plugin (which, of course, always runs on rank 0) with a shell plugin. Unfortunately, the jobtap plugin needs to access the job's environment to determine whether ATP is enabled, and the environment is redacted in the job manager's copy of jobspec. The jobtap plugin therefore has to first fetch J (the signed jobspec) for every job, both to determine whether ATP is enabled and to prepare for getenv() support. The jobtap plugin then dlopens the Flux ATP helper library (whose location is derived from ATP_INSTALL_DIR in the job's environment), redirects getenv() calls to the jobspec environment dictionary, and calls the library's init function. This way, creating the temporary files and broadcasting them always happen on rank 0.
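A minimal sketch of the getenv() redirection described above. It assumes the environment dictionary from J has already been decoded into a simple key/value table (in jobspec, the environment lives under attributes.system.environment). The table contents, the use of ATP_ENABLED=1 as the enable flag, and the relative library path lib/libatp_flux.so are hypothetical placeholders, not the real ATP helper interface.

```c
#include <stdio.h>
#include <string.h>

/* One entry of a job environment dictionary (illustrative layout). */
struct env_entry {
    const char *key;
    const char *value;
};

/* Hypothetical job environment; a real jobtap plugin would decode this
 * from attributes.system.environment in the job's J. */
static const struct env_entry job_env[] = {
    { "ATP_ENABLED", "1" },          /* assumed enable flag */
    { "ATP_INSTALL_DIR", "/opt/atp" }, /* hypothetical install dir */
    { NULL, NULL },
};

/* getenv() replacement that consults the job's environment dictionary
 * instead of the calling process's own environment. */
static const char *job_getenv (const char *name)
{
    for (const struct env_entry *e = job_env; e->key != NULL; e++) {
        if (strcmp (e->key, name) == 0)
            return e->value;
    }
    return NULL;
}

/* Return nonzero if the job requested ATP (assumes ATP_ENABLED=1). */
static int atp_enabled (const char *(*getenv_fn) (const char *))
{
    const char *v = getenv_fn ("ATP_ENABLED");
    return v != NULL && strcmp (v, "1") == 0;
}

/* Build the helper library path from ATP_INSTALL_DIR.  The suffix
 * "lib/libatp_flux.so" is a hypothetical placeholder.
 * Returns 0 on success, -1 if unset or the buffer is too small. */
static int atp_helper_path (const char *(*getenv_fn) (const char *),
                            char *buf, size_t len)
{
    const char *dir = getenv_fn ("ATP_INSTALL_DIR");
    if (dir == NULL)
        return -1;
    int n = snprintf (buf, len, "%s/lib/libatp_flux.so", dir);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

A jobtap plugin would call atp_enabled(job_getenv) and, if it returns true, dlopen() the path produced by atp_helper_path() and hand job_getenv to the helper's init function. That init signature is likewise an assumption; passing a lookup callback is just one way to redirect getenv() without touching the broker's own environment.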

To set environment variables in the job, however, the jobtap plugin will have to post them in an eventlog event, and a job shell plugin will have to read that event and set the variables accordingly.
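The shell side of that hand-off could look roughly like the following sketch. It assumes, hypothetically, that the jobtap plugin posts the variables as a list of NAME=VALUE strings in the event context (the actual event name and encoding are not specified in this issue). To stay self-contained, it applies them with POSIX setenv(), whereas a real job shell plugin would set them in the job's environment, for example with flux_shell_setenvf().

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Apply one "NAME=VALUE" entry (hypothetical event-context encoding)
 * to the current process environment. Only the first '=' separates
 * name from value, so values may themselves contain '='.
 * Returns 0 on success, -1 on a malformed entry or error. */
int apply_env_entry (const char *entry)
{
    const char *eq = strchr (entry, '=');
    if (eq == NULL || eq == entry)
        return -1;
    size_t namelen = (size_t) (eq - entry);
    char *name = malloc (namelen + 1);
    if (name == NULL)
        return -1;
    memcpy (name, entry, namelen);
    name[namelen] = '\0';
    int rc = setenv (name, eq + 1, 1); /* overwrite existing values */
    free (name);
    return rc;
}

/* Apply every entry in a NULL-terminated list. Returns the number of
 * entries applied, or -1 at the first malformed entry. */
int apply_env_entries (const char *const entries[])
{
    int count = 0;
    for (size_t i = 0; entries[i] != NULL; i++) {
        if (apply_env_entry (entries[i]) < 0)
            return -1;
        count++;
    }
    return count;
}
```

Splitting on the first '=' keeps the encoding trivial; a JSON object keyed by variable name would avoid the parsing entirely if the event context is decoded with the job shell's JSON helpers.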

This design works, but is obviously complicated by the requirement to use the jobtap plugin to run the ATP service on rank 0. It would be much nicer if this service could be run on any rank, and the solution could be fully implemented in a job shell plugin.

grondo commented 5 months ago

Also, something like #5605 could be used in the future instead of alloc-bypass.so and would not require the privilege to load a jobtap plugin.

garlick commented 5 months ago

> Ideally, this could all be done within a job shell plugin, but currently this is not possible because use of flux filemap map to map and broadcast files is only available on rank 0, and since not all jobs will be running on rank 0 the file broadcast is not possible.

I'll open a separate issue on making this work.