flux-framework / flux-coral2

Plugins and services for Flux on CORAL2 systems
GNU Lesser General Public License v3.0

libpals: provide proper `cpus_per_pe` #29

Open jameshcorbett opened 2 years ago

jameshcorbett commented 2 years ago

According to David Gloe at HPE, the `cpus_per_pe` member of the `pals_cmd_t` struct should be "the number of Linux CPUs (hyperthreads) each PE [i.e. shell task] is bound to." He also noted:

> Overlapping isn't taken into account at all for this. So for example if there are 4 PEs all bound to the same 2 hyperthreads, cpus_per_pe should be set to 2.

The value the flux-coral2 shell plugin currently provides is incorrect for two reasons: it counts hwloc cores rather than Linux CPUs, and it accounts for overlap between tasks, which it should not.
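To make the intended semantics concrete, here is a minimal sketch of the calculation as described above. It is not the plugin's code: the function name `cpus_per_pe` and its input (one collection of Linux CPU ids per PE) are hypothetical, and raising an error when PEs are bound to differing numbers of CPUs is an assumption, since that case was not discussed.

```python
def cpus_per_pe(pe_cpusets):
    """Return the number of Linux CPUs each PE is bound to, ignoring overlap.

    pe_cpusets: hypothetical input, one iterable of Linux CPU ids per PE.
    """
    # Count CPUs per PE independently; CPUs shared between PEs are
    # deliberately NOT de-duplicated across PEs.
    counts = {len(set(cpus)) for cpus in pe_cpusets}
    if len(counts) != 1:
        # Assumption: a single value only makes sense if every PE is
        # bound to the same number of CPUs.
        raise ValueError("PEs are bound to differing numbers of CPUs")
    return counts.pop()


# Four PEs all bound to the same two hyperthreads.
print(cpus_per_pe([{0, 1}] * 4))  # 2
```

The point of the sketch is that the count is taken per PE, so four PEs sharing hyperthreads 0 and 1 still yield 2 rather than a value derived from the number of distinct CPUs divided among the PEs.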

@grondo noted:

> The distribution of cores to tasks is done in the shell affinity plugin, and as such it is not exported to other plugins at the moment... if you attempt to recreate it, you'd have to use the same code

Thankfully, an improper value (e.g. 0) for the `cpus_per_pe` entry does not seem to cause any errors, and David Gloe guessed that the value isn't read anywhere. So we can punt on the issue for a little while.

grondo commented 2 years ago

Just to clarify, by "shell tasks" you mean the user tasks that the job shell launches, not the job shell itself, correct?

jameshcorbett commented 2 years ago

Yeah, sorry. I was looking for the word and didn't want to say "MPI rank". I should have just said "task" since that's the standard Flux terminology.