chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

pbs-aprun launcher - incorrect cpus-per-pe for aprun #6620

Open · ben-albrecht opened this issue 7 years ago

ben-albrecht commented 7 years ago

Summary of Problem

The CHPL_LAUNCHER=pbs-aprun launcher can launch Chapel programs with an incorrect number of cpus-per-pe (the value specified by the aprun -d flag).

This problem likely stems from the pbs-aprun launcher relying on cnselect output to find the cpus-per-pe, without the context of the current interactive qsub allocation. In other words, the launcher is picking from a list covering all of the machine's available nodes, rather than looking at the list of nodes currently allocated by qsub.
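For illustration, a hedged sketch of what the launcher effectively does today (the exact cnselect invocation, the numcores attribute name, and the output shown are assumptions, with values chosen to match the -d24 vs. -d44 case below):

  # List the distinct cores-per-node values across the whole machine;
  # nothing here is aware of the current qsub allocation.
  $ cnselect -L numcores
  24
  44
  # The launcher then picks the smallest value (24) for aprun -d, even
  # when every node in the current allocation has 44 cores.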

In my case, I was getting -d24 instead of -d44.

I assume this bug impacts any Cray XC40 running the PBS workload manager.

Execution command: ./foo -nl 1 -v (for an arbitrary Chapel program)

Configuration Information

(Probably not necessary, but might as well)

  1) modules/3.2.10.6                       6) ugni/6.0.13-3.29                      11) nodehealth/5.3.0-13.152               16) craype-broadwell                      21) totalview/2017.1.21                   26) atp/2.1.1
  2) alps/6.3.4-2.21                        7) gni-headers/5.0.9-3.37                12) system-config/3.3.2273-2.1            17) craype-network-aries                  22) moab/9.0.2-1469837953_f87b286-sles12  27) rca/2.1.6_g2c60fbf-2.265
  3) nodestat/2.2.70-2.98                   8) dmapp/7.1.1-39.37                     13) sysadm/2.3.114-2.28                   18) craype/2.5.11                         23) torque/6.0.2.h4                       28) perftools-base/6.5.0
  4) sdb/3.2.679-2.9                        9) xpmem/2.1.1_gf9c9084-2.38             14) Base-opts/2.3.117-2.7                 19) cray-mpich/7.6.0                      24) cray-libsci/17.06.1                   29) PrgEnv-intel/6.0.4
  5) udreg/2.3.2-7.54                      10) llm/21.2.432-2.6                      15) intel/17.0.4.196                      20) totalview-support/1.2.0.14            25) pmi/5.0.12                            30) fftw/3.3.4.11
gbtitus commented 7 years ago

Strictly speaking, cnselect returns a list of the available CPUs-per-node values rather than a list of the available nodes. But that's just a detail -- we still pick the smallest of those, and that's not the right thing to do when we're already inside a WLM job that has a different CPUs-per-node value, whether implicit or explicit.

ben-albrecht commented 6 years ago

We can get the list of currently qsubbed nodes with cat ${PBS_NODEFILE} (one node name per line).
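For example, from inside an allocation (the node names here are illustrative):

  $ cat ${PBS_NODEFILE}
  nid01012
  nid01013

Note that on some systems PBS_NODEFILE lists a node once per PE, so entries may need to be deduplicated.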

It's not clear to me whether we can pass that node list to cnselect and get the available CPUs-per-node for that subset of nodes. Do you know if this should be possible, @gbtitus?

gbtitus commented 6 years ago

cnselect can't do that but we could use other tools. For example:

  [gbt@crystal:] nodes=$(<$PBS_NODEFILE)
  [gbt@crystal:] xtprocadmin -n $(echo $nodes | tr ' ' ',') --attrs cpus
     NID  (HEX)     NODENAME     TYPE  CPUS
    1012  0x3f4  c5-0c0s13n0  compute    72
    1013  0x3f5  c5-0c0s13n1  compute    72

ben-albrecht commented 6 years ago

So the launcher should gather the cpus-per-node value for each allocated node and then select the minimum as the aprun -d flag value.
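A hedged shell sketch of that selection logic (the real fix would live in the pbs-aprun launcher itself; the awk column position is an assumption based on the xtprocadmin header above, and whether xtprocadmin accepts the node names found in PBS_NODEFILE is exactly the open question below):

  # Deduplicate the allocation's node list (PBS may list a node once
  # per PE) and join the names with commas for xtprocadmin.
  nodes=$(sort -u ${PBS_NODEFILE} | paste -sd, -)
  # Take the minimum CPUS value (5th column, skipping the header row)
  # and use it as the aprun -d value.
  d=$(xtprocadmin -n ${nodes} --attrs cpus |
      awk 'NR == 2 || $5 + 0 < min { min = $5 } END { print min }')
  aprun -d ${d} ...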

One open question is whether xtprocadmin and $PBS_NODEFILE are compatible across pbspro and moab/torque variants, as well as across XE and XC.

gbtitus commented 6 years ago

I checked the qsub man page on a corporate XE running PBS and a corporate XC running Moab/Torque and both said that they set PBS_NODEFILE in the job's environment.

xtprocadmin is separate from the workload manager. It comes from the sdb module ("system database"?) rather than the WLM module. It is present on both systems I mentioned above.
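Given that, a hedged sketch of a guard the launcher could apply before relying on either piece (the fallback and both helper names are hypothetical):

  # Use the allocation-aware path only when both pieces are present;
  # otherwise fall back to the current cnselect-based selection.
  if [ -n "${PBS_NODEFILE}" ] && command -v xtprocadmin > /dev/null 2>&1; then
      use_allocation_cpus_per_node   # hypothetical: logic sketched above
  else
      use_cnselect_minimum           # hypothetical: existing behavior
  fi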