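You can inject XALT inside Singularity containers with a few variables and a flag, as in the example run below. `SINGULARITYENV_LD_PRELOAD` preloads `libxalt_init.so`. `SINGULARITY_BINDPATH` binds the XALT installation into the container. `SINGULARITY_CONTAINLIBS` makes the host's `libdcgm.so.1` available inside the container. Finally, pass `--nv` to `singularity exec`. If you are not tracking GPU usage, you can omit `SINGULARITY_CONTAINLIBS` and `--nv`.
These environment variables can be added to the XALT module file so that workloads inside Singularity containers are tracked by XALT automatically.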
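The variables fall into two groups. Singularity exports a host variable named `SINGULARITYENV_<NAME>` inside the container as `<NAME>`; this is how `LD_PRELOAD` and `XALT_TRACING` reach the containerized program. `SINGULARITY_BINDPATH` and `SINGULARITY_CONTAINLIBS` are different: the `singularity` command reads them itself. The C sketch below models only that naming rule. `container_name()` is a hypothetical helper, not code from XALT or Singularity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modelling Singularity's naming rule (not XALT or
 * Singularity source code): a host variable SINGULARITYENV_<NAME> is
 * exported inside the container as <NAME>. */
static const char *container_name(const char *host_name)
{
    static const char prefix[] = "SINGULARITYENV_";
    const size_t n = sizeof(prefix) - 1;   /* prefix length without the NUL */

    if (strncmp(host_name, prefix, n) == 0 && host_name[n] != '\0')
        return host_name + n;               /* e.g. "LD_PRELOAD" */

    /* SINGULARITY_BINDPATH, SINGULARITY_CONTAINLIBS, ... are options for
     * the singularity command itself, so no container name is returned. */
    return NULL;
}
```

Applied to the example command, `container_name("SINGULARITYENV_LD_PRELOAD")` returns `"LD_PRELOAD"`. That is why `libxalt_init.so` ends up preloaded into dynamically linked programs started inside the container. `container_name("SINGULARITY_BINDPATH")` returns `NULL`.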
However, injecting XALT this way exposes a few issues, which this pull request addresses:
1. XALT is linked to `libcrypto.so`, but the path and version of this library vary between Linux distributions. The host and the container may run different operating systems, for example a CentOS host with an Ubuntu container image. In that case, a mismatch can prevent `libxalt_init.so` from loading, and the user's job fails. This pull request adds a configure option, `--with-staticLibs`, that statically links libraries into XALT. Currently, it only triggers static linking of `libcrypto` and NVIDIA DCGM. You must configure with `--with-staticLibs` to use XALT reliably with Singularity containers.
2. The container image may not include the `file` utility; the `ubuntu:16.04` image is one example. XALT segfaults in that case because it assumes the `file` command always produces output. This pull request adds a check that there is output before accessing it.
3. Related to item 2, when the `file` utility is not present, an error message is printed to stderr. `capture()` is modified to swallow anything written to stderr, so the user does not see this message.
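Items 2 and 3 together describe a single pattern. First, run the shell command with stderr discarded. Then confirm that it produced output before using it. The C sketch below illustrates that pattern and is not the code in this pull request. `capture_first_line()` and `output_contains()` are hypothetical helpers, not XALT's `capture()`, and the 1024-byte command buffer is an arbitrary limit.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not XALT's actual capture(): run `cmd` through the
 * shell with stderr discarded and copy the first line of stdout into buf.
 * Returns the number of characters stored (0 means "no output"), or -1 if
 * the command line is too long or the pipe could not be opened. */
static int capture_first_line(const char *cmd, char *buf, size_t buflen)
{
    char full[1024];   /* arbitrary limit chosen for this sketch */

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    /* Item 3: send stderr to /dev/null so a shell "not found" message
     * for a missing utility never reaches the user's terminal. */
    int n = snprintf(full, sizeof full, "%s 2>/dev/null", cmd);
    if (n < 0 || (size_t)n >= sizeof full)
        return -1;

    FILE *fp = popen(full, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)buflen, fp) == NULL)
        buf[0] = '\0';                 /* the command printed nothing */
    pclose(fp);

    buf[strcspn(buf, "\n")] = '\0';    /* drop the trailing newline */
    return (int)strlen(buf);
}

/* Item 2: only inspect the output when there is some. */
static int output_contains(const char *cmd, const char *needle)
{
    char out[256];
    if (capture_first_line(cmd, out, sizeof out) <= 0)
        return 0;                      /* no output: nothing to inspect */
    return strstr(out, needle) != NULL;
}
```

On an image without `file`, such as `ubuntu:16.04`, the shell's error goes to `/dev/null` and `capture_first_line()` returns 0. The caller then skips parsing instead of reading output that does not exist.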
With these changes:
```
$ SINGULARITYENV_XALT_TRACING=yes SINGULARITYENV_LD_PRELOAD=/tmp/usr/local/xalt/xalt/lib64/libxalt_init.so SINGULARITY_BINDPATH="/tmp/usr/local/xalt/xalt" SINGULARITY_CONTAINLIBS="/usr/lib64/libdcgm.so.1" singularity exec --nv ~/ubuntu1604.simg ~/peer
...
---------------------------------------------
Date: Thu Oct 11 14:12:38 2018
XALT Version: XALT 2.3.12
Nodename: ivb125
System: Linux
Release: 3.10.0-862.9.1.el7.x86_64
O.S. Version: #1 SMP Mon Jul 16 16:29:36 UTC 2018
Machine: x86_64
Syshost: psg
---------------------------------------------
myinit(LD_PRELOAD,/home/smcmillan/peer){
Test for __XALT_INITIAL_STATE__: "(NULL)", STATE: "LD_PRELOAD"
Test for XALT_EXECUTABLE_TRACKING: yes
Test for rank == 0, rank: 0
GPU tracing
-> XALT is build to track all programs, Current program is a scalar program -> Not producing a start record
}
max error: 1.192093e-07
max error: 1.192093e-07
myfini(LD_PRELOAD){
GPU tracing
4 GPUs detected
GPU 0: num compute pids 1
GPU 1: num compute pids 1
GPU 2: num compute pids 0
GPU 3: num compute pids 0
2 of 4 GPUs were used
-> Scalar Sampling program run_time: 0.480286: (my_rand: 0.376318 <= prob: 1) for program: /home/smcmillan/peer
len: 32, b64_cmd: WyIvaG9tZS9zbWNtaWxsYW4vcGVlciJd
Recording State at end of scalar user program:
LD_LIBRARY_PATH= PATH=/usr/bin:/bin /tmp/usr/local/xalt/xalt//libexec/xalt_run_submission --interfaceV 4 --ppid 24242 --syshost "psg" --start "1539267158.9848" --end "1539267159.4651" --exec "/home/smcmillan/peer" --ntasks 1 --uuid "b668d8ad-35cc-4039-8580-3e30338715cf" --prob 1 --ngpus 2 --path "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" --ld_libpath "/.singularity.d/libs" -- ["/home/smcmillan/peer"]
}
xalt_run_submission(zzz) {
Built envT
Extracted recordT from executable
Built userT, userDT
Filter envT
Parsed LDD
Using XALT_TRANSMISSION_STYLE: file
Built json string
Wrote json run file : /home/smcmillan/.xalt.d/run.psg.2018_10_11_14_12_38_9848.zzz.b668d8ad-35cc-4039-8580-3e30338715cf.json
}
```