CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0
CSM RHEL 8: new warning messages in CSM log files when running with DCGM 2.X compared to DCGM 1.X #953
The name of the libdcgm.so.1 library has changed to libdcgm.so.2 with this release, resulting in the following warning:
[COMPUTE]2020-08-19 09:43:28.282258 csmd::warning | dlopen() /usr/lib64/libdcgm.so.1 returned: /usr/lib64/libdcgm.so.1: cannot open shared object file: No such file or directory
However, CSM still falls back to loading the library using the name libdcgm.so, which is successful.
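The fallback described above can be sketched as a loop over candidate library names. This is only an illustration of the general pattern, not CSM's actual loader: the helper name `open_first_available` and its signature are hypothetical, and the only real API used is `dlopen()`/`dlerror()` from `<dlfcn.h>`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not taken from the CSM sources): try each candidate
 * name in order and return the first handle dlopen() yields, or NULL if
 * none of the names can be loaded. On success, *used_index receives the
 * position of the name that worked. */
static void *open_first_available(const char *const *names, size_t count,
                                  size_t *used_index)
{
    assert(names != NULL && used_index != NULL);

    for (size_t i = 0; i < count; ++i) {
        void *handle = dlopen(names[i], RTLD_LAZY);
        if (handle != NULL) {
            *used_index = i;
            return handle;
        }
        /* Report the failed attempt and move on to the next name,
         * similar in spirit to the warning in the CSM log. */
        fprintf(stderr, "warning: dlopen() %s returned: %s\n",
                names[i], dlerror());
    }
    return NULL;
}
```

A caller could pass a list such as `libdcgm.so.2`, `libdcgm.so.1`, `libdcgm.so`, so that the versioned name shipped with DCGM 2.X is tried before the older one and the warning is only logged when a name is really missing. Whether that ordering suits CSM is an assumption. On older glibc versions the program must be linked with `-ldl`.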
The affected fields are described as follows in /usr/include/dcgm_fields.h:
/*
* NV Link Bandwidth Counter for Lane 0 - Not supported in DCGM 2.0
*/
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 440
/*
* NV Link Bandwidth Counter for Lane 1 - Not supported in DCGM 2.0
*/
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L1 441
/*
* NV Link Bandwidth Counter for Lane 2 - Not supported in DCGM 2.0
*/
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L2 442
/*
* NV Link Bandwidth Counter for Lane 3 - Not supported in DCGM 2.0
*/
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L3 443
/*
* NV Link Bandwidth Counter for Lane 4 - Not supported in DCGM 2.0
*/
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L4 444
/*
* NV Link Bandwidth Counter for Lane 5 - Not supported in DCGM 2.0
*/
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L5 445
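One way to avoid warnings for these fields would be to drop them from the list of requested field IDs when running against DCGM 2.X. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the two macros are copied from the header excerpt above, and the check `dcgm_major_version >= 2` assumes that the "Not supported in DCGM 2.0" note applies to the whole 2.X series. It does not call any DCGM API and is not how CSM necessarily handles this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as quoted from /usr/include/dcgm_fields.h. */
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 440
#define DCGM_FI_DEV_NVLINK_BANDWIDTH_L5 445

/* Hypothetical helper: true for the per-lane NVLink bandwidth counters
 * (lanes 0 to 5, IDs 440 to 445). */
static bool is_nvlink_lane_bandwidth_field(unsigned short field_id)
{
    return field_id >= DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 &&
           field_id <= DCGM_FI_DEV_NVLINK_BANDWIDTH_L5;
}

/* Hypothetical helper: compact field_ids in place, removing the per-lane
 * counters when the DCGM major version is 2 or later (an assumption).
 * Returns the number of IDs that remain. */
static size_t drop_unsupported_fields(unsigned short *field_ids, size_t count,
                                      int dcgm_major_version)
{
    assert(field_ids != NULL || count == 0);

    if (dcgm_major_version < 2)
        return count; /* DCGM 1.X: keep every requested field. */

    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (!is_nvlink_lane_bandwidth_field(field_ids[i]))
            field_ids[kept++] = field_ids[i];
    }
    return kept;
}
```

Filtering in place keeps the remaining IDs in their original order, so any code that indexes results by position in the request list would need to use the compacted list as well.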
Library name change
The name of the libdcgm.so.1 library has changed to libdcgm.so.2 with this release, resulting in the dlopen() warning quoted above. However, CSM still falls back to loading the library using the name libdcgm.so, which is successful.
Fields not supported in DCGM 2.0 (type 1)
The affected fields are described in /usr/include/dcgm_fields.h (see the header excerpt above).
Fields not supported in DCGM 2.0 (type 2)