This repo contains ancillary information to assist users of IBM Watson Machine Learning Community Edition. It will contain how-tos, READMEs, Dockerfiles, and similar material for users looking to get started.
BSD 2-Clause "Simplified" License
Add gpucheck to see if base cpuset.mems is ready #182
Add a check to the script that tells whether the system starts in a
valid memory state, i.e. all GPU memory is onlined and added to cgroups.
The check uses lspci to count the V100 GPUs in the box and calculates
the correct cpuset.mems. The calculated cpuset.mems is then compared
with the one the system created. If they match, we can safely assume
all GPU nodes are onlined. If not, a message is printed and a return
code of 1 is generated.
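The check described above can be sketched roughly as follows. This is a minimal illustration, not the script from this change: the function names, the cgroup v1 path, the default CPU node list `(0, 8)`, and above all the rule that GPU memory nodes are numbered downward from 255 are assumptions made for the example. The real script may derive the expected cpuset.mems differently.

```python
import subprocess
from pathlib import Path

# ASSUMPTION: cgroup v1 location of the root cpuset; not stated in the source.
CPUSET_MEMS = Path("/sys/fs/cgroup/cpuset/cpuset.mems")


def count_v100(lspci_output):
    """Count lspci lines that mention a V100 (plain substring match)."""
    return sum(1 for line in lspci_output.splitlines() if "V100" in line)


def parse_mems(text):
    """Parse a kernel list string such as '0,8,252-255' into a set of node ids."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


def expected_mems(cpu_nodes, gpu_count, top_node=255):
    """Build the node set we expect once every GPU's memory is onlined.

    ASSUMPTION: GPU memory nodes are numbered downward from top_node,
    one node per GPU. This is illustrative only.
    """
    gpu_nodes = range(top_node - gpu_count + 1, top_node + 1)
    return set(cpu_nodes) | set(gpu_nodes)


def gpucheck(lspci_output, actual_mems, cpu_nodes):
    """Return 0 if the system cpuset.mems matches the calculated one, else 1."""
    expected = expected_mems(cpu_nodes, count_v100(lspci_output))
    actual = parse_mems(actual_mems)
    if actual == expected:
        return 0
    print("cpuset.mems mismatch: expected nodes %s, found %s"
          % (sorted(expected), sorted(actual)))
    return 1


def main(cpu_nodes=(0, 8)):
    """Gather live data and run the check (hypothetical CPU node default)."""
    lspci = subprocess.run(["lspci"], capture_output=True,
                           text=True, check=True).stdout
    return gpucheck(lspci, CPUSET_MEMS.read_text(), cpu_nodes)
```

To use it as a standalone check, call `sys.exit(main())` so that a mismatch surfaces as exit status 1. Keeping `gpucheck` free of system calls means the comparison logic can be exercised with canned lspci output.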