IBM / powerai

This repo contains ancillary information used to assist users of IBM Watson Machine Learning Community Edition. This repo will contain How To's, Readme's, Dockerfiles, etc. that can be consumed by users looking to get started.
BSD 2-Clause "Simplified" License

Add gpucheck to see if base cpuset.mems is ready #182

Closed dllehr81 closed 4 years ago

dllehr81 commented 4 years ago

Add a check to the script that tells whether the system starts in a valid memory state, i.e. all GPU memory is onlined and added to cgroups. The check uses lspci to count the V100 GPUs in the box and calculates the correct cpuset.mems from that count. It then compares the calculated cpuset.mems with the one the system created. If they match, we can safely assume all GPU nodes are onlined. If not, a message is printed and a return code of 1 is generated.
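The flow described above can be sketched roughly as follows. This is not the code from the PR, only an illustration in Python under stated assumptions: the function names are hypothetical, the CPU memory nodes are assumed to be 0 and 8, the GPU memory nodes are assumed to be numbered downward from 255, and the system value is assumed to live in the cgroup v1 file `/sys/fs/cgroup/cpuset/cpuset.mems`. Adjust all of these for the actual machine.

```python
import subprocess
import sys


def count_v100(lspci_output):
    """Count the lspci lines that mention a V100 GPU."""
    return sum(1 for line in lspci_output.splitlines() if "V100" in line)


def parse_node_list(text):
    """Expand a cpuset list such as '0,8,252-255' into a set of node ids."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            nodes.update(range(int(low), int(high) + 1))
        else:
            nodes.add(int(part))
    return nodes


def expected_nodes(num_gpus, cpu_nodes=(0, 8), top_gpu_node=255):
    """Build the node set we expect once every GPU node is onlined.

    ASSUMPTION: CPU memory sits on nodes 0 and 8, and GPU memory nodes are
    numbered downward from 255. Both defaults are illustrative only.
    """
    gpu_nodes = {top_gpu_node - i for i in range(num_gpus)}
    return set(cpu_nodes) | gpu_nodes


def gpucheck(lspci_output, actual_mems):
    """Return 0 if the system cpuset.mems matches the calculated one, else 1."""
    expected = expected_nodes(count_v100(lspci_output))
    actual = parse_node_list(actual_mems)
    if actual == expected:
        return 0
    print(
        "cpuset.mems mismatch: expected nodes %s, found %s"
        % (sorted(expected), sorted(actual)),
        file=sys.stderr,
    )
    return 1


def main(mems_path="/sys/fs/cgroup/cpuset/cpuset.mems"):
    """Run the check against the live system (path is an assumption)."""
    lspci = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    ).stdout
    with open(mems_path) as handle:
        return gpucheck(lspci, handle.read())
```

To use it as a standalone check, call `sys.exit(main())` so that the shell sees return code 1 when the node sets differ. Comparing parsed sets rather than raw strings avoids false mismatches when the same nodes are written in a different but equivalent list form.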