aws-samples / aws-parallelcluster-post-install-scripts

Scripts to customize AWS ParallelCluster
MIT No Attribution

Pyxis Install Script will not install Nvidia Container CLI if nvidia-smi is present. #37

Open codeknight03 opened 2 months ago

codeknight03 commented 2 months ago

The Pyxis post-install script does not install the Nvidia Container CLI in any case:

https://github.com/aws-samples/aws-parallelcluster-post-install-scripts/blob/main/pyxis/postinstall.sh#L45-L47

This is due to the following line:

if [ $GPU_PRESENT -eq 0 ] && [ $GPU_CONTAINER_PRESENT -gt 0 ]; then

The condition triggers the installation only when $GPU_CONTAINER_PRESENT is greater than 0, which is the case when nvidia-smi is not available. When nvidia-smi is available but the container CLI is not, the installation does not take place.
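
To see which combinations of values satisfy the quoted condition, it can be evaluated for a few sample inputs. This is only a minimal sketch: `would_install` is a hypothetical helper, and the inputs 0 and 1 are illustrative, since this thread does not show how postinstall.sh assigns $GPU_PRESENT and $GPU_CONTAINER_PRESENT.

```shell
#!/bin/sh
# Hypothetical helper: evaluates the condition quoted in this issue
# for a given pair of values and reports whether the install branch runs.
# The values are illustrative; how postinstall.sh actually sets the two
# variables is not shown in this thread.
would_install() {
    GPU_PRESENT=$1
    GPU_CONTAINER_PRESENT=$2
    if [ "$GPU_PRESENT" -eq 0 ] && [ "$GPU_CONTAINER_PRESENT" -gt 0 ]; then
        echo "install"
    else
        echo "skip"
    fi
}

# Print the result for every combination of the sample values 0 and 1.
for gpu in 0 1; do
    for cli in 0 1; do
        echo "GPU_PRESENT=$gpu GPU_CONTAINER_PRESENT=$cli -> $(would_install "$gpu" "$cli")"
    done
done
```

Running the same helper with the values observed on a real node (for example, printed with `echo` just before the condition) would show which branch the script takes there.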

codeknight03 commented 2 months ago

In production, I have made this change:

if [ $GPU_PRESENT -eq 0 ] && [ $GPU_CONTAINER_PRESENT -lt 0 ]; then

With this change, the container CLI is not installed on the head node and login node, but it is installed on the worker nodes, where nvidia-smi is present.
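
For reference, `-gt 0` and `-lt 0` are the integer comparisons "greater than zero" and "less than zero" in the shell's `test` command. The sketch below uses a hypothetical `compare_ops` helper and illustrative values to show which values satisfy each operator. If $GPU_CONTAINER_PRESENT holds a command exit status (an assumption, not confirmed in this thread), its value is never negative, so it may be worth checking what value it has on each node type.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a value satisfies "-gt 0" and "-lt 0".
# The sample values are illustrative, not taken from postinstall.sh.
compare_ops() {
    value=$1
    gt=no
    lt=no
    if [ "$value" -gt 0 ]; then gt=yes; fi
    if [ "$value" -lt 0 ]; then lt=yes; fi
    echo "value=$value gt0=$gt lt0=$lt"
}

# Compare the two operators for a negative, a zero, and a positive value.
for v in -1 0 1; do
    compare_ops "$v"
done
```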