palmoreck opened this issue 4 years ago
Check:
https://github.com/NVIDIA/k8s-device-plugin#running-gpu-jobs
It says:
WARNING: if you don't request GPUs when using the device plugin with NVIDIA images all the GPUs on the machine will be exposed inside your container.
Testing showed that it is not necessary to distinguish between having the following line:
https://github.com/CONABIO/kube_sipecam/blob/master/deployments/audio/kale-jupyterlab-kubeflow_0.4.0_1.14.0_tf.yaml#L35
and not having it in the deployment:
https://github.com/CONABIO/kube_sipecam/blob/master/deployments/audio/kale-jupyterlab-kubeflow_0.4.0_1.14.0_tf_cpu.yaml#L32
At least with the cifar10 tf example:
https://github.com/CONABIO/kube_sipecam_playground/tree/issue-1/audio/notebooks/dockerfiles/tf_kale/0.4.0_1.14.0_tf/cifar10
the Kubeflow + Kale run was successful.
So I could either delete the file
https://github.com/CONABIO/kube_sipecam/blob/master/deployments/audio/kale-jupyterlab-kubeflow_0.4.0_1.14.0_tf_cpu.yaml
or use it to compile the notebook via Kale and avoid Kubernetes scheduling failures when no nodes with GPUs are available (setting the parameter
nvidia.com/gpu: 1
inside the `limits` block causes that error message).
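For reference, the difference between the two deployment variants comes down to whether the container's `resources.limits` block requests a GPU. A minimal sketch (pod name and image are placeholders, not the actual CONABIO deployment):

```yaml
# Hypothetical pod spec illustrating the GPU limit; names/image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: kale-jupyterlab-gpu
spec:
  containers:
    - name: jupyterlab
      image: tensorflow/tensorflow:1.14.0-gpu-py3
      resources:
        limits:
          nvidia.com/gpu: 1   # omit this line for the CPU-only variant
```

With the `nvidia.com/gpu: 1` limit present, the scheduler will only place the pod on a node advertising that resource (and will report a scheduling error if none exists); without it, per the device-plugin warning above, all GPUs on the node may be exposed inside the container when using NVIDIA images.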