canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

GPU issue with python code using tensorflow-gpu #1629

Closed jbfenris closed 3 years ago

jbfenris commented 4 years ago

Hi, I'm facing an issue when I enable GPU scheduling on my deployment by adding:

    resources:
      limits:
        nvidia.com/gpu: 1
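For reference, here is a minimal sketch of where that limit sits in a full Deployment, written as a Python dict and dumped to JSON (which `kubectl apply -f` accepts as well as YAML). The function name `gpu_deployment`, the deployment name `mask-detector` and the image `example/mask-detector:gpu` are placeholders for illustration, not taken from the actual deployment; only the `nvidia.com/gpu: 1` limit comes from this report.

```python
import json


def gpu_deployment(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal apps/v1 Deployment whose container requests NVIDIA GPUs."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # The only part taken from the report: the GPU limit.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; replace with the real ones.
    manifest = gpu_deployment("mask-detector", "example/mask-detector:gpu")
    print(json.dumps(manifest, indent=2))
```

Removing the `resources` entry from the container gives the variant that runs without the GPU.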

The pod logs then show:

    python: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory

Without these configuration elements, the pod works perfectly and does its inference jobs, but without the GPU.
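Since the same image starts fine without the GPU limit, one place to start is to record, in the working pod, where the interpreter expects its shared library and which `LD_LIBRARY_PATH` it sees, then compare that with the GPU-scheduled pod. This is an assumption about where to look, not a confirmed diagnosis. The sketch below uses only the standard library; the helper name `libpython_location` is made up for illustration.

```python
import os
import sysconfig


def libpython_location() -> dict:
    """Report where this interpreter expects its shared library to live."""
    libdir = sysconfig.get_config_var("LIBDIR")      # e.g. the install prefix's lib directory
    soname = sysconfig.get_config_var("INSTSONAME")  # file name of the installed libpython
    path = os.path.join(libdir, soname) if libdir and soname else None
    return {
        # True when Python was built with a shared libpython.
        "shared_build": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        "libdir": libdir,
        "soname": soname,
        # Whether the expected file is present at LIBDIR/INSTSONAME.
        "exists": bool(path) and os.path.exists(path),
        # The dynamic linker also searches these directories, if any are set.
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
    }


if __name__ == "__main__":
    for key, value in libpython_location().items():
        print(f"{key}: {value}")
```

In the failing pod the interpreter itself cannot start, so the reported path and environment would have to be checked there with shell tools instead.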

The pod image is based on python:3.8-slim-buster.

The GPU add-on is installed and running (tested with the CUDA example):

    2020/10/07 13:38:48 Loading NVML
    2020/10/07 13:38:48 Fetching devices.
    2020/10/07 13:38:48 Starting FS watcher.
    2020/10/07 13:38:48 Starting OS watcher.
    2020/10/07 13:38:48 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
    2020/10/07 13:38:48 Registered device plugin with Kubelet

The latest NVIDIA driver is installed:

    NVIDIA-SMI 450.66    Driver Version: 450.66    CUDA Version: 11.0

MicroK8s is running the latest version (I did a fresh installation to be sure the issue doesn't come from that).

Where should I dig to solve this issue?

inspection-report-20201007_162232.tar.gz

ktsakalozos commented 4 years ago

Hi @jbhfenris, I haven't seen this before. Can you share the Dockerfile of the failing pod? @knkski might know what to do in this case.

jbfenris commented 4 years ago


The Dockerfile

    FROM python:3.8-slim-buster
    ENV HTTP_PROXY http://proxy.rd.francetelecom.fr:8080
    ENV IP_SERVER=0.0.0.0
    ENV PORT_SERVER=8000
    ENV NBR_COL=2
    ENV NBR_LINE=1
    RUN apt-get update && apt-get install -y libgl1-mesa-dev libglib2.0-0 && apt-get clean
    WORKDIR /usr/src/app
    ADD requirements-gpu.txt ./
    RUN pip install --proxy=$HTTP_PROXY --no-cache-dir -r requirements-gpu.txt
    ADD . .
    CMD [ "bash", "-c", "python detect_mask_video_web.py --ip $IP_SERVER --port $PORT_SERVER -mW $NBR_COL -mH $NBR_LINE" ]
    EXPOSE $PORT_SERVER/tcp
    EXPOSE 5555/tcp

and the requirements-gpu.txt file:

    imagezmq
    tensorflow
    opencv-python
    numpy
    imutils
    argparse
    flask

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.