Closed jbfenris closed 3 years ago
Hi @jbhfenris, I haven't seen this before. Can you share the Dockerfile of the failing pod? @knkski might know what to do in this case.
The Dockerfile:
FROM python:3.8-slim-buster
ENV HTTP_PROXY http://proxy.rd.francetelecom.fr:8080
ENV IP_SERVER=0.0.0.0
ENV PORT_SERVER=8000
ENV NBR_COL=2
ENV NBR_LINE=1
RUN apt-get update && apt-get install -y libgl1-mesa-dev libglib2.0-0 && apt-get clean
WORKDIR /usr/src/app
ADD requirements-gpu.txt ./
RUN pip install --proxy=$HTTP_PROXY --no-cache-dir -r requirements-gpu.txt
ADD . .
CMD [ "bash", "-c", "python detect_mask_video_web.py --ip $IP_SERVER --port $PORT_SERVER -mW $NBR_COL -mH $NBR_LINE" ]
EXPOSE $PORT_SERVER/tcp
EXPOSE 5555/tcp
and the requirements-gpu.txt file:
imagezmq
tensorflow
opencv-python
numpy
imutils
argparse
flask
Hi, I'm facing an issue when I enable GPU scheduling on my deployment by adding:
resources:
  limits:
    nvidia.com/gpu: 1
The pod logs show:
python: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
Without these settings, the pod works perfectly and runs its inference jobs, but without the GPU.
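One place to start digging is where the interpreter expects the library named in the error above to live. The following is a minimal sketch, assuming it is run inside the working (non-GPU) pod; the function name `locate_libpython` is hypothetical, and the `sysconfig` variables it reads (`Py_ENABLE_SHARED`, `LIBDIR`, `INSTSONAME`) may be unset on some builds, in which case the path is reported as `None`.

```python
import os
import sysconfig


def locate_libpython():
    """Report where this interpreter's shared libpython is expected to live."""
    # Py_ENABLE_SHARED is 1 when the interpreter was built with --enable-shared.
    shared = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    # LIBDIR is the install directory for libraries; INSTSONAME is the
    # installed shared-library file name (e.g. libpython3.8.so.1.0).
    libdir = sysconfig.get_config_var("LIBDIR")
    soname = sysconfig.get_config_var("INSTSONAME")
    path = os.path.join(libdir, soname) if libdir and soname else None
    return {
        "shared_build": shared,
        "libdir": libdir,
        "soname": soname,
        "path": path,
        "exists": bool(path) and os.path.exists(path),
        # The dynamic loader also searches the directories listed here.
        "ld_library_path": os.environ.get("LD_LIBRARY_PATH", ""),
    }


if __name__ == "__main__":
    for key, value in locate_libpython().items():
        print(f"{key}: {value}")
```

If `path` exists in the working pod, comparing the output of `ldconfig -p` between the working pod and the failing one may show whether the linker cache differs once the GPU limit is set. That is a guess at where to look, not a confirmed cause.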
The pod image is based on python:3.8-slim-buster.
The GPU add-on is installed and running (tested with the CUDA example):
2020/10/07 13:38:48 Loading NVML
2020/10/07 13:38:48 Fetching devices.
2020/10/07 13:38:48 Starting FS watcher.
2020/10/07 13:38:48 Starting OS watcher.
2020/10/07 13:38:48 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2020/10/07 13:38:48 Registered device plugin with Kubelet
The latest NVIDIA driver is installed:
NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0
microk8s is running the latest version (I did a fresh installation to be sure the issue doesn't come from that).
Where should I dig to solve this issue?
inspection-report-20201007_162232.tar.gz