Hi
We are facing an incompatibility issue and have been trying different Ubuntu versions. Running hello-world in Docker works, and CUDA, the toolkit, and the drivers seem OK. I checked the libraries and those looked fine (libnvidia-ml.so.1); however, the OCI runtime fails when I start the container with GPUs. Any idea?
ubuntu@ip-172-31-17-183:~$ docker run --gpus all -d -p 80:80 -e HF_TOKEN=ZXXXX 767398115161.dkr.ecr.us-east-1.amazonaws.com/predictionaws3:latest
7ac5d43c43301058d56b098d19ab6f36683d1bd617361e677a4b4acc77be3cf3
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
ubuntu@ip-172-31-17-183:~$ nvidia-smi
Tue Oct 29 05:49:19 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
| N/A   20C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
ubuntu@ip-172-31-17-183:~$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
The Docker client contacted the Docker daemon.
The Docker daemon pulled the "hello-world" image from the Docker Hub.
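Since the hook error says libnvidia-ml.so.1 cannot be opened, one way to double-check the library on the host is to ask the dynamic loader to open it directly. The following is only a minimal Python sketch: the helper name can_load is made up for illustration, and a successful load from a normal shell does not prove that the container runtime hook resolves the same library paths.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library.

    Hypothetical helper for illustration; not part of any NVIDIA tooling.
    """
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # ctypes raises OSError when the library is missing or cannot be opened.
        return False


if __name__ == "__main__":
    name = "libnvidia-ml.so.1"
    status = "loadable" if can_load(name) else "NOT loadable"
    print(f"{name}: {status}")
```

If this reports the library as loadable on the host while docker run --gpus all still fails, the difference is probably in how the Docker daemon or the NVIDIA hook looks up libraries rather than in the driver install itself.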
Docker container
Libraries: libnvidia-ml.so, libnvidia-ml.so.1, libnvidia-ml.so.535.183.01, libnvidia-ml.so.550.127.05
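The list above contains files from two driver releases (535.183.01 and 550.127.05), while nvidia-smi reports 550.127.05. Whether that is related to the error is not established here, but it may be worth flagging leftovers from an older driver. A small Python sketch, assuming the file names have already been collected into a list; the function name stale_versions is hypothetical:

```python
import re


def stale_versions(filenames, driver_version):
    """Return versioned libnvidia-ml files whose suffix differs from driver_version.

    Names without a full version suffix (libnvidia-ml.so, libnvidia-ml.so.1)
    are ignored, since they are normally symlinks.
    """
    pattern = re.compile(r"^libnvidia-ml\.so\.(\d+\.\d+(?:\.\d+)?)$")
    stale = []
    for name in filenames:
        match = pattern.match(name)
        if match and match.group(1) != driver_version:
            stale.append(name)
    return stale


if __name__ == "__main__":
    found = [
        "libnvidia-ml.so",
        "libnvidia-ml.so.1",
        "libnvidia-ml.so.535.183.01",
        "libnvidia-ml.so.550.127.05",
    ]
    # Driver version as reported by nvidia-smi in this post.
    print(stale_versions(found, "550.127.05"))
```

With the names from this post, only libnvidia-ml.so.535.183.01 is flagged.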
FROM docker.io/nvidia/cuda:12.4.0-runtime-ubuntu20.04
# Install Python and pip
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Set the working directory
WORKDIR /data
# Copy input files and scripts
COPY md/1_medical.docx /data/input/
COPY md/1_genetic.csv /data/input/
COPY scripts/aws_md.py /data/scripts/
COPY requirements.txt /data/
# Install required Python packages
RUN pip3 install --no-cache-dir -r requirements.txt
# Set environment variables for input files (if needed)