@dwvisser With respect to GPU support, please see this reference that may be useful.
Awesome! To really test this, I am looking at it on my personal laptop (a 7-year-old Dell that used to be my work laptop). It has caused me to dive into some GPU arcana. Apparently, every time the Nvidia drivers on a Fedora system are updated (the original installation was its own arcane process), it is necessary to go through a manual process to re-sign the associated kernel module so Secure Boot will load it (sketched below, after the nvidia-smi output). Finally, though, I have succeeded at that first validation step from the article you just shared:
```
➜ nvidia-smi
Tue Nov 14 10:09:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce 940MX           Off | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8              N/A / 200W |      2MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3055      G   /usr/bin/gnome-shell                         0MiB |
+---------------------------------------------------------------------------------------+
```
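For the record, the re-signing dance mentioned above is essentially the RPM Fusion Secure Boot procedure. A sketch, assuming akmod-nvidia and the default akmods key locations:

```bash
# One-time setup: generate a machine owner key and enroll it with shim.
# mokutil prompts for a password that you re-enter in the MOK menu at the
# next boot to complete enrollment.
sudo kmodgenca -a
sudo mokutil --import /etc/pki/akmods/certs/public_key.der
sudo reboot

# After each driver update: rebuild (and thereby re-sign) the kernel module.
sudo akmods --force
sudo dracut --force
```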
So...

- My Fedora machine was never going to work, unless I went down the road of getting a VM up and running with GPU pass-through, a task in its own right.
- My work laptop's Nvidia GPU, despite being newer, doesn't support CUDA.

I finally managed, with some trial and error, to run a local onprem model on my son's gaming machine (Windows 11, using WSL2 with Ubuntu 22.04, and a GeForce RTX 3060). I recorded the steps I think were needed. I'm going to make sure I am able to reproduce from my notes in the morning.
See https://github.com/dwvisser/onprem/blob/experiment-docker/wsl2-install.md for the latest progress.
So, I got it working in a Docker container, i.e., the language model loads into GPU memory, with BLAS=1 showing in the stderr "log" output. Steps that work:

1. Build the cuda_simple image using https://github.com/abetlen/llama-cpp-python/blob/main/docker/cuda_simple/Dockerfile (came from commit https://github.com/abetlen/llama-cpp-python/commit/22917989003c5e67623d54ab45affa1e0e475410).

2. Build an onprem_cuda_simple image from a Dockerfile like this:

   ```
   FROM cuda_simple
   RUN python3 -m pip install onprem
   CMD python3
   ```

3. Run the container (I'm not sure exactly what the --cap-add and -e options are, but am assuming they are very important):

   ```
   sudo docker run --rm -it --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 \
       -v /home/dale/onprem_data:/home/root/onprem_data -t onprem_cuda_simple
   ```
This is ready for test and review now.
@mzientek would you have time to try out the new instructions and script I placed in docker/?
Good resource for when I'm fine-tuning the documentation for ingesting documents:
https://hugotkk.github.io/posts/docker-volume-mounting-permissions-and-ownership-explained/
I have confirmed that when onprem_data and its files are all owned by the current user, the container accesses them just fine. I had issues the other day, when some of the files were owned by root:root.
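For anyone who hits the same thing, checking and fixing ownership from the host is quick (a sketch; adjust the path if your onprem_data lives elsewhere):

```bash
# Inspect ownership of everything under onprem_data.
ls -lR ~/onprem_data

# Files created by the container running as root show up as root:root on the
# host; hand them back to the current user.
sudo chown -R "$(id -u):$(id -g)" ~/onprem_data
```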
Next step (Monday): have a standard location in the container for ingested documents, and provide it as a mapped volume. The docs should state any needed changes to webapp.yml as well as how to ingest documents relative to the volume mapped in the container launch command (along the lines of the sketch below).
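Roughly what I have in mind; a sketch only, where the container-side documents path is my assumption, not a settled name:

```bash
# Map the model/data directory and a host documents folder into the container.
sudo docker run --rm -it --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 \
    -v ~/onprem_data:/root/onprem_data \
    -v ~/my_documents:/root/onprem_data/webapp/documents \
    -p 8000:8000 onprem_cuda_simple onprem --port=8000
```

webapp.yml (and any ingest commands) would then refer to /root/onprem_data/webapp/documents, i.e., the path as seen inside the container.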
I've got this working pretty well at this point. I've put some additional "automagic" in onprem/__init__.py so that the workaround for the sqlite3 version issue is obvious. I was just running an onprem Streamlit instance out of a container on MCTL. I would like to make some final tests in the morning before I call this ready for possible merge.
After some additional testing this morning, I am happy with this now. If I uninstall pysqlite3-binary from the container, and try to start the web app, it gives a useful error message about what is needed.
@amaiya Please consider for merge.
Nice! I'll check it out and probably reach back out with questions. This looks great!
Also, thanks for adding this for seamlessness:
```python
# Some older nvidia/cuda base images force this need on us.
# Try to appease chromadb expectation before it is loaded.
import sqlite3

if sqlite3.sqlite_version_info < (3, 35, 0):
    import importlib

    try:
        importlib.import_module("pysqlite3")
    except ImportError:
        raise ImportError(
            "Please install pysqlite3-binary: pip install pysqlite3-binary"
        )
    from sys import modules

    # Make the pysqlite3 module answer for sqlite3 everywhere downstream.
    modules["sqlite3"] = modules.pop("pysqlite3")
```
Iterating towards perfection…
It turns out that both the major and the minor version matter when building the container image. I was assuming that only the major version had to match your host's driver, but the condition is slightly more subtle than that:
```
$ sudo docker run --rm -it --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 \
    -v ~/onprem_data:/root/onprem_data -p 8000:8000 onprem_cuda:11.8.0 onprem --port=8000
[sudo] password for dvisser:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.8, please update your driver to a newer version, or use an earlier cuda container: unknown.
```
I.e., I'll have to make the build script a little smarter, with a lookup table to ensure that it pulls the latest known base image whose CUDA major version matches the host's and whose CUDA minor version is <= the host's. The simplest way would be to just use the x.0 versions, but I'd like it to pull the matching version if possible.
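A minimal sketch of that selection logic, assuming the build script stays a shell script, that the listed nvidia/cuda tags exist on Docker Hub, and that the Dockerfile takes a hypothetical BASE_IMAGE build argument:

```bash
# Read the host's CUDA version from nvidia-smi's banner line.
host_cuda=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9][0-9]*\.[0-9]*\).*/\1/p')

# Lookup table: latest known base image whose major version matches the
# host's and whose minor version does not exceed it.
case "$host_cuda" in
    12.0) base="nvidia/cuda:12.0.1-devel-ubuntu22.04" ;;
    12.*) base="nvidia/cuda:12.1.1-devel-ubuntu22.04" ;;
    11.8) base="nvidia/cuda:11.8.0-devel-ubuntu22.04" ;;
    11.7) base="nvidia/cuda:11.7.1-devel-ubuntu22.04" ;;
    *)    echo "No known CUDA base image for host version '$host_cuda'" >&2
          exit 1 ;;
esac

docker build --build-arg BASE_IMAGE="$base" -t "onprem_cuda:$host_cuda" .
```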
I had a few distractions today. I will do the above-described work tomorrow.
The automatic base image selection based on all version info is ready for testing by somebody else. It worked great on MCTL for me. I will also test it on my available home machines.
Other than a very slow (possibly throttled at the server) download of the largest base image layer, the test on a home machine went flawlessly, with a CUDA-enhanced onprem web app successfully served out of a container; I asked it questions in the web UI.
Testing out the document corpus query capability from the container. Hit this error message:
```
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
```
Indeed, these are the NVIDIA CUDA-related Python packages installed in the container:
```
root@b527f52643ed:/# pip list | grep cuda
nvidia-cuda-cupti-cu12       12.1.105
nvidia-cuda-nvrtc-cu12       12.1.105
nvidia-cuda-runtime-cu12     12.1.105
```
For context, this is our 11.7.1 container that is running.
This seems very fixable: PyTorch needs to be installed, from an index matching the image's CUDA version, before the other dependencies.
For CUDA 11.8:

```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

And this should work for CUDA 11.7:

```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
```

For CUDA 12.x, this (which is effectively what we are doing at present for all images):

```
pip3 install torch torchvision torchaudio
```

For CPU-only installs:

```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
The above can be unified with an INDEX_URL argument to the docker build, realizing that pip's default index URL is https://pypi.org/simple.
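A minimal sketch of what that unification could look like, assuming the Dockerfile declares `ARG INDEX_URL=https://pypi.org/simple` and runs `pip3 install torch torchvision torchaudio --index-url ${INDEX_URL}` before the rest of the dependencies:

```bash
# CUDA 11.8 image: pull PyTorch wheels from the cu118 index.
docker build --build-arg INDEX_URL=https://download.pytorch.org/whl/cu118 \
    -t onprem_cuda:11.8.0 .

# CUDA 12.x image: the default PyPI index already serves CUDA 12 wheels,
# so no override is needed (pass the cpu index for CPU-only builds).
docker build -t onprem_cuda:12.1.1 .
```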
@amaiya This branch is now fully tested with the Streamlit app running from a container, along with instructions that help with ingesting documents and then querying them.
As for Docker Hub, I think it is a matter of peeling off a portion of README.md, and posting it as the description for the images. Another possibility I thought of, but haven't looked into, is using GitHub Container Registry to host the images. I'll look into what that would take.
FYI @amaiya
This works. I need to write up a "model card" worthy of display on Docker Hub that shows, e.g., how to map local directories into the container for onprem_data and the SQLite database, how to map ports, etc. I will be working on that today.
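A sketch of the sort of invocation the card would document; the vector database path below is my illustration, not a settled location:

```bash
# Persist models/settings and the vector database on the host, and publish
# the web app's port.
sudo docker run --rm -it --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 \
    -v ~/onprem_data:/root/onprem_data \
    -v ~/onprem_vectordb:/root/onprem_data/webapp/vectordb \
    -p 8000:8000 \
    onprem_cuda:12.1.1 onprem --port=8000
```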