benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

IMPLICIT: No CUDA extension has been built, can't train on GPU #718


ngianni commented 4 months ago

Hi! I'm trying to train an ALS model on the GPU, but I get the following error:

ValueError: No CUDA extension has been built, can't train on GPU.

I also tried running it in Google Colab, but got the same error. It seems that implicit.gpu.HAS_CUDA always returns False. Any ideas?
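One way to narrow this down (a hypothetical diagnostic, not part of implicit) is to check whether the dynamic loader can resolve the CUDA runtime libraries at all. implicit's prebuilt GPU extension is linked against specific CUDA libraries, so if none of these resolve, the extension import fails and HAS_CUDA stays False:

```python
# Hypothetical diagnostic (not part of implicit): check which CUDA runtime
# libraries the dynamic loader can resolve by name.
import ctypes.util


def find_cuda_libs(names=("cudart", "cublas", "curand")):
    """Return a dict mapping each library name to the path the loader
    found, or None if the library is not visible to the loader."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, path in find_cuda_libs().items():
        print(f"lib{name}: {path or 'NOT FOUND'}")
```

If everything prints NOT FOUND, the problem is loader visibility (missing runtime or LD_LIBRARY_PATH), not implicit itself.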

I'm running on Debian 11, and this is the nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  | 00000000:00:04.0  Off  |                    0 |
| N/A   38C    P8             10W /  70W  |      1MiB / 15360MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

j-svensmark commented 3 months ago

I had a similar issue when trying to use CUDA 12; CUDA 11 works for me, though.

I tried editing these lines (https://github.com/benfred/implicit/blob/main/implicit/gpu/__init__.py#L16-L17) to something like

except ImportError as e:
    print(f"{e}")

and got this error when importing implicit: ImportError: libcublas.so.11: cannot open shared object file: No such file or directory. Looks like the CUDA extension is specifically linked against CUDA 11.
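For context, the lines referenced above implement roughly this guard pattern (a simplified sketch, not the actual source): the compiled CUDA extension import is wrapped in try/except, and HAS_CUDA records whether it succeeded, which is why the underlying loader error is normally swallowed:

```python
# Simplified sketch of the import guard in implicit/gpu/__init__.py
# (not the real source).
try:
    import _cuda  # stand-in name for implicit's compiled GPU extension
    HAS_CUDA = True
except ImportError as e:
    # Printing e here (as suggested above) surfaces loader errors such as
    # "libcublas.so.11: cannot open shared object file".
    print(f"CUDA extension not available: {e}")
    HAS_CUDA = False
```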

win845 commented 1 month ago

Confirmed: implicit can't find CUDA 12 (but it finds CUDA 11).

Quick way to reproduce in Docker:

# Dockerfile
FROM nvidia/cuda:12.6.1-cudnn-runtime-ubuntu24.04
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Berlin
ENV PATH="/opt/venv/bin:$PATH"

RUN gpg --keyserver keyserver.ubuntu.com --recv-keys F23C5A6CF475977595C89F51BA6932366A755776 && \
    gpg --export F23C5A6CF475977595C89F51BA6932366A755776 | tee /usr/share/keyrings/deadsnakes.gpg > /dev/null && \
    echo "deb [signed-by=/usr/share/keyrings/deadsnakes.gpg] https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) main" | tee /etc/apt/sources.list.d/deadsnakes.list

RUN apt update && \
    apt-get install -y --no-install-recommends  \
    curl libgomp1 \
    python3.11 python3.11-dev python3.11-venv 

RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python3.11 get-pip.py && \
    python3.11 -m venv /opt/venv && \
    rm get-pip.py

RUN pip install implicit
CMD python -c "import implicit; print(implicit.gpu.HAS_CUDA)"
# on the host, outside the Dockerfile
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
docker build -t implicit -f Dockerfile .
docker run --gpus all -it implicit
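To confirm which CUDA generation is actually loadable inside a given container, a quick check (a hypothetical helper, not part of implicit) is to try dlopen-ing the versioned cuBLAS libraries directly; in a CUDA 12 image, libcublas.so.12 should load while libcublas.so.11 does not, matching the ImportError reported above:

```python
# Hypothetical check (not part of implicit): try to dlopen the versioned
# cuBLAS libraries to see which CUDA runtime generation is loadable here.
import ctypes


def loadable(soname):
    """Return True if the shared object can be dlopen'ed, False otherwise."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for soname in ("libcublas.so.11", "libcublas.so.12"):
        print(soname, "loadable" if loadable(soname) else "not loadable")
```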