Hi @TonsOfFun, I think it's a bug with this library, but not sure when I'll be able to work on it. It looks like InferenceSession needs to call SessionOptionsAppendExecutionProvider_CUDA.
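For context, a rough sketch of what that call could look like over FFI, assuming a GPU build of the ONNX Runtime shared library (GPU builds export a C helper that attaches the CUDA execution provider to the session options before the session is created; the gem's actual implementation may wire this up differently):

require "ffi"

# Minimal sketch: bind the exported GPU-build helper.
module OrtCuda
  extend FFI::Library
  ffi_lib "onnxruntime"

  # OrtStatus* OrtSessionOptionsAppendExecutionProvider_CUDA(OrtSessionOptions*, int device_id)
  attach_function :OrtSessionOptionsAppendExecutionProvider_CUDA, [:pointer, :int], :pointer
end

# session_options is the OrtSessionOptions* the gem builds internally;
# a non-null return value is an OrtStatus* describing the failure.
def append_cuda_provider(session_options, device_id = 0)
  OrtCuda.OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, device_id)
end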
That's kinda what I was thinking, thanks for the suggestion. I'll see if I can get something working and open a PR.
Also gonna try loading this PyTorch model with torch.rb. It looks like CUDA support might be more readily available there. I'll report back here either way.
Hey @TonsOfFun, fixed in the commit above if you pass providers: ["CUDAExecutionProvider"].
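In other words, something like this (the model path and input name are illustrative):

require "onnxruntime"

# Build the session with the CUDA provider and run a prediction;
# "input" must match the model's actual input name.
model = OnnxRuntime::Model.new("model.onnx", providers: ["CUDAExecutionProvider"])
output = model.predict({"input" => [[1.0, 2.0, 3.0]]})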
This is awesome! I've been traveling, but I'll test it out tomorrow.
I did some testing and keep running into an error. I have tried a couple of ways with onnxruntime-linux-x64-gpu-1.15.1: moving the contents into

/var/lib/gems/3.0.0/bundler/gems/onnxruntime-ruby-abf11244043e/vendor/lib
/var/lib/gems/3.0.0/bundler/gems/onnxruntime-ruby-abf11244043e/vendor/include

or putting these files into /var/lib/gems/3.0.0/bundler/gems/onnxruntime-ruby-abf11244043e/vendor:

libonnxruntime.so
libonnxruntime.so.1.15.1
libonnxruntime_providers_cuda.so
libonnxruntime_providers_shared.so
libonnxruntime_providers_tensorrt.so

and pointing the gem at the library with:

OnnxRuntime.ffi_lib = "/var/lib/gems/3.0.0/bundler/gems/onnxruntime-ruby-abf11244043e/vendor/libonnxruntime.so.1.15.1"
Either way I seem to get this error:
model = OnnxRuntime::Model.new("yolov8n.onnx", providers: ["CUDAExecutionProvider"])
/var/lib/gems/3.0.0/bundler/gems/onnxruntime-ruby-abf11244043e/lib/onnxruntime/inference_session.rb:454:in `check_status': /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1131 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory (OnnxRuntime::Error)
I'm running this all on an image based on nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 that I've used for my Python projects, and nvidia-smi reports my CUDA version and GPU. Let me know if anything stands out to you. I feel like it's probably something simple I'm missing.
It looks like the pre-built GPU version uses CUDA 11, so you'll need to either use that or try to compile it from source with CUDA 12.
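One quick way to confirm the mismatch is to check what the provider library links against; a sketch (the path is wherever you extracted the release):

# The 1.15.x GPU release links against CUDA 11's libcublasLt.so.11, which a
# CUDA 12-only image doesn't ship, hence the load error above.
puts `ldd vendor/libonnxruntime_providers_cuda.so`.lines.grep(/cublas|cudart/)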
That was it! I ended up using a modified version of the ml-stack image:
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

ENV RUBY_VERSION 3.1.2

# install packages
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y --no-install-recommends \
        autoconf \
        automake \
        build-essential \
        imagemagick \
        libczmq-dev \
        libffi-dev \
        libreadline-dev \
        libsox-dev \
        libsox-fmt-all \
        libssl-dev \
        libtool \
        libvips \
        libyaml-dev \
        libzmq3-dev \
        make \
        python3 \
        python3-pip \
        python3-setuptools \
        sox \
        unzip \
        wget \
        git \
        zlib1g-dev \
        && \
    rm -rf /var/lib/apt/lists/*

# install Ruby
RUN cd /tmp && \
    wget -O ruby.tar.gz -q https://cache.ruby-lang.org/pub/ruby/3.1/ruby-$RUBY_VERSION.tar.gz && \
    mkdir ruby && \
    tar xfz ruby.tar.gz -C ruby --strip-components=1 && \
    rm ruby.tar.gz && \
    cd ruby && \
    ./configure --disable-install-doc --enable-shared && \
    make -j && \
    make install && \
    cd .. && \
    rm -r ruby && \
    ruby --version && \
    bundle --version

# install Jupyter
RUN pip3 install jupyter && \
    jupyter kernelspec remove -f python3 && \
    mkdir /root/.jupyter && \
    echo 'c.KernelSpecManager.ensure_native_kernel = False' > ~/.jupyter/jupyter_notebook_config.py

# install LibTorch
RUN cd /opt && \
    wget -O libtorch.zip -q https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.12.0%2Bcu113.zip && \
    unzip -q libtorch.zip && \
    rm libtorch.zip

RUN gem install \
        ffi-rzmq \
        iruby \
        mini_magick \
        numo-narray && \
    iruby register

WORKDIR /app

# copy the Gemfile below into the image so bundler can resolve it
COPY Gemfile ./
RUN bundle install

# COPY torch-gpu/MNIST.ipynb ./
CMD ["jupyter", "notebook", "--no-browser", "--ip=0.0.0.0", "--allow-root"]
With this Gemfile:
source "https://rubygems.org"
gem "numo-narray", platform: [:ruby, :x64_mingw]
# until 0.7.8 is released with GPU support
gem "onnxruntime", git: 'https://github.com/ankane/onnxruntime-ruby.git', ref: 'abf1124'
gem "mini_magick"
In case it's helpful to you or someone else. Thanks again for the support here!
First off, great work! I have loaded the yolov8n.onnx model from ultralytics and it runs on CPU no problem. When attempting to run on my GPU, I can see the CUDAExecutionProvider is available:

=> ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]

but when I run predictions it's only using the CPU.
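For reference, the check producing that output was presumably something like the following (the providers accessor is an assumption about the gem's API; it reports the providers compiled into the loaded build, not the one a session actually selects):

require "onnxruntime"

session = OnnxRuntime::InferenceSession.new("yolov8n.onnx")
# Lists the execution providers available in the loaded ONNX Runtime build;
# before the fix above there was no way to opt into CUDA, so predictions
# silently fell back to the CPU provider.
p session.providers
# => ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]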