Expected Behavior
I tried to build the project and run a simple demo as described in the README.
Current Behavior
It built successfully, but an error was reported when I tried to run the demo.
offload_ffn_split: applying augmentation to model - please wait ...
CUDA error 222 at /home/test/test06/jdz/PowerInfer/ggml-cuda.cu:9635: the provided PTX was compiled with an unsupported toolchain.
current device: 0
Environment and Context
SDK version, e.g. for Linux:
Python 3.10.14
cmake version 3.30.1
g++ (conda-forge gcc 11.4.0-13) 11.4.0
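One thing missing above is the NVIDIA driver version, which is likely relevant here: error 222 ("the provided PTX was compiled with an unsupported toolchain") usually means the driver's PTX JIT is older than the toolkit that compiled the code. A quick sanity check I can run, assuming the Linux minimum driver for CUDA 12.4 is 550.54.14 (per NVIDIA's release notes; please correct me if that's wrong) and using a placeholder driver version in place of the real one reported by nvidia-smi:

```shell
# Hypothetical check: is the installed driver new enough for CUDA 12.4 PTX?
# 535.104.05 is a PLACEHOLDER; substitute the value from `nvidia-smi`.
driver=535.104.05
required=550.54.14   # assumed Linux minimum driver for CUDA 12.4
oldest=$(printf '%s\n' "$required" "$driver" | sort -V | head -n1)
if [ "$oldest" = "$required" ]; then
    echo "driver meets the CUDA 12.4 minimum"
else
    echo "driver is older than the CUDA 12.4 minimum"
fi
```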
Failure Information (for bugs)
I followed the README: I built the project with cmake -S . -B build -DLLAMA_CUBLAS=ON and cmake --build build --config Release, and then ran:
./build/bin/main -m /home/test/test06/jdz/PLMs/ReluLLaMA-7B/llama-7b-relu.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --vram-budget 8
I got the following log output:
Log start
<!--skip some log-->
offload_ffn_split: applying augmentation to model - please wait ...
CUDA error 222 at /home/test/test06/jdz/PowerInfer/ggml-cuda.cu:9635: the provided PTX was compiled with an unsupported toolchain.
current device: 0
Steps to Reproduce
Just follow the README.
Additional info
After running cmake -S . -B build -DLLAMA_CUBLAS=ON I got the following:
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/test/test06/miniconda3/envs/jdz/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/test/test06/miniconda3/envs/jdz/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /home/test/test06/cuda-12.4/targets/x86_64-linux/include (found version "12.4.131")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 12.4.131
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /home/test/test06/cuda-12.4/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
GNU ld (GNU Binutils) 2.40
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (17.3s)
-- Generating done (4.5s)
-- Build files have been written to: /home/test/test06/jdz/PowerInfer/build
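Since the build targets architectures 52;61;70, the binary likely carries PTX that the driver must JIT-compile for my GPU, which is where a driver/toolkit mismatch would surface. One workaround I plan to try (a guess, not yet verified) is compiling only for the local GPU's native architecture so no 12.4-generated PTX needs JIT compilation; CMake 3.24+ (I have 3.30.1) supports the native keyword:

```shell
# Reconfigure for the local GPU's architecture only (CMake >= 3.24),
# then rebuild. This avoids forward-compatible PTX entirely.
cmake -S . -B build -DLLAMA_CUBLAS=ON -DCMAKE_CUDA_ARCHITECTURES=native
cmake --build build --config Release
```

If the real cause is an outdated driver, updating the driver would presumably be the proper fix.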