abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Failed to run on Intel GPUs #1268

Open rnwang04 opened 7 months ago

rnwang04 commented 7 months ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

I expect llama-cpp-python to run normally on Intel GPUs, just as llama.cpp does.

Current Behavior

llama-cpp-python fails to run on Intel GPUs, while the llama.cpp SYCL backend runs normally.

Environment and Context

I tested SYCL support on an Intel Arc A770 GPU running Ubuntu 22.04, with oneAPI 2024.0. I have verified that the llama.cpp SYCL backend works normally on my machine.

Steps to Reproduce

conda create -n llm python=3.9
conda activate llm
source /opt/intel/oneapi/setvars.sh 
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
python test.py

where test.py is:

from llama_cpp import Llama
llm = Llama(
      model_path="~/llama.cpp/models/7B/ggml-model-q4_0-pure.gguf",
      n_gpu_layers=33,   # offload all 33 layers to the GPU
      seed=1337,         # set a specific seed
      # n_ctx=2048,      # uncomment to increase the context window
)
output = llm(
      "Q: Name the planets in the solar system? A: ",  # prompt
      max_tokens=32,     # generate up to 32 tokens; set to None to generate up to the end of the context window
      stop=["Q:", "\n"],  # stop just before the model would generate a new question
      echo=True          # echo the prompt back in the output
)  # generate a completion; create_completion can also be called directly
print(output)
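
As a sanity check (not part of the original report), it is worth confirming that the wheel pip produced was actually compiled with the SYCL backend, since pip can silently reuse a cached CPU-only build. A minimal sketch, assuming the installed version exposes the low-level llama_supports_gpu_offload binding:

import llama_cpp

# Report the installed wheel and whether GPU offload support was compiled in.
# If this prints False, the CMAKE_ARGS above were likely not applied (for
# example, a cached wheel was reused); rebuilding with
# `pip install --force-reinstall --no-cache-dir llama-cpp-python` may help.
print("llama-cpp-python version:", llama_cpp.__version__)
print("GPU offload compiled in:", llama_cpp.llama_supports_gpu_offload())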

Failure Logs

ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: no
found 2 SYCL devices:
|ID| Name                                        |compute capability|Max compute units|Max work group|Max sub group|Global mem size|
|--|---------------------------------------------|------------------|-----------------|--------------|-------------|---------------|
| 0|         13th Gen Intel(R) Core(TM) i9-13900K|               3.0|               32|          8192|           64|    67181625344|
| 1|               Intel(R) FPGA Emulation Device|               1.2|               32|      67108864|           64|    67181625344|
DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)Exception caught at file:/tmp/pip-install-31terybs/llama-cpp-python_2e42ff812a094f19b998956fddc30615/vendor/llama.cpp/ggml-sycl.cpp, line:13341
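
Note that the device table above lists only the i9-13900K CPU and the FPGA emulation device; the Arc A770 does not appear at all, which is why the backend's GPU device list ends up empty. A quick way to see what the oneAPI runtime itself detects, independent of llama-cpp-python, is to run sycl-ls from the same shell. A minimal sketch (sycl-ls ships with the oneAPI Base Toolkit; the exact output format varies between releases):

import subprocess

# List every SYCL device visible in the current environment. If no
# level_zero (or ext_oneapi_level_zero) GPU entry for the Arc A770 shows up
# here either, the issue lies with the GPU driver/runtime setup rather than
# with llama-cpp-python itself.
result = subprocess.run(["sycl-ls"], capture_output=True, text=True)
print(result.stdout)
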
ayttop commented 2 months ago

Same problem

ayttop commented 2 months ago

See the ipex-llm llama.cpp quickstart: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md