Prerequisites

Please answer the following questions for yourself before submitting an issue.

[x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
[x] I carefully followed the README.md.
[x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I have tried to install SYCL following the instructions and expected it to work I guess.

Environment and Context

CPU: Ryzen 5 2600 GPU: Intel Arc A750 RAM: 32 GB 2993 Mhz OS: Windows 11 Pro (x64) 23H2 Display Driver: Intel® Graphics Driver 31.0.101.5186/31.0.101.5234 (WHQL Certified)

Python: Python 3.11.8 CMake: CMake 3.28.3 Microsoft Visual Studio 2022 version 17.9.0 Intel OneAPI version 2024.0.0 MinGW32 GNU C and C++ Compiler version 6.3.0-1

First of all, I have installed and used llama-cpp-python[server] using Vulkan and CLBlast. However, they at most used 20% of the VRAM and the performance wasn't satisfactory. (I have tried using several different quantized models including mistral-7b-instruct-v0.2 and Wizard-Vicuna-7B)

That's why I have decided to try SYCL to make the most of my hardware. I downloaded OneAPI and faced my first roadblock not long after. I simply couldn't use the "source /opt/intel/oneapi/setvars.sh" command. First of all, there were no setvars.sh file in my OneAPI installation. I had setvars.bat in the "C:\Program Files (x86)\Intel\oneAPI" directory and tried to use it similarly to this command but it didn't work. That is when I found this Intel Guide page.

I have started the OneAPI environment according to the instructions in that page (cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell') and started building after CMake arguments: "CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx". Unfortunately, it failed and the system gave the error below:

CMake Error at vendor/llama.cpp/CMakeLists.txt:511 (find_package):
        Found package configuration file:

          C:/Program Files (x86)/Intel/oneAPI/compiler/latest/lib/cmake/IntelSYCL/IntelSYCLConfig.cmake

        but it set IntelSYCL_FOUND to FALSE so package "IntelSYCL" is considered to
        be NOT FOUND.  Reason given by package:

        Unsupported compiler family MSVC and compiler C:/Program Files/Microsoft
        Visual
        Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe!!

I have solved this problem by downloading MinGW and adding "MinGW Makefiles" before compiling. Which gave me a new error that said the icpx compiler was broken. (Probably because I am on Windows) That is why I changed the CMake arguments to "CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx".

After all that effort (I am pretty new to all of this) it finally compiled. I opened another Powershell window and happily started the server and this time I faced another new error:

Traceback (most recent call last):
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\llama_cpp.py", line 74, in _load_shared_library
    return ctypes.CDLL(str(_lib_path), **cdll_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\ctypes\__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.

Now, if I start another OneAPI environment using the "cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'" command and then try starting the server again it actually starts. However, it never completes the startup and gives "OSError: exception: access violation reading 0x0000000000000020" error at the end. I am adding the full log of this last error below.

Failure Logs

:: initializing oneAPI environment...
   Initializing Visual Studio command-line environment...
   Visual Studio version 17.9.0 environment configured.
   "C:\Program Files\Microsoft Visual Studio\2022\Community\"
   Visual Studio command-line environment initialized for: 'x64'
:: advisor -- processing etc\advisor\vars.bat
:: compiler -- processing etc\compiler\vars.bat
:: dal -- processing etc\dal\vars.bat
:: debugger -- processing etc\debugger\vars.bat
:: dpct -- processing etc\dpct\vars.bat
:: dpl -- processing etc\dpl\vars.bat
:: ipp -- processing etc\ipp\vars.bat
:: ippcp -- processing etc\ippcp\vars.bat
:: mkl -- processing etc\mkl\vars.bat
:: tbb -- processing etc\tbb\vars.bat
:: vtune -- processing etc\vtune\vars.bat
:: oneAPI environment initialized ::
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows

PS C:\Windows\System32> $env:FORCE_CMAKE=1
PS C:\Windows\System32> $env:CMAKE_GENERATOR = "MinGW Makefiles"
PS C:\Windows\System32> $env:CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx"
PS C:\Windows\System32> $env:CMAKE_GENERATOR = "MinGW Makefiles"
PS C:\Windows\System32> pip install llama-cpp-python[server] --upgrade --force-reinstall
Collecting llama-cpp-python[server]
  Downloading llama_cpp_python-0.2.44.tar.gz (36.6 MB)
     ---------------------------------------- 36.6/36.6 MB 2.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python[server])
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python[server])
  Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
     ---------------------------------------- 61.0/61.0 kB 406.4 kB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python[server])
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python[server])
  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Collecting uvicorn>=0.22.0 (from llama-cpp-python[server])
  Downloading uvicorn-0.27.1-py3-none-any.whl.metadata (6.3 kB)
Collecting fastapi>=0.100.0 (from llama-cpp-python[server])
  Downloading fastapi-0.109.2-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings>=2.0.1 (from llama-cpp-python[server])
  Downloading pydantic_settings-2.2.0-py3-none-any.whl.metadata (3.1 kB)
Collecting sse-starlette>=1.6.1 (from llama-cpp-python[server])
  Downloading sse_starlette-2.0.0-py3-none-any.whl.metadata (5.4 kB)
Collecting starlette-context<0.4,>=0.3.6 (from llama-cpp-python[server])
  Downloading starlette_context-0.3.6-py3-none-any.whl.metadata (4.3 kB)
Collecting pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 (from fastapi>=0.100.0->llama-cpp-python[server])
  Downloading pydantic-2.6.1-py3-none-any.whl.metadata (83 kB)
     ---------------------------------------- 83.5/83.5 kB 1.6 MB/s eta 0:00:00
Collecting starlette<0.37.0,>=0.36.3 (from fastapi>=0.100.0->llama-cpp-python[server])
  Downloading starlette-0.36.3-py3-none-any.whl.metadata (5.9 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python[server])
  Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl.metadata (3.1 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings>=2.0.1->llama-cpp-python[server])
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting anyio (from sse-starlette>=1.6.1->llama-cpp-python[server])
  Downloading anyio-4.2.0-py3-none-any.whl.metadata (4.6 kB)
Collecting click>=7.0 (from uvicorn>=0.22.0->llama-cpp-python[server])
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting h11>=0.8 (from uvicorn>=0.22.0->llama-cpp-python[server])
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
     ---------------------------------------- 58.3/58.3 kB 1.0 MB/s eta 0:00:00
Collecting colorama (from click>=7.0->uvicorn>=0.22.0->llama-cpp-python[server])
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting annotated-types>=0.4.0 (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi>=0.100.0->llama-cpp-python[server])
  Downloading annotated_types-0.6.0-py3-none-any.whl.metadata (12 kB)
Collecting pydantic-core==2.16.2 (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi>=0.100.0->llama-cpp-python[server])
  Downloading pydantic_core-2.16.2-cp311-none-win_amd64.whl.metadata (6.6 kB)
Collecting idna>=2.8 (from anyio->sse-starlette>=1.6.1->llama-cpp-python[server])
  Downloading idna-3.6-py3-none-any.whl.metadata (9.9 kB)
Collecting sniffio>=1.1 (from anyio->sse-starlette>=1.6.1->llama-cpp-python[server])
  Downloading sniffio-1.3.0-py3-none-any.whl (10 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ---------------------------------------- 45.5/45.5 kB 2.2 MB/s eta 0:00:00
Downloading fastapi-0.109.2-py3-none-any.whl (92 kB)
   ---------------------------------------- 92.1/92.1 kB 2.6 MB/s eta 0:00:00
Downloading Jinja2-3.1.3-py3-none-any.whl (133 kB)
   ---------------------------------------- 133.2/133.2 kB 2.6 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
   ---------------------------------------- 15.8/15.8 MB 1.2 MB/s eta 0:00:00
Downloading pydantic_settings-2.2.0-py3-none-any.whl (13 kB)
Downloading sse_starlette-2.0.0-py3-none-any.whl (9.0 kB)
Downloading starlette_context-0.3.6-py3-none-any.whl (12 kB)
Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Downloading uvicorn-0.27.1-py3-none-any.whl (60 kB)
   ---------------------------------------- 60.8/60.8 kB 1.6 MB/s eta 0:00:00
Downloading click-8.1.7-py3-none-any.whl (97 kB)
   ---------------------------------------- 97.9/97.9 kB 1.1 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Downloading pydantic-2.6.1-py3-none-any.whl (394 kB)
   ---------------------------------------- 394.8/394.8 kB 1.2 MB/s eta 0:00:00
Downloading pydantic_core-2.16.2-cp311-none-win_amd64.whl (1.9 MB)
   ---------------------------------------- 1.9/1.9 MB 1.3 MB/s eta 0:00:00
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Downloading starlette-0.36.3-py3-none-any.whl (71 kB)
   ---------------------------------------- 71.5/71.5 kB 2.0 MB/s eta 0:00:00
Downloading anyio-4.2.0-py3-none-any.whl (85 kB)
   ---------------------------------------- 85.5/85.5 kB 1.6 MB/s eta 0:00:00
Downloading annotated_types-0.6.0-py3-none-any.whl (12 kB)
Downloading idna-3.6-py3-none-any.whl (61 kB)
   ---------------------------------------- 61.6/61.6 kB 1.7 MB/s eta 0:00:00
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.44-cp311-cp311-win_amd64.whl size=4878681 sha256=a993849adae7a7679f80e62d42df7db07bc5e5496ceffb28525b6f6cf09aa521
  Stored in directory: c:\users\bedirhan\appdata\local\pip\cache\wheels\fa\7e\9a\6a4e5377e7df680b778505efb19cbc24d5343c5612589bdce3
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, sniffio, python-dotenv, numpy, MarkupSafe, idna, h11, diskcache, colorama, annotated-types, pydantic-core, jinja2, click, anyio, uvicorn, starlette, pydantic, llama-cpp-python, starlette-context, sse-starlette, pydantic-settings, fastapi
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.9.0
    Uninstalling typing_extensions-4.9.0:
      Successfully uninstalled typing_extensions-4.9.0
  Attempting uninstall: sniffio
    Found existing installation: sniffio 1.3.0
    Uninstalling sniffio-1.3.0:
      Successfully uninstalled sniffio-1.3.0
  Attempting uninstall: python-dotenv
    Found existing installation: python-dotenv 1.0.1
    Uninstalling python-dotenv-1.0.1:
      Successfully uninstalled python-dotenv-1.0.1
  Attempting uninstall: numpy
    Found existing installation: numpy 1.26.4
    Uninstalling numpy-1.26.4:
      Successfully uninstalled numpy-1.26.4
  Attempting uninstall: MarkupSafe
    Found existing installation: MarkupSafe 2.1.5
    Uninstalling MarkupSafe-2.1.5:
      Successfully uninstalled MarkupSafe-2.1.5
  Attempting uninstall: idna
    Found existing installation: idna 3.6
    Uninstalling idna-3.6:
      Successfully uninstalled idna-3.6
  Attempting uninstall: h11
    Found existing installation: h11 0.14.0
    Uninstalling h11-0.14.0:
      Successfully uninstalled h11-0.14.0
  Attempting uninstall: diskcache
    Found existing installation: diskcache 5.6.3
    Uninstalling diskcache-5.6.3:
      Successfully uninstalled diskcache-5.6.3
  Attempting uninstall: colorama
    Found existing installation: colorama 0.4.6
    Uninstalling colorama-0.4.6:
      Successfully uninstalled colorama-0.4.6
  Attempting uninstall: annotated-types
    Found existing installation: annotated-types 0.6.0
    Uninstalling annotated-types-0.6.0:
      Successfully uninstalled annotated-types-0.6.0
  Attempting uninstall: pydantic-core
    Found existing installation: pydantic_core 2.16.2
    Uninstalling pydantic_core-2.16.2:
      Successfully uninstalled pydantic_core-2.16.2
  Attempting uninstall: jinja2
    Found existing installation: Jinja2 3.1.3
    Uninstalling Jinja2-3.1.3:
      Successfully uninstalled Jinja2-3.1.3
  Attempting uninstall: click
    Found existing installation: click 8.1.7
    Uninstalling click-8.1.7:
      Successfully uninstalled click-8.1.7
  Attempting uninstall: anyio
    Found existing installation: anyio 4.2.0
    Uninstalling anyio-4.2.0:
      Successfully uninstalled anyio-4.2.0
  Attempting uninstall: uvicorn
    Found existing installation: uvicorn 0.27.1
    Uninstalling uvicorn-0.27.1:
      Successfully uninstalled uvicorn-0.27.1
  Attempting uninstall: starlette
    Found existing installation: starlette 0.36.3
    Uninstalling starlette-0.36.3:
      Successfully uninstalled starlette-0.36.3
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.6.1
    Uninstalling pydantic-2.6.1:
      Successfully uninstalled pydantic-2.6.1
  Attempting uninstall: starlette-context
    Found existing installation: starlette-context 0.3.6
    Uninstalling starlette-context-0.3.6:
      Successfully uninstalled starlette-context-0.3.6
  Attempting uninstall: sse-starlette
    Found existing installation: sse-starlette 2.0.0
    Uninstalling sse-starlette-2.0.0:
      Successfully uninstalled sse-starlette-2.0.0
  Attempting uninstall: pydantic-settings
    Found existing installation: pydantic-settings 2.2.0
    Uninstalling pydantic-settings-2.2.0:
      Successfully uninstalled pydantic-settings-2.2.0
  Attempting uninstall: fastapi
    Found existing installation: fastapi 0.109.2
    Uninstalling fastapi-0.109.2:
      Successfully uninstalled fastapi-0.109.2
Successfully installed MarkupSafe-2.1.5 annotated-types-0.6.0 anyio-4.2.0 click-8.1.7 colorama-0.4.6 diskcache-5.6.3 fastapi-0.109.2 h11-0.14.0 idna-3.6 jinja2-3.1.3 llama-cpp-python-0.2.44 numpy-1.26.4 pydantic-2.6.1 pydantic-core-2.16.2 pydantic-settings-2.2.0 python-dotenv-1.0.1 sniffio-1.3.0 sse-starlette-2.0.0 starlette-0.36.3 starlette-context-0.3.6 typing-extensions-4.9.0 uvicorn-0.27.1
PS C:\Windows\System32> python -m llama_cpp.server --model C:\Users\Bedirhan\Desktop\AI-Research\models\mistral-7b-instruct-v0.2.Q3_K_L.gguf --model_alias translator --chat_format chatml --n_gpu_layers -1
GGML_SYCL_DEBUG=0
ggml_init_sycl: GGML_SYCL_F16:   no
ggml_init_sycl: SYCL_USE_XMX: yes
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A750 Graphics,     compute capability 1.3,
        max compute_units 448,  max work group size 1024,       max sub group size 32,  global mem size 4024516608
  Device 1: Intel(R) Arc(TM) A750 Graphics,     compute capability 3.0,
        max compute_units 448,  max work group size 1024,       max sub group size 32,  global mem size 4024516608
  Device 2: AMD Ryzen 5 2600 Six-Core Processor            ,    compute capability 3.0,
        max compute_units 12,   max work group size 8192,       max sub group size 64,  global mem size 4202942464
  Device 3: Intel(R) FPGA Emulation Device,     compute capability 1.2,
        max compute_units 12,   max work group size 67108864,   max sub group size 64,  global mem size 4202942464
Using device 0 (Intel(R) Arc(TM) A750 Graphics) as main device
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from C:\Users\Bedirhan\Desktop\AI-Research\models\mistral-7b-instruct-v0.2.Q3_K_L.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 13
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  22:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv  23:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q3_K:  129 tensors
llama_model_loader: - type q5_K:   96 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q3_K - Large
llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 3.56 GiB (4.22 BPW)
llm_load_print_meta: general.name     = mistralai_mistral-7b-instruct-v0.2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:            buffer size =  3590.55 MiB
llm_load_tensors:        CPU buffer size =    53.71 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:            KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU input buffer size   =    13.01 MiB
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\server\__main__.py", line 88, in <module>
    main()
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\server\__main__.py", line 74, in main
    app = create_app(
          ^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\server\app.py", line 133, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\server\app.py", line 70, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\server\model.py", line 31, in __init__
    self._current_model = self.load_llama_from_model_settings(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\server\model.py", line 124, in load_llama_from_model_settings
    _model = llama_cpp.Llama(
             ^^^^^^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\llama.py", line 314, in __init__
    self._ctx = _LlamaContext(
                ^^^^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\_internals.py", line 257, in __init__
    self.ctx = llama_cpp.llama_new_context_with_model(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bedirhan\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\llama_cpp.py", line 750, in llama_new_context_with_model
    return _lib.llama_new_context_with_model(model, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: exception: access violation reading 0x0000000000000020
PS C:\Windows\System32>

abetlen / llama-cpp-python

SYCL Installation OSError: exception Error #1197

Prerequisites

Expected Behavior

Environment and Context

Failure Logs