ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Vulkan backend crash on model loading #8828

Closed LSXAxeller closed 1 month ago

LSXAxeller commented 2 months ago

What happened?

I mainly use the LLamaSharp C# bindings. After updating to v0.14.0, which added a Vulkan backend, I decided to try it instead of CPU inference, but loading a model crashes with this console output:

WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
WARNING: [Loader Message] Code 0 : Layer VK_LAYER_RENDERDOC_Capture uses API version 1.2 which is older than the application specified API version of 1.3. May cause issues.
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from C:\Models\Text\Index-1.9B-Character\Index-1.9B-Character-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Index-1.9B-Character_test
llama_model_loader: - kv   2:                          llama.block_count u32              = 36
llama_model_loader: - kv   3:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5888
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                          general.file_type u32              = 18
llama_model_loader: - kv  10:                           llama.vocab_size u32              = 65029
llama_model_loader: - kv  11:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  12:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,65029]   = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,65029]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,65029]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   73 tensors
llama_model_loader: - type q6_K:  254 tensors
llm_load_vocab: special tokens cache size = 515
llm_load_vocab: token to piece cache size = 0.3670 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 65029
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 36
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5888
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 2.17 B
llm_load_print_meta: model size       = 1.66 GiB (6.56 BPW) 
llm_load_print_meta: general.name     = Index-1.9B-Character_test
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 270 '<0x0A>'
llm_load_print_meta: max token length = 48
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64
Fatal error: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

Repeat 2 times:
--------------------------------
at LLama.Native.SafeLlamaModelHandle.llama_load_model_from_file(System.String, LLama.Native.LLamaModelParams)
--------------------------------
at LLama.Native.SafeLlamaModelHandle.LoadFromFile(System.String, LLama.Native.LLamaModelParams)
at LLama.LlamaWeights+<>c__DisplayClass21_0.<LoadFromFileAsync>b__1()
at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

I decided to give the official llama.cpp release binaries a try. First I tried release b3375, which is the base for LLamaSharp, then the latest release b3504. I tried both versions with both the AVX2 and Vulkan binaries, but the result on Vulkan was the same as with LLamaSharp, with this console output:

C:\External\llama-b3504-bin-win-vulkan-x64>llama-cli --model Index-1.9B-Character-Q6_K.gguf --n-gpu-layers 8 -cnv
Log start
main: build = 3504 (e09a800f)
main: built with MSVC 19.29.30154.0 for x64
main: seed  = 1722604647
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from Index-1.9B-Character-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Index-1.9B-Character_test
llama_model_loader: - kv   2:                          llama.block_count u32              = 36
llama_model_loader: - kv   3:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5888
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                          general.file_type u32              = 18
llama_model_loader: - kv  10:                           llama.vocab_size u32              = 65029
llama_model_loader: - kv  11:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  12:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,65029]   = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,65029]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,65029]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   73 tensors
llama_model_loader: - type q6_K:  254 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3670 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 65029
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 36
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5888
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 2.17 B
llm_load_print_meta: model size       = 1.66 GiB (6.56 BPW)
llm_load_print_meta: general.name     = Index-1.9B-Character_test
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 270 '<0x0A>'
llm_load_print_meta: max token length = 48
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64

I tried different models (1.9B, 1.1B, 300M, 22M) and different --n-gpu-layers values (0, 1, 8, 16, 36) on an RX 580 4 GB GPU, but GPU utilization stays around 4% as if idle and VRAM stays empty.

Name and Version

llama-cli --version: 3504 (e09a800f) built with MSVC 19.29.30154.0 for x64
llama-cli --version: 3375 (36864569) built with MSVC 19.29.30154.0 for x64

What operating system are you seeing the problem on?

No response

Relevant log output

No response

0cc4m commented 2 months ago

It's impossible to tell what's going on here without a debugger. You'd have to build the llama-cli executable with debug flags and run it through a debugger, to find where it segfaults.

LSXAxeller commented 1 month ago

It's impossible to tell what's going on here without a debugger. You'd have to build the llama-cli executable with debug flags and run it through a debugger, to find where it segfaults.

Any tips on how to use it with a debugger? I built the latest commit b3580 with Vulkan and debug flags using make LLAMA_DEBUG=1 GGML_VULKAN=1 -j 6, but it still doesn't print anything new. I am not familiar with C++, so I don't know which tool to use or how.

Microsoft Windows [Version 10.0.22631.3958]
(c) Microsoft Corporation. All rights reserved.

C:\External\X\w64devkit>w64devkit.exe
~ $ SDK_VERSION=1.3.290.0
~ $ cp "C:/Program Files/VulkanSDK/$SDK_VERSION/Bin/glslc.exe" $W64DEVKIT_HOME/bin/
~ $ cp "C:/Program Files/VulkanSDK/$SDK_VERSION/Lib/vulkan-1.lib" $W64DEVKIT_HOME/x86_64-w64-mingw32/lib/
~ $ cp -r "C:/Program Files/VulkanSDK/$SDK_VERSION/Include/*" $W64DEVKIT_HOME/x86_64-w64-mingw32/include/
cp: can't stat 'C:/Program Files/VulkanSDK/1.3.290.0/Include/*': No such file or directory
~ $ cp -r "C:/Program Files/VulkanSDK/$SDK_VERSION/Include/"* $W64DEVKIT_HOME/x86_64-w64-mingw32/include/
~ $ cat > $W64DEVKIT_HOME/x86_64-w64-mingw32/lib/pkgconfig/vulkan.pc <<EOF
> Name: Vulkan-Loader
> Description: Vulkan Loader
> Version: $SDK_VERSION
> Libs: -lvulkan-1
> EOF
~ $ cd "C:\External\X\llama.cpp"
C:/External/X/llama.cpp $ make LLAMA_DEBUG=1 GGML_VULKAN=1 -j 6
I ccache not found. Consider installing it for faster compilation.
I llama.cpp build info:
I UNAME_S:   Windows_NT
I UNAME_P:   unknown
I UNAME_M:   x86_64
I CFLAGS:    -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion
I CXXFLAGS:  -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN
I NVCCFLAGS: -std=c++11 -O0 -g
I LDFLAGS:   -g -lvulkan-1
I CC:        cc (GCC) 14.2.0
I CXX:       c++ (GCC) 14.2.0

c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c ggml/src/llamafile/sgemm.cpp -o ggml/src/llamafile/sgemm.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -o vulkan-shaders-gen -g -lvulkan-1  ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
cc  -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion    -c ggml/src/ggml.c -o ggml/src/ggml.o
cc  -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion    -c ggml/src/ggml-alloc.c -o ggml/src/ggml-alloc.o
cc  -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion    -c ggml/src/ggml-backend.c -o ggml/src/ggml-backend.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion     -c ggml/src/ggml-quants.c -o ggml/src/ggml-quants.o
ggml/src/ggml.c:90:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   90 | static atomic_bool atomic_flag_test_and_set(atomic_flag * ptr) {
      |        ^~~~~~~~~~~
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion     -c ggml/src/ggml-aarch64.c -o ggml/src/ggml-aarch64.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c src/llama.cpp -o src/llama.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c src/llama-vocab.cpp -o src/llama-vocab.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c src/llama-grammar.cpp -o src/llama-grammar.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c src/llama-sampling.cpp -o src/llama-sampling.o
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:60:6: warning: no previous declaration for 'void execute_command(const std::string&, std::string&, std::string&)' [-Wmissing-declarations]
   60 | void execute_command(const std::string& command, std::string& stdout_str, std::string& stderr_str) {
      |      ^~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp: In function 'void execute_command(const std::string&, std::string&, std::string&)':
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpReserved' [-Wmissing-field-initializers]
   77 |     STARTUPINFOA si = { sizeof(STARTUPINFOA) };
      |                                              ^
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpTitle' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwX' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwY' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwXSize' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwYSize' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwXCountChars' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwYCountChars' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwFillAttribute' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwFlags' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::wShowWindow' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::cbReserved2' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpReserved2' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::hStdInput' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::hStdOutput' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::hStdError' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp: At global scope:
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:152:6: warning: no previous declaration for 'bool directory_exists(const std::string&)' [-Wmissing-declarations]
  152 | bool directory_exists(const std::string& path) {
      |      ^~~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:160:6: warning: no previous declaration for 'bool create_directory(const std::string&)' [-Wmissing-declarations]
  160 | bool create_directory(const std::string& path) {
      |      ^~~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:168:13: warning: no previous declaration for 'std::string to_uppercase(const std::string&)' [-Wmissing-declarations]
  168 | std::string to_uppercase(const std::string& input) {
      |             ^~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:176:6: warning: no previous declaration for 'bool string_ends_with(const std::string&, const std::string&)' [-Wmissing-declarations]
  176 | bool string_ends_with(const std::string& str, const std::string& suffix) {
      |      ^~~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:185:13: warning: no previous declaration for 'std::string join_paths(const std::string&, const std::string&)' [-Wmissing-declarations]
  185 | std::string join_paths(const std::string& path1, const std::string& path2) {
      |             ^~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:189:13: warning: no previous declaration for 'std::string basename(const std::string&)' [-Wmissing-declarations]
  189 | std::string basename(const std::string &path) {
      |             ^~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:193:6: warning: no previous declaration for 'void string_to_spv(const std::string&, const std::string&, const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&, bool)' [-Wmissing-declarations]
  193 | void string_to_spv(const std::string& _name, const std::string& in_fname, const std::map<std::string, std::string>& defines, bool fp16 = true) {
      |      ^~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:233:36: warning: no previous declaration for 'std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > merge_maps(const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&, const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&)' [-Wmissing-declarations]
  233 | std::map<std::string, std::string> merge_maps(const std::map<std::string, std::string>& a, const std::map<std::string, std::string>& b) {
      |                                    ^~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:239:6: warning: no previous declaration for 'void matmul_shaders(std::vector<std::future<void> >&, bool, bool)' [-Wmissing-declarations]
  239 | void matmul_shaders(std::vector<std::future<void>>& tasks, bool fp16, bool matmul_id) {
      |      ^~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:286:6: warning: no previous declaration for 'void process_shaders(std::vector<std::future<void> >&)' [-Wmissing-declarations]
  286 | void process_shaders(std::vector<std::future<void>>& tasks) {
      |      ^~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:477:6: warning: no previous declaration for 'void write_output_files()' [-Wmissing-declarations]
  477 | void write_output_files() {
      |      ^~~~~~~~~~~~~~~~~~
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c src/unicode.cpp -o src/unicode.o
src/llama.cpp: In member function 'std::string llama_file::GetErrorMessageWin32(DWORD) const':
src/llama.cpp:1480:46: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'DWORD' {aka 'long unsigned int'} [-Wformat=]
 1480 |             ret = format("Win32 error code: %s", error_code);
      |                                             ~^   ~~~~~~~~~~
      |                                              |   |
      |                                              |   DWORD {aka long unsigned int}
      |                                              char*
      |                                             %ld
src/llama.cpp: In constructor 'llama_mmap::llama_mmap(llama_file*, size_t, bool)':
src/llama.cpp:1818:38: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'BOOL (*)(HANDLE, ULONG_PTR, PWIN32_MEMORY_RANGE_ENTRY, ULONG)' {aka 'int (*)(void*, long long unsigned int, _WIN32_MEMORY_RANGE_ENTRY*, long unsigned int)'} [-Wcast-function-type]
 1818 |             pPrefetchVirtualMemory = reinterpret_cast<decltype(pPrefetchVirtualMemory)> (GetProcAddress(hKernel32, "PrefetchVirtualMemory"));
      |                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c src/unicode-data.cpp -o src/unicode-data.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/common.cpp -o common/common.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/console.cpp -o common/console.o
In file included from src/llama.cpp:1:
src/llama.cpp: In function 'void llama_lora_adapter_init_internal(llama_model*, const char*, llama_lora_adapter&)':
src/llama.cpp:16360:20: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'std::unordered_map<std::__cxx11::basic_string<char>, llama_lora_weight>::size_type' {aka 'long long unsigned int'} [-Wformat=]
16360 |     LLAMA_LOG_INFO("%s: loaded %ld tensors from lora file\n", __func__, adapter.ab_map.size()*2);
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            ~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                                              |
      |                                                                                              std::unordered_map<std::__cxx11::basic_string<char>, llama_lora_weight>::size_type {aka long long unsigned int}
src/llama-impl.h:24:71: note: in definition of macro 'LLAMA_LOG_INFO'
   24 | #define LLAMA_LOG_INFO(...)  llama_log_internal(GGML_LOG_LEVEL_INFO , __VA_ARGS__)
      |                                                                       ^~~~~~~~~~~
src/llama.cpp:16360:34: note: format string is defined here
16360 |     LLAMA_LOG_INFO("%s: loaded %ld tensors from lora file\n", __func__, adapter.ab_map.size()*2);
      |                                ~~^
      |                                  |
      |                                  long int
      |                                %lld
src/llama.cpp: In function 'float* llama_get_logits_ith(llama_context*, int32_t)':
src/llama.cpp:18575:65: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'std::vector<int>::size_type' {aka 'long long unsigned int'} [-Wformat=]
18575 |             throw std::runtime_error(format("out of range [0, %lu)", ctx->output_ids.size()));
      |                                                               ~~^    ~~~~~~~~~~~~~~~~~~~~~~
      |                                                                 |                        |
      |                                                                 long unsigned int        std::vector<int>::size_type {aka long long unsigned int}
      |                                                               %llu
src/llama.cpp: In function 'float* llama_get_embeddings_ith(llama_context*, int32_t)':
src/llama.cpp:18620:65: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'std::vector<int>::size_type' {aka 'long long unsigned int'} [-Wformat=]
18620 |             throw std::runtime_error(format("out of range [0, %lu)", ctx->output_ids.size()));
      |                                                               ~~^    ~~~~~~~~~~~~~~~~~~~~~~
      |                                                                 |                        |
      |                                                                 long unsigned int        std::vector<int>::size_type {aka long long unsigned int}
      |                                                               %llu
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/ngram-cache.cpp -o common/ngram-cache.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/sampling.cpp -o common/sampling.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/train.cpp -o common/train.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/grammar-parser.cpp -o common/grammar-parser.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/json-schema-to-grammar.cpp -o common/json-schema-to-grammar.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion  -Iexamples/gguf-hash/deps -c examples/gguf-hash/deps/sha1/sha1.c -o examples/gguf-hash/deps/sha1/sha1.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion  -Iexamples/gguf-hash/deps -c examples/gguf-hash/deps/xxhash/xxhash.c -o examples/gguf-hash/deps/xxhash/xxhash.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion  -Iexamples/gguf-hash/deps -c examples/gguf-hash/deps/sha256/sha256.c -o examples/gguf-hash/deps/sha256/sha256.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -std=c11   -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion  -c tests/test-c.c -o tests/test-c.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/deprecation-warning/deprecation-warning.cpp -o examples/deprecation-warning/deprecation-warning.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c common/build-info.cpp -o common/build-info.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  examples/deprecation-warning/deprecation-warning.o -o main -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  examples/deprecation-warning/deprecation-warning.o -o server -g -lvulkan-1
NOTICE: The 'main' binary is deprecated. Please use 'llama-cli' instead.
NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.
C:/External/X/llama.cpp/vulkan-shaders-gen \
        --glslc      glslc \
        --input-dir  ggml/src/vulkan-shaders \
        --target-hpp ggml/src/ggml-vulkan-shaders.hpp \
        --target-cpp ggml/src/ggml-vulkan-shaders.cpp
ggml_vulkan: Generating and compiling shaders to SPIR-V
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN   -c ggml/src/ggml-vulkan.cpp -o ggml/src/ggml-vulkan.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN   -c -o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml-vulkan-shaders.cpp
ggml/src/ggml-vulkan.cpp: In function 'int ggml_backend_vk_reg_devices()':
ggml/src/ggml-vulkan.cpp:6789:43: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
 6789 |         snprintf(name, sizeof(name), "%s%ld", GGML_VK_NAME, i);
      |                                         ~~^                 ~
      |                                           |                 |
      |                                           long int          size_t {aka long long unsigned int}
      |                                         %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -static -fPIC -c examples/llava/llava.cpp -o libllava.a -Wno-cast-qual
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/baby-llama/baby-llama.cpp -o examples/baby-llama/baby-llama.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/batched/batched.cpp -o examples/batched/batched.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/batched-bench/batched-bench.cpp -o examples/batched-bench/batched-bench.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/llama-bench/llama-bench.cpp -o examples/llama-bench/llama-bench.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/benchmark/benchmark-matmult.cpp -o examples/benchmark/benchmark-matmult.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/baby-llama/baby-llama.o -o llama-baby-llama -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o common/build-info.o examples/benchmark/benchmark-matmult.o -o llama-benchmark-matmult -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/main/main.cpp -o examples/main/main.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/batched-bench/batched-bench.o -o llama-batched-bench -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/batched/batched.o -o llama-batched -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp -o examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.o
examples/llama-bench/llama-bench.cpp: In constructor 'test::test(const cmd_params_instance&, const llama_model*, const llama_context*)':
examples/llama-bench/llama-bench.cpp:813:43: warning: unknown conversion type character 'F' in format [-Wformat=]
  813 |         std::strftime(buf, sizeof(buf), "%FT%TZ", gmtime(&t));
      |                                           ^
examples/llama-bench/llama-bench.cpp:813:46: warning: unknown conversion type character 'T' in format [-Wformat=]
  813 |         std::strftime(buf, sizeof(buf), "%FT%TZ", gmtime(&t));
      |                                              ^
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/embedding/embedding.cpp -o examples/embedding/embedding.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/eval-callback/eval-callback.cpp -o examples/eval-callback/eval-callback.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/export-lora/export-lora.cpp -o examples/export-lora/export-lora.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.o -o llama-convert-llama2c-to-ggml -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/embedding/embedding.o -o llama-embedding -g -lvulkan-1
examples/export-lora/export-lora.cpp: In member function 'void lora_merge_ctx::run_merge()':
examples/export-lora/export-lora.cpp:267:31: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
  267 |         printf("%s : merged %ld tensors with lora adapters\n", __func__, n_merged);
      |                             ~~^                                          ~~~~~~~~
      |                               |                                          |
      |                               long int                                   size_t {aka long long unsigned int}
      |                             %lld
examples/export-lora/export-lora.cpp:268:30: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'std::vector<tensor_transformation>::size_type' {aka 'long long unsigned int'} [-Wformat=]
  268 |         printf("%s : wrote %ld tensors to output file\n", __func__, trans.size());
      |                            ~~^                                      ~~~~~~~~~~~~
      |                              |                                                |
      |                              long int                                         std::vector<tensor_transformation>::size_type {aka long long unsigned int}
      |                            %lld
examples/export-lora/export-lora.cpp: In member function 'void lora_merge_ctx::merge_tensor(ggml_tensor*, ggml_tensor*)':
examples/export-lora/export-lora.cpp:354:57: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
  354 |                 printf("%s :   + merging from adapter[%ld] type=%s\n", __func__, i, ggml_type_name(inp_a[i]->type));
      |                                                       ~~^                        ~
      |                                                         |                        |
      |                                                         long int                 size_t {aka long long unsigned int}
      |                                                       %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/main/main.o -o llama-cli -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/eval-callback/eval-callback.o -o llama-eval-callback -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/gbnf-validator/gbnf-validator.cpp -o examples/gbnf-validator/gbnf-validator.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/export-lora/export-lora.o -o llama-export-lora -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/gguf/gguf.cpp -o examples/gguf/gguf.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gbnf-validator/gbnf-validator.o -o llama-gbnf-validator -g -lvulkan-1

====  Run ./llama-cli -h for help.  ====

c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -Iexamples/gguf-hash/deps -c examples/gguf-hash/gguf-hash.cpp -o examples/gguf-hash/gguf-hash.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/gguf-split/gguf-split.cpp -o examples/gguf-split/gguf-split.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o examples/gguf/gguf.o -o llama-gguf -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/gritlm/gritlm.cpp -o examples/gritlm/gritlm.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  examples/gguf-hash/deps/sha1/sha1.o examples/gguf-hash/deps/xxhash/xxhash.o examples/gguf-hash/deps/sha256/sha256.o ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gguf-hash/gguf-hash.o -o llama-gguf-hash -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/imatrix/imatrix.cpp -o examples/imatrix/imatrix.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/infill/infill.cpp -o examples/infill/infill.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/llama-bench/llama-bench.o -o llama-bench -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gritlm/gritlm.o -o llama-gritlm -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  examples/llava/llava-cli.cpp examples/llava/llava.cpp examples/llava/clip.cpp ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o -o llama-llava-cli -g -lvulkan-1  -Wno-cast-qual
examples/gguf-split/gguf-split.cpp: In member function 'void split_strategy::print_info()':
examples/gguf-split/gguf-split.cpp:278:28: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'std::vector<gguf_context*>::size_type' {aka 'long long unsigned int'} [-Wformat=]
  278 |         printf("n_split: %ld\n", ctx_outs.size());
      |                          ~~^     ~~~~~~~~~~~~~~~
      |                            |                  |
      |                            long int           std::vector<gguf_context*>::size_type {aka long long unsigned int}
      |                          %lld
examples/gguf-split/gguf-split.cpp:288:64: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
  288 |             printf("split %05d: n_tensors = %d, total_size = %ldM\n", i_split + 1, gguf_get_n_tensors(ctx_out), total_size);
      |                                                              ~~^                                                ~~~~~~~~~~
      |                                                                |                                                |
      |                                                                long int                                         size_t {aka long long unsigned int}
      |                                                              %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gguf-split/gguf-split.o -o llama-gguf-split -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/imatrix/imatrix.o -o llama-imatrix -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  examples/llava/minicpmv-cli.cpp examples/llava/llava.cpp examples/llava/clip.cpp ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o -o llama-minicpmv-cli -g -lvulkan-1  -Wno-cast-qual
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/lookahead/lookahead.cpp -o examples/lookahead/lookahead.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/infill/infill.o -o llama-infill -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/lookup/lookup.cpp -o examples/lookup/lookup.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/lookup/lookup-create.cpp -o examples/lookup/lookup-create.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookahead/lookahead.o -o llama-lookahead -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/lookup/lookup-merge.cpp -o examples/lookup/lookup-merge.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup.o -o llama-lookup -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup-create.o -o llama-lookup-create -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/lookup/lookup-stats.cpp -o examples/lookup/lookup-stats.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup-merge.o -o llama-lookup-merge -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/parallel/parallel.cpp -o examples/parallel/parallel.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/passkey/passkey.cpp -o examples/passkey/passkey.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup-stats.o -o llama-lookup-stats -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/perplexity/perplexity.cpp -o examples/perplexity/perplexity.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/parallel/parallel.o -o llama-parallel -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/passkey/passkey.o -o llama-passkey -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c pocs/vdot/q8dot.cpp -o pocs/vdot/q8dot.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/ggml.o ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o pocs/vdot/q8dot.o -o llama-q8dot -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/quantize/quantize.cpp -o examples/quantize/quantize.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/quantize-stats/quantize-stats.cpp -o examples/quantize-stats/quantize-stats.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/retrieval/retrieval.cpp -o examples/retrieval/retrieval.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/perplexity/perplexity.o -o llama-perplexity -g -lvulkan-1
examples/retrieval/retrieval.cpp: In function 'int main(int, char**)':
examples/retrieval/retrieval.cpp:146:33: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'std::vector<chunk>::size_type' {aka 'long long unsigned int'} [-Wformat=]
  146 |     printf("Number of chunks: %ld\n", chunks.size());
      |                               ~~^     ~~~~~~~~~~~~~
      |                                 |                |
      |                                 long int         std::vector<chunk>::size_type {aka long long unsigned int}
      |                               %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/quantize/quantize.o -o llama-quantize -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/retrieval/retrieval.o -o llama-retrieval -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/save-load-state/save-load-state.cpp -o examples/save-load-state/save-load-state.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/server/server.cpp -o examples/server/server.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/simple/simple.cpp -o examples/simple/simple.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/speculative/speculative.cpp -o examples/speculative/speculative.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/save-load-state/save-load-state.o -o llama-save-load-state -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/tokenize/tokenize.cpp -o examples/tokenize/tokenize.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/quantize-stats/quantize-stats.o -o llama-quantize-stats -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/simple/simple.o -o llama-simple -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c pocs/vdot/vdot.cpp -o pocs/vdot/vdot.o
examples/tokenize/tokenize.cpp: In function 'int main(int, char**)':
examples/tokenize/tokenize.cpp:399:43: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'std::vector<int>::size_type' {aka 'long long unsigned int'} [-Wformat=]
  399 |         printf("Total number of tokens: %ld\n", tokens.size());
      |                                         ~~^     ~~~~~~~~~~~~~
      |                                           |                |
      |                                           long int         std::vector<int>::size_type {aka long long unsigned int}
      |                                         %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/speculative/speculative.o -o llama-speculative -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/tokenize/tokenize.o -o llama-tokenize -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/ggml.o ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o pocs/vdot/vdot.o -o llama-vdot -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  -c examples/cvector-generator/cvector-generator.cpp -o examples/cvector-generator/cvector-generator.o
In file included from examples/cvector-generator/cvector-generator.cpp:4:
examples/cvector-generator/pca.hpp: In function 'void PCA::run_pca(pca_params&, const std::vector<ggml_tensor*>&, const std::vector<ggml_tensor*>&)':
examples/cvector-generator/pca.hpp:315:49: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
  315 |         ggml_format_name(ctrl_out, "direction.%ld", il+1);
      |                                               ~~^   ~~~~
      |                                                 |     |
      |                                                 |     size_t {aka long long unsigned int}
      |                                                 long int
      |                                               %lld
In file included from examples/cvector-generator/cvector-generator.cpp:5:
examples/cvector-generator/mean.hpp: In function 'void mean::run(const std::vector<ggml_tensor*>&, const std::vector<ggml_tensor*>&)':
examples/cvector-generator/mean.hpp:18:49: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
   18 |         ggml_format_name(ctrl_out, "direction.%ld", il+1);
      |                                               ~~^   ~~~~
      |                                                 |     |
      |                                                 |     size_t {aka long long unsigned int}
      |                                                 long int
      |                                               %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN  ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/cvector-generator/cvector-generator.o -o llama-cvector-generator -g -lvulkan-1
as: examples/server/server.o: too many sections (40233)
C:\Users\USER\AppData\Local\Temp\ccB4fPcW.s: Assembler messages:
C:\Users\USER\AppData\Local\Temp\ccB4fPcW.s: Fatal error: can't write 56 bytes to section .text of examples/server/server.o: 'file too big'
as: examples/server/server.o: too many sections (40233)
C:\Users\USER\AppData\Local\Temp\ccB4fPcW.s: Fatal error: examples/server/server.o: file too big
make: *** [Makefile:1435: llama-server] Error 1
C:/External/X/llama.cpp $
0cc4m commented 1 month ago

It's impossible to tell what's going on here without a debugger. You'd have to build the llama-cli executable with debug flags and run it through a debugger, to find where it segfaults.

Any tips on how to use it with a debugger? I built the latest commit b3580 with Vulkan and debug flags using make LLAMA_DEBUG=1 GGML_VULKAN=1 -j 6, but it still doesn't print anything new. I'm not familiar with C++, so I don't know which tool to use or how.

That was basically correct, but there seems to be a Windows-specific issue (file too big) going on there; I only know how to work with Linux. If you manage to figure out what's going on there and fix it, you'd have to run it under a debugger (gdb or something Windows-specific), which will tell you where it crashes specifically.

LSXAxeller commented 1 month ago

I got this debug log

C:/External/X/llama.cpp $ gdb llama-cli.exe
Reading symbols from llama-cli.exe...
(gdb) run -m Index-1.9B-Character-Q6_K.gguf -p "Who are you" -cnv -ngl 6
Starting program: C:\External\X\llama.cpp\llama-cli.exe -m Index-1.9B-Character-Q6_K.gguf -p "Who are you" -cnv -ngl 6
[New Thread 9396.0x36b8]
[New Thread 9396.0x24a8]
[New Thread 9396.0x3124]
Log start
main: build = 3580 (828d6ff7)
main: built with cc (GCC) 14.2.0 for x86_64-w64-mingw32
main: seed  = 1723890389
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from Index-1.9B-Character-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Index-1.9B-Character_test
llama_model_loader: - kv   2:                          llama.block_count u32              = 36
llama_model_loader: - kv   3:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5888
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                          general.file_type u32              = 18
llama_model_loader: - kv  10:                           llama.vocab_size u32              = 65029
llama_model_loader: - kv  11:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  12:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,65029]   = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,65029]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,65029]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   73 tensors
llama_model_loader: - type q6_K:  254 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3670 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 65029
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 36
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5888
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 2.17 B
llm_load_print_meta: model size       = 1.66 GiB (6.56 BPW)
llm_load_print_meta: general.name     = Index-1.9B-Character_test
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 270 '<0x0A>'
llm_load_print_meta: max token length = 48
[New Thread 9396.0xaac]
[New Thread 9396.0x100c]
warning: [OBS]
warning: graphics-hook.dll loaded against process: llama-cli.exe
warning:
warning: [OBS]
warning: (half life scientist) everything..  seems to be in order
warning:
[New Thread 9396.0x3d68]
[New Thread 9396.0x14f4]
[New Thread 9396.0x3e1c]
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64
warning: [OBS]
warning: OBS_CreateDevice: could not get device address for vkQueuePresentKHR
warning:
warning: [OBS]
warning: OBS_CreateDevice: could not get device address for vkGetSwapchainImagesKHR
warning:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffc7af6f703 in amdvlk64!??0?$singleton@V?$extended_type_info_typeid@V?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@std@@@serialization@boost@@@serialization@boost@@IEAA@XZ ()
   from C:\WINDOWS\System32\DriverStore\FileRepository\u0399660.inf_amd64_d7fa3539ce499e50\B399655\amdvlk64.dll
0cc4m commented 1 month ago

Once it has crashed, run a backtrace with bt to get the call stack.

LSXAxeller commented 1 month ago

Backtrace

(gdb) bt
#0  0x00007ff9a08df703 in amdvlk64!??0?$singleton@V?$extended_type_info_typeid@V?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@std@@@serialization@boost@@@serialization@boost@@IEAA@XZ ()
   from C:\WINDOWS\System32\DriverStore\FileRepository\u0405470.inf_amd64_2e71ce0e27c179e1\B404884\amdvlk64.dll
#1  0x00007ff9b1e6d86e in ?? () from C:\Program Files (x86)\Mirillis\Action!\vulkan_x64\MirillisActionVulkanLayer.dll
#2  0x00007ff9b1e702c2 in ?? () from C:\Program Files (x86)\Mirillis\Action!\vulkan_x64\MirillisActionVulkanLayer.dll
#3  0x00007ff9b1e7359a in MirillisLayer!ML_64201 ()
   from C:\Program Files (x86)\Mirillis\Action!\vulkan_x64\MirillisActionVulkanLayer.dll
#4  0x00007ffa2400ac2b in vulkan-1!vkResetEvent () from C:\WINDOWS\SYSTEM32\vulkan-1.dll
#5  0x00007ffa2401455a in vulkan-1!vkResetEvent () from C:\WINDOWS\SYSTEM32\vulkan-1.dll
#6  0x00007ffa2402aa45 in vulkan-1!vkResetEvent () from C:\WINDOWS\SYSTEM32\vulkan-1.dll
#7  0x000000018007dfc5 in ?? () from C:\Program Files (x86)\RivaTuner Statistics Server\RTSSHooks64.dll
#8  0x00007ff61cc6cb26 in vk::DispatchLoaderStatic::vkCreateDevice (
    this=0x7ff61d00c330 <vk::getDispatchLoaderStatic()::dls>, physicalDevice=0x48dac80, pCreateInfo=0x5e0cf0,
    pAllocator=0x0, pDevice=0x5e1398) at C:/External/X/w64devkit/x86_64-w64-mingw32/include/vulkan/vulkan.hpp:1059
#9  0x00007ff61ca28268 in vk::PhysicalDevice::createDevice<vk::DispatchLoaderStatic> (this=0x32c0658, createInfo=...,
    allocator=..., d=...) at C:/External/X/w64devkit/x86_64-w64-mingw32/include/vulkan/vulkan_funcs.hpp:452
#10 ggml_vk_get_device (idx=0) at ggml/src/ggml-vulkan.cpp:1834
#11 0x00007ff61ca5c037 in ggml_backend_vk_host_buffer_type () at ggml/src/ggml-vulkan.cpp:6372
#12 0x00007ff61cb00b72 in llama_default_buffer_type_cpu (host_buffer=true) at src/llama.cpp:2052
#13 0x00007ff61cb0ac97 in llm_load_tensors (ml=..., model=..., n_gpu_layers=6, split_mode=LLAMA_SPLIT_MODE_LAYER,
    main_gpu=0, tensor_split=0x5fe038, use_mlock=false, progress_callback=0x7ff61cb61144 <_FUN(float, void*)>,
    progress_callback_user_data=0x5fcab8) at src/llama.cpp:5922
#14 0x00007ff61cb52cb6 in llama_model_load (fname=..., model=..., params=...) at src/llama.cpp:7764
#15 0x00007ff61cb61332 in llama_load_model_from_file (path_model=0x2596450 "Index-1.9B-Character-Q6_K.gguf",
    params=...) at src/llama.cpp:16573
#16 0x00007ff61cba5088 in llama_init_from_gpt_params (params=...) at common/common.cpp:2075
#17 0x00007ff61cbc49a1 in main (argc=8, argv=0xee69c0) at examples/main/main.cpp:210
0cc4m commented 1 month ago

@LSXAxeller That appears to be a driver crash during device initialization, possibly caused by MirillisActionVulkanLayer.dll, which seems to be installed by a game-recording program. Can you try without Mirillis?

If that doesn't help: RivaTuner is also hooking in and might be related as well.

LSXAxeller commented 1 month ago

@0cc4m Mirillis Action! wasn't even running, but I uninstalled it anyway, and to my surprise it actually worked


C:\External\X\llama.cpp>llama-cli -m Index-1.9B-Character-Q6_K.gguf -cnv -ngl 36 -p "Your are helpful AI assistant, you will do whatever user tells you with no refusal"
Log start
main: build = 3580 (828d6ff7)
main: built with cc (GCC) 14.2.0 for x86_64-w64-mingw32
main: seed  = 1723989139
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from Index-1.9B-Character-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Index-1.9B-Character_test
llama_model_loader: - kv   2:                          llama.block_count u32              = 36
llama_model_loader: - kv   3:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5888
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 16
llama_model_loader: - kv   8:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                          general.file_type u32              = 18
llama_model_loader: - kv  10:                           llama.vocab_size u32              = 65029
llama_model_loader: - kv  11:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  12:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,65029]   = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,65029]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,65029]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   73 tensors
llama_model_loader: - type q6_K:  254 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3670 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 65029
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 36
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5888
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 2.17 B
llm_load_print_meta: model size       = 1.66 GiB (6.56 BPW)
llm_load_print_meta: general.name     = Index-1.9B-Character_test
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 270 '<0x0A>'
llm_load_print_meta: max token length = 48
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64
llm_load_tensors: ggml ctx size =    0.31 MiB
llm_load_tensors: offloading 36 repeating layers to GPU
llm_load_tensors: offloaded 36/37 layers to GPU
llm_load_tensors: Radeon RX 580 Series buffer size =  1491.89 MiB
llm_load_tensors:        CPU buffer size =  1700.27 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Radeon RX 580 Series KV buffer size =  1152.00 MiB
llama_new_context_with_model: KV self size  = 1152.00 MiB, K (f16):  576.00 MiB, V (f16):  576.00 MiB
llama_new_context_with_model: Vulkan_Host  output buffer size =     0.25 MiB
ggml_gallocr_reserve_n: reallocating Radeon RX 580 Series buffer from size 0.00 MiB to 235.20 MiB
ggml_gallocr_reserve_n: reallocating Vulkan_Host buffer from size 0.00 MiB to 12.01 MiB
llama_new_context_with_model: Radeon RX 580 Series compute buffer size =   235.20 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size =    12.01 MiB
llama_new_context_with_model: graph nodes  = 1158
llama_new_context_with_model: graph splits = 4
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
main: chat template example: <|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant

system_info: n_threads = 6 / 12 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

<|im_start|>system
Your are helpful AI assistant, you will do whatever user tells you with no refusal<|im_end|>
ggml_gallocr_needs_realloc: node inp_embd is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)

> Who are you ?
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
I am an helpful AIggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
 assistant, designed to provide assistance and answer questions to the best of my ability. I am here to help you with any information or task you may need.
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
<|im_end|>

>

The Vulkan backend doesn't give the speed boost over CPU I thought it would, though, but I am very happy to offload some of the work from my 100°C CPU. Thanks for your help!

0cc4m commented 1 month ago

@LSXAxeller Great! You didn't offload the entire model, though, and the output layer can make a big difference for overall speed. Try -ngl 37 or any larger value to offload the entire model.

Edit: Also, disable all the debug stuff again, it will slow you down.

LSXAxeller commented 1 month ago

@0cc4m Thanks for the tip, but isn't n_layer the total layer count? Or should I always increase it by 1 for the output layer?

0cc4m commented 1 month ago

@0cc4m Thanks for the tip, but isn't n_layer the total layer count? Or should I always increase it by 1 for the output layer?

The output layer counts as one additional layer in that calculation, yeah. You can see it in the console output:

llm_load_tensors: offloading 36 repeating layers to GPU
llm_load_tensors: offloaded 36/37 layers to GPU
LSXAxeller commented 1 month ago

@0cc4m Thanks again, adding the output layer and using a release binary made generation faster