Closed LSXAxeller closed 1 month ago
It's impossible to tell what's going on here without a debugger. You'd have to build the llama-cli executable with debug flags and run it through a debugger, to find where it segfaults.
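For reference, one way to do this from the same shell is sketched below. It assumes gdb is on your PATH (w64devkit normally bundles it) and that llama-cli was built with debug flags. `debug_run`, `DRY_RUN`, `model.gguf`, and the prompt are placeholder names for illustration, not part of llama.cpp; use the arguments that actually trigger your crash.

```shell
# debug_run: run a program under gdb in batch mode. gdb starts the program
# ("run"), and once it stops (for example on a segfault) prints a backtrace
# for every thread ("thread apply all bt"), then exits.
# Set DRY_RUN=1 to only print the gdb command line instead of executing it.
debug_run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo gdb -batch -ex run -ex "thread apply all bt" --args "$@"
    else
        gdb -batch -ex run -ex "thread apply all bt" --args "$@"
    fi
}

# Preview the command (placeholder model path and prompt).
DRY_RUN=1 debug_run ./llama-cli -m model.gguf -p "Hello"
```

Drop `DRY_RUN=1` to actually run it. The frames at the top of the backtrace show the function and source line where the crash happened, which is the part worth pasting into the issue.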
Any tips on how to use it with a debugger? I built the latest commit (b3580) with Vulkan and debug flags using make LLAMA_DEBUG=1 GGML_VULKAN=1 -j 6, but it still doesn't print anything new. I'm not familiar with C++, so I don't know which tool to use or how.
Microsoft Windows [Version 10.0.22631.3958]
(c) Microsoft Corporation. All rights reserved.
C:\External\X\w64devkit>w64devkit.exe
~ $ SDK_VERSION=1.3.290.0
~ $ cp "C:/Program Files/VulkanSDK/$SDK_VERSION/Bin/glslc.exe" $W64DEVKIT_HOME/bin/
~ $ cp "C:/Program Files/VulkanSDK/$SDK_VERSION/Lib/vulkan-1.lib" $W64DEVKIT_HOME/x86_64-w64-mingw32/lib/
~ $ cp -r "C:/Program Files/VulkanSDK/$SDK_VERSION/Include/*" $W64DEVKIT_HOME/x86_64-w64-mingw32/include/
cp: can't stat 'C:/Program Files/VulkanSDK/1.3.290.0/Include/*': No such file or directory
~ $ cp -r "C:/Program Files/VulkanSDK/$SDK_VERSION/Include/"* $W64DEVKIT_HOME/x86_64-w64-mingw32/include/
~ $ cat > $W64DEVKIT_HOME/x86_64-w64-mingw32/lib/pkgconfig/vulkan.pc <<EOF
> Name: Vulkan-Loader
> Description: Vulkan Loader
> Version: $SDK_VERSION
> Libs: -lvulkan-1
> EOF
~ $ cd "C:\External\X\llama.cpp"
C:/External/X/llama.cpp $ make LLAMA_DEBUG=1 GGML_VULKAN=1 -j 6
I ccache not found. Consider installing it for faster compilation.
I llama.cpp build info:
I UNAME_S: Windows_NT
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion
I CXXFLAGS: -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN
I NVCCFLAGS: -std=c++11 -O0 -g
I LDFLAGS: -g -lvulkan-1
I CC: cc (GCC) 14.2.0
I CXX: c++ (GCC) 14.2.0
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c ggml/src/llamafile/sgemm.cpp -o ggml/src/llamafile/sgemm.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -o vulkan-shaders-gen -g -lvulkan-1 ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -c ggml/src/ggml.c -o ggml/src/ggml.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -c ggml/src/ggml-alloc.c -o ggml/src/ggml-alloc.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -c ggml/src/ggml-backend.c -o ggml/src/ggml-backend.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -c ggml/src/ggml-quants.c -o ggml/src/ggml-quants.o
ggml/src/ggml.c:90:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
90 | static atomic_bool atomic_flag_test_and_set(atomic_flag * ptr) {
| ^~~~~~~~~~~
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -c ggml/src/ggml-aarch64.c -o ggml/src/ggml-aarch64.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c src/llama.cpp -o src/llama.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c src/llama-vocab.cpp -o src/llama-vocab.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c src/llama-grammar.cpp -o src/llama-grammar.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c src/llama-sampling.cpp -o src/llama-sampling.o
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:60:6: warning: no previous declaration for 'void execute_command(const std::string&, std::string&, std::string&)' [-Wmissing-declarations]
60 | void execute_command(const std::string& command, std::string& stdout_str, std::string& stderr_str) {
| ^~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp: In function 'void execute_command(const std::string&, std::string&, std::string&)':
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpReserved' [-Wmissing-field-initializers]
77 | STARTUPINFOA si = { sizeof(STARTUPINFOA) };
| ^
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpTitle' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwX' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwY' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwXSize' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwYSize' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwXCountChars' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwYCountChars' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwFillAttribute' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::dwFlags' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::wShowWindow' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::cbReserved2' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::lpReserved2' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::hStdInput' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::hStdOutput' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:77:46: warning: missing initializer for member '_STARTUPINFOA::hStdError' [-Wmissing-field-initializers]
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp: At global scope:
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:152:6: warning: no previous declaration for 'bool directory_exists(const std::string&)' [-Wmissing-declarations]
152 | bool directory_exists(const std::string& path) {
| ^~~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:160:6: warning: no previous declaration for 'bool create_directory(const std::string&)' [-Wmissing-declarations]
160 | bool create_directory(const std::string& path) {
| ^~~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:168:13: warning: no previous declaration for 'std::string to_uppercase(const std::string&)' [-Wmissing-declarations]
168 | std::string to_uppercase(const std::string& input) {
| ^~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:176:6: warning: no previous declaration for 'bool string_ends_with(const std::string&, const std::string&)' [-Wmissing-declarations]
176 | bool string_ends_with(const std::string& str, const std::string& suffix) {
| ^~~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:185:13: warning: no previous declaration for 'std::string join_paths(const std::string&, const std::string&)' [-Wmissing-declarations]
185 | std::string join_paths(const std::string& path1, const std::string& path2) {
| ^~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:189:13: warning: no previous declaration for 'std::string basename(const std::string&)' [-Wmissing-declarations]
189 | std::string basename(const std::string &path) {
| ^~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:193:6: warning: no previous declaration for 'void string_to_spv(const std::string&, const std::string&, const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&, bool)' [-Wmissing-declarations]
193 | void string_to_spv(const std::string& _name, const std::string& in_fname, const std::map<std::string, std::string>& defines, bool fp16 = true) {
| ^~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:233:36: warning: no previous declaration for 'std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > merge_maps(const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&, const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&)' [-Wmissing-declarations]
233 | std::map<std::string, std::string> merge_maps(const std::map<std::string, std::string>& a, const std::map<std::string, std::string>& b) {
| ^~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:239:6: warning: no previous declaration for 'void matmul_shaders(std::vector<std::future<void> >&, bool, bool)' [-Wmissing-declarations]
239 | void matmul_shaders(std::vector<std::future<void>>& tasks, bool fp16, bool matmul_id) {
| ^~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:286:6: warning: no previous declaration for 'void process_shaders(std::vector<std::future<void> >&)' [-Wmissing-declarations]
286 | void process_shaders(std::vector<std::future<void>>& tasks) {
| ^~~~~~~~~~~~~~~
ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp:477:6: warning: no previous declaration for 'void write_output_files()' [-Wmissing-declarations]
477 | void write_output_files() {
| ^~~~~~~~~~~~~~~~~~
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c src/unicode.cpp -o src/unicode.o
src/llama.cpp: In member function 'std::string llama_file::GetErrorMessageWin32(DWORD) const':
src/llama.cpp:1480:46: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'DWORD' {aka 'long unsigned int'} [-Wformat=]
1480 | ret = format("Win32 error code: %s", error_code);
| ~^ ~~~~~~~~~~
| | |
| | DWORD {aka long unsigned int}
| char*
| %ld
src/llama.cpp: In constructor 'llama_mmap::llama_mmap(llama_file*, size_t, bool)':
src/llama.cpp:1818:38: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'BOOL (*)(HANDLE, ULONG_PTR, PWIN32_MEMORY_RANGE_ENTRY, ULONG)' {aka 'int (*)(void*, long long unsigned int, _WIN32_MEMORY_RANGE_ENTRY*, long unsigned int)'} [-Wcast-function-type]
1818 | pPrefetchVirtualMemory = reinterpret_cast<decltype(pPrefetchVirtualMemory)> (GetProcAddress(hKernel32, "PrefetchVirtualMemory"));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c src/unicode-data.cpp -o src/unicode-data.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/common.cpp -o common/common.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/console.cpp -o common/console.o
In file included from src/llama.cpp:1:
src/llama.cpp: In function 'void llama_lora_adapter_init_internal(llama_model*, const char*, llama_lora_adapter&)':
src/llama.cpp:16360:20: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'std::unordered_map<std::__cxx11::basic_string<char>, llama_lora_weight>::size_type' {aka 'long long unsigned int'} [-Wformat=]
16360 | LLAMA_LOG_INFO("%s: loaded %ld tensors from lora file\n", __func__, adapter.ab_map.size()*2);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
| |
| std::unordered_map<std::__cxx11::basic_string<char>, llama_lora_weight>::size_type {aka long long unsigned int}
src/llama-impl.h:24:71: note: in definition of macro 'LLAMA_LOG_INFO'
24 | #define LLAMA_LOG_INFO(...) llama_log_internal(GGML_LOG_LEVEL_INFO , __VA_ARGS__)
| ^~~~~~~~~~~
src/llama.cpp:16360:34: note: format string is defined here
16360 | LLAMA_LOG_INFO("%s: loaded %ld tensors from lora file\n", __func__, adapter.ab_map.size()*2);
| ~~^
| |
| long int
| %lld
src/llama.cpp: In function 'float* llama_get_logits_ith(llama_context*, int32_t)':
src/llama.cpp:18575:65: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'std::vector<int>::size_type' {aka 'long long unsigned int'} [-Wformat=]
18575 | throw std::runtime_error(format("out of range [0, %lu)", ctx->output_ids.size()));
| ~~^ ~~~~~~~~~~~~~~~~~~~~~~
| | |
| long unsigned int std::vector<int>::size_type {aka long long unsigned int}
| %llu
src/llama.cpp: In function 'float* llama_get_embeddings_ith(llama_context*, int32_t)':
src/llama.cpp:18620:65: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'std::vector<int>::size_type' {aka 'long long unsigned int'} [-Wformat=]
18620 | throw std::runtime_error(format("out of range [0, %lu)", ctx->output_ids.size()));
| ~~^ ~~~~~~~~~~~~~~~~~~~~~~
| | |
| long unsigned int std::vector<int>::size_type {aka long long unsigned int}
| %llu
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/ngram-cache.cpp -o common/ngram-cache.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/sampling.cpp -o common/sampling.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/train.cpp -o common/train.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/grammar-parser.cpp -o common/grammar-parser.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/json-schema-to-grammar.cpp -o common/json-schema-to-grammar.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -Iexamples/gguf-hash/deps -c examples/gguf-hash/deps/sha1/sha1.c -o examples/gguf-hash/deps/sha1/sha1.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -Iexamples/gguf-hash/deps -c examples/gguf-hash/deps/xxhash/xxhash.c -o examples/gguf-hash/deps/xxhash/xxhash.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -Iexamples/gguf-hash/deps -c examples/gguf-hash/deps/sha256/sha256.c -o examples/gguf-hash/deps/sha256/sha256.o
cc -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -std=c11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -fopenmp -Wdouble-promotion -c tests/test-c.c -o tests/test-c.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/deprecation-warning/deprecation-warning.cpp -o examples/deprecation-warning/deprecation-warning.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c common/build-info.cpp -o common/build-info.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN examples/deprecation-warning/deprecation-warning.o -o main -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN examples/deprecation-warning/deprecation-warning.o -o server -g -lvulkan-1
NOTICE: The 'main' binary is deprecated. Please use 'llama-cli' instead.
NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.
C:/External/X/llama.cpp/vulkan-shaders-gen \
--glslc glslc \
--input-dir ggml/src/vulkan-shaders \
--target-hpp ggml/src/ggml-vulkan-shaders.hpp \
--target-cpp ggml/src/ggml-vulkan-shaders.cpp
ggml_vulkan: Generating and compiling shaders to SPIR-V
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c ggml/src/ggml-vulkan.cpp -o ggml/src/ggml-vulkan.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c -o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml-vulkan-shaders.cpp
ggml/src/ggml-vulkan.cpp: In function 'int ggml_backend_vk_reg_devices()':
ggml/src/ggml-vulkan.cpp:6789:43: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
6789 | snprintf(name, sizeof(name), "%s%ld", GGML_VK_NAME, i);
| ~~^ ~
| | |
| long int size_t {aka long long unsigned int}
| %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -static -fPIC -c examples/llava/llava.cpp -o libllava.a -Wno-cast-qual
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/baby-llama/baby-llama.cpp -o examples/baby-llama/baby-llama.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/batched/batched.cpp -o examples/batched/batched.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/batched-bench/batched-bench.cpp -o examples/batched-bench/batched-bench.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/llama-bench/llama-bench.cpp -o examples/llama-bench/llama-bench.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/benchmark/benchmark-matmult.cpp -o examples/benchmark/benchmark-matmult.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/baby-llama/baby-llama.o -o llama-baby-llama -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o common/build-info.o examples/benchmark/benchmark-matmult.o -o llama-benchmark-matmult -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/main/main.cpp -o examples/main/main.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/batched-bench/batched-bench.o -o llama-batched-bench -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/batched/batched.o -o llama-batched -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp -o examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.o
examples/llama-bench/llama-bench.cpp: In constructor 'test::test(const cmd_params_instance&, const llama_model*, const llama_context*)':
examples/llama-bench/llama-bench.cpp:813:43: warning: unknown conversion type character 'F' in format [-Wformat=]
813 | std::strftime(buf, sizeof(buf), "%FT%TZ", gmtime(&t));
| ^
examples/llama-bench/llama-bench.cpp:813:46: warning: unknown conversion type character 'T' in format [-Wformat=]
813 | std::strftime(buf, sizeof(buf), "%FT%TZ", gmtime(&t));
| ^
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/embedding/embedding.cpp -o examples/embedding/embedding.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/eval-callback/eval-callback.cpp -o examples/eval-callback/eval-callback.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/export-lora/export-lora.cpp -o examples/export-lora/export-lora.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.o -o llama-convert-llama2c-to-ggml -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/embedding/embedding.o -o llama-embedding -g -lvulkan-1
examples/export-lora/export-lora.cpp: In member function 'void lora_merge_ctx::run_merge()':
examples/export-lora/export-lora.cpp:267:31: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
267 | printf("%s : merged %ld tensors with lora adapters\n", __func__, n_merged);
| ~~^ ~~~~~~~~
| | |
| long int size_t {aka long long unsigned int}
| %lld
examples/export-lora/export-lora.cpp:268:30: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'std::vector<tensor_transformation>::size_type' {aka 'long long unsigned int'} [-Wformat=]
268 | printf("%s : wrote %ld tensors to output file\n", __func__, trans.size());
| ~~^ ~~~~~~~~~~~~
| | |
| long int std::vector<tensor_transformation>::size_type {aka long long unsigned int}
| %lld
examples/export-lora/export-lora.cpp: In member function 'void lora_merge_ctx::merge_tensor(ggml_tensor*, ggml_tensor*)':
examples/export-lora/export-lora.cpp:354:57: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
354 | printf("%s : + merging from adapter[%ld] type=%s\n", __func__, i, ggml_type_name(inp_a[i]->type));
| ~~^ ~
| | |
| long int size_t {aka long long unsigned int}
| %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/main/main.o -o llama-cli -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/eval-callback/eval-callback.o -o llama-eval-callback -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/gbnf-validator/gbnf-validator.cpp -o examples/gbnf-validator/gbnf-validator.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/export-lora/export-lora.o -o llama-export-lora -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/gguf/gguf.cpp -o examples/gguf/gguf.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gbnf-validator/gbnf-validator.o -o llama-gbnf-validator -g -lvulkan-1
==== Run ./llama-cli -h for help. ====
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -Iexamples/gguf-hash/deps -c examples/gguf-hash/gguf-hash.cpp -o examples/gguf-hash/gguf-hash.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/gguf-split/gguf-split.cpp -o examples/gguf-split/gguf-split.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o examples/gguf/gguf.o -o llama-gguf -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/gritlm/gritlm.cpp -o examples/gritlm/gritlm.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN examples/gguf-hash/deps/sha1/sha1.o examples/gguf-hash/deps/xxhash/xxhash.o examples/gguf-hash/deps/sha256/sha256.o ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gguf-hash/gguf-hash.o -o llama-gguf-hash -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/imatrix/imatrix.cpp -o examples/imatrix/imatrix.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/infill/infill.cpp -o examples/infill/infill.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/llama-bench/llama-bench.o -o llama-bench -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gritlm/gritlm.o -o llama-gritlm -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN examples/llava/llava-cli.cpp examples/llava/llava.cpp examples/llava/clip.cpp ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o -o llama-llava-cli -g -lvulkan-1 -Wno-cast-qual
examples/gguf-split/gguf-split.cpp: In member function 'void split_strategy::print_info()':
examples/gguf-split/gguf-split.cpp:278:28: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'std::vector<gguf_context*>::size_type' {aka 'long long unsigned int'} [-Wformat=]
278 | printf("n_split: %ld\n", ctx_outs.size());
| ~~^ ~~~~~~~~~~~~~~~
| | |
| long int std::vector<gguf_context*>::size_type {aka long long unsigned int}
| %lld
examples/gguf-split/gguf-split.cpp:288:64: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
288 | printf("split %05d: n_tensors = %d, total_size = %ldM\n", i_split + 1, gguf_get_n_tensors(ctx_out), total_size);
| ~~^ ~~~~~~~~~~
| | |
| long int size_t {aka long long unsigned int}
| %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/gguf-split/gguf-split.o -o llama-gguf-split -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/imatrix/imatrix.o -o llama-imatrix -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN examples/llava/minicpmv-cli.cpp examples/llava/llava.cpp examples/llava/clip.cpp ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o -o llama-minicpmv-cli -g -lvulkan-1 -Wno-cast-qual
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/lookahead/lookahead.cpp -o examples/lookahead/lookahead.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/infill/infill.o -o llama-infill -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/lookup/lookup.cpp -o examples/lookup/lookup.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/lookup/lookup-create.cpp -o examples/lookup/lookup-create.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookahead/lookahead.o -o llama-lookahead -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/lookup/lookup-merge.cpp -o examples/lookup/lookup-merge.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup.o -o llama-lookup -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup-create.o -o llama-lookup-create -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/lookup/lookup-stats.cpp -o examples/lookup/lookup-stats.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup-merge.o -o llama-lookup-merge -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/parallel/parallel.cpp -o examples/parallel/parallel.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/passkey/passkey.cpp -o examples/passkey/passkey.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/lookup/lookup-stats.o -o llama-lookup-stats -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/perplexity/perplexity.cpp -o examples/perplexity/perplexity.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/parallel/parallel.o -o llama-parallel -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/passkey/passkey.o -o llama-passkey -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c pocs/vdot/q8dot.cpp -o pocs/vdot/q8dot.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/ggml.o ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o pocs/vdot/q8dot.o -o llama-q8dot -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/quantize/quantize.cpp -o examples/quantize/quantize.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/quantize-stats/quantize-stats.cpp -o examples/quantize-stats/quantize-stats.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/retrieval/retrieval.cpp -o examples/retrieval/retrieval.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/perplexity/perplexity.o -o llama-perplexity -g -lvulkan-1
examples/retrieval/retrieval.cpp: In function 'int main(int, char**)':
examples/retrieval/retrieval.cpp:146:33: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'std::vector<chunk>::size_type' {aka 'long long unsigned int'} [-Wformat=]
146 | printf("Number of chunks: %ld\n", chunks.size());
| ~~^ ~~~~~~~~~~~~~
| | |
| long int std::vector<chunk>::size_type {aka long long unsigned int}
| %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/quantize/quantize.o -o llama-quantize -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/retrieval/retrieval.o -o llama-retrieval -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/save-load-state/save-load-state.cpp -o examples/save-load-state/save-load-state.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/server/server.cpp -o examples/server/server.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/simple/simple.cpp -o examples/simple/simple.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/speculative/speculative.cpp -o examples/speculative/speculative.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/save-load-state/save-load-state.o -o llama-save-load-state -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/tokenize/tokenize.cpp -o examples/tokenize/tokenize.o
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/quantize-stats/quantize-stats.o -o llama-quantize-stats -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/simple/simple.o -o llama-simple -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c pocs/vdot/vdot.cpp -o pocs/vdot/vdot.o
examples/tokenize/tokenize.cpp: In function 'int main(int, char**)':
examples/tokenize/tokenize.cpp:399:43: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'std::vector<int>::size_type' {aka 'long long unsigned int'} [-Wformat=]
399 | printf("Total number of tokens: %ld\n", tokens.size());
| ~~^ ~~~~~~~~~~~~~
| | |
| long int std::vector<int>::size_type {aka long long unsigned int}
| %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/speculative/speculative.o -o llama-speculative -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/tokenize/tokenize.o -o llama-tokenize -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/ggml.o ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o pocs/vdot/vdot.o -o llama-vdot -g -lvulkan-1
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN -c examples/cvector-generator/cvector-generator.cpp -o examples/cvector-generator/cvector-generator.o
In file included from examples/cvector-generator/cvector-generator.cpp:4:
examples/cvector-generator/pca.hpp: In function 'void PCA::run_pca(pca_params&, const std::vector<ggml_tensor*>&, const std::vector<ggml_tensor*>&)':
examples/cvector-generator/pca.hpp:315:49: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
315 | ggml_format_name(ctrl_out, "direction.%ld", il+1);
| ~~^ ~~~~
| | |
| | size_t {aka long long unsigned int}
| long int
| %lld
In file included from examples/cvector-generator/cvector-generator.cpp:5:
examples/cvector-generator/mean.hpp: In function 'void mean::run(const std::vector<ggml_tensor*>&, const std::vector<ggml_tensor*>&)':
examples/cvector-generator/mean.hpp:18:49: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'long long unsigned int'} [-Wformat=]
18 | ggml_format_name(ctrl_out, "direction.%ld", il+1);
| ~~^ ~~~~
| | |
| | size_t {aka long long unsigned int}
| long int
| %lld
c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_WIN32_WINNT=0x602 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_VULKAN ggml/src/llamafile/sgemm.o ggml/src/ggml-vulkan.o ggml/src/ggml-vulkan-shaders.o ggml/src/ggml.o ggml/src/ggml-alloc.o ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o src/unicode.o src/unicode-data.o common/common.o common/console.o common/ngram-cache.o common/sampling.o common/train.o common/grammar-parser.o common/build-info.o common/json-schema-to-grammar.o examples/cvector-generator/cvector-generator.o -o llama-cvector-generator -g -lvulkan-1
as: examples/server/server.o: too many sections (40233)
C:\Users\USER\AppData\Local\Temp\ccB4fPcW.s: Assembler messages:
C:\Users\USER\AppData\Local\Temp\ccB4fPcW.s: Fatal error: can't write 56 bytes to section .text of examples/server/server.o: 'file too big'
as: examples/server/server.o: too many sections (40233)
C:\Users\USER\AppData\Local\Temp\ccB4fPcW.s: Fatal error: examples/server/server.o: file too big
make: *** [Makefile:1435: llama-server] Error 1
C:/External/X/llama.cpp $
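A side note on the -Wformat warnings in the build log above (retrieval.cpp, tokenize.cpp, pca.hpp, mean.hpp): they are only warnings and do not stop the build. They appear because on 64-bit Windows long is 32 bits wide while size_t is 64 bits, so %ld does not match the argument. GCC suggests %lld; a portable alternative is %zu, the standard conversion for size_t. Below is a minimal sketch of that alternative, not a patch taken from llama.cpp. The function name format_chunk_count is hypothetical, and the sketch assumes the C runtime in use supports %zu, which older MSVCRT-based MinGW setups may not do without extra configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a count like the printf call at
// retrieval.cpp:146, but with %zu (the size_t conversion) instead of %ld.
std::string format_chunk_count(std::size_t n) {
    char buf[64];  // 18 characters of text plus at most 20 digits fit easily
    const int written = std::snprintf(buf, sizeof(buf), "Number of chunks: %zu", n);
    // snprintf returns the length it needed; check that it fit in buf.
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buf));
    (void) written;  // silence the unused-variable warning when NDEBUG is set
    return std::string(buf);
}
```

The same substitution would apply to the other three call sites, since each of them passes a size_t value to a %ld conversion.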
It's impossible to tell what's going on here without a debugger. You'd have to build the llama-cli executable with debug flags and run it through a debugger, to find where it segfaults.
Any tips on how to use it with a debugger? I built the latest commit b3580 with Vulkan and debug flags using make LLAMA_DEBUG=1 GGML_VULKAN=1 -j 6, but it still doesn't print anything new. I am not familiar with C++, so I don't know which tool to use or how.
That was basically correct, but there seems to be a Windows-specific issue (file too big) going on there, and I only know how to work with Linux. If you manage to figure out what's going on there and fix it, you'd have to run it with a debugger (gdb or something Windows-specific), which will tell you where exactly it crashes.
I got this debug log
C:/External/X/llama.cpp $ gdb llama-cli.exe
Reading symbols from llama-cli.exe...
(gdb) run -m Index-1.9B-Character-Q6_K.gguf -p "Who are you" -cnv -ngl 6
Starting program: C:\External\X\llama.cpp\llama-cli.exe -m Index-1.9B-Character-Q6_K.gguf -p "Who are you" -cnv -ngl 6
[New Thread 9396.0x36b8]
[New Thread 9396.0x24a8]
[New Thread 9396.0x3124]
Log start
main: build = 3580 (828d6ff7)
main: built with cc (GCC) 14.2.0 for x86_64-w64-mingw32
main: seed = 1723890389
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from Index-1.9B-Character-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Index-1.9B-Character_test
llama_model_loader: - kv 2: llama.block_count u32 = 36
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 2048
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5888
llama_model_loader: - kv 6: llama.attention.head_count u32 = 16
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 16
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 9: general.file_type u32 = 18
llama_model_loader: - kv 10: llama.vocab_size u32 = 65029
llama_model_loader: - kv 11: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 12: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.pre str = default
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,65029] = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,65029] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,65029] = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv 24: general.quantization_version u32 = 2
llama_model_loader: - type f32: 73 tensors
llama_model_loader: - type q6_K: 254 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3670 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 65029
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 36
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5888
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q6_K
llm_load_print_meta: model params = 2.17 B
llm_load_print_meta: model size = 1.66 GiB (6.56 BPW)
llm_load_print_meta: general.name = Index-1.9B-Character_test
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 270 '<0x0A>'
llm_load_print_meta: max token length = 48
[New Thread 9396.0xaac]
[New Thread 9396.0x100c]
warning: [OBS]
warning: graphics-hook.dll loaded against process: llama-cli.exe
warning:
warning: [OBS]
warning: (half life scientist) everything.. seems to be in order
warning:
[New Thread 9396.0x3d68]
[New Thread 9396.0x14f4]
[New Thread 9396.0x3e1c]
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64
warning: [OBS]
warning: OBS_CreateDevice: could not get device address for vkQueuePresentKHR
warning:
warning: [OBS]
warning: OBS_CreateDevice: could not get device address for vkGetSwapchainImagesKHR
warning:
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffc7af6f703 in amdvlk64!??0?$singleton@V?$extended_type_info_typeid@V?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@std@@@serialization@boost@@@serialization@boost@@IEAA@XZ ()
from C:\WINDOWS\System32\DriverStore\FileRepository\u0399660.inf_amd64_d7fa3539ce499e50\B399655\amdvlk64.dll
Once it has crashed, run a backtrace with bt to get the call stack.
Backtrace
(gdb) bt
#0 0x00007ff9a08df703 in amdvlk64!??0?$singleton@V?$extended_type_info_typeid@V?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@std@@@serialization@boost@@@serialization@boost@@IEAA@XZ ()
from C:\WINDOWS\System32\DriverStore\FileRepository\u0405470.inf_amd64_2e71ce0e27c179e1\B404884\amdvlk64.dll
#1 0x00007ff9b1e6d86e in ?? () from C:\Program Files (x86)\Mirillis\Action!\vulkan_x64\MirillisActionVulkanLayer.dll
#2 0x00007ff9b1e702c2 in ?? () from C:\Program Files (x86)\Mirillis\Action!\vulkan_x64\MirillisActionVulkanLayer.dll
#3 0x00007ff9b1e7359a in MirillisLayer!ML_64201 ()
from C:\Program Files (x86)\Mirillis\Action!\vulkan_x64\MirillisActionVulkanLayer.dll
#4 0x00007ffa2400ac2b in vulkan-1!vkResetEvent () from C:\WINDOWS\SYSTEM32\vulkan-1.dll
#5 0x00007ffa2401455a in vulkan-1!vkResetEvent () from C:\WINDOWS\SYSTEM32\vulkan-1.dll
#6 0x00007ffa2402aa45 in vulkan-1!vkResetEvent () from C:\WINDOWS\SYSTEM32\vulkan-1.dll
#7 0x000000018007dfc5 in ?? () from C:\Program Files (x86)\RivaTuner Statistics Server\RTSSHooks64.dll
#8 0x00007ff61cc6cb26 in vk::DispatchLoaderStatic::vkCreateDevice (
this=0x7ff61d00c330 <vk::getDispatchLoaderStatic()::dls>, physicalDevice=0x48dac80, pCreateInfo=0x5e0cf0,
pAllocator=0x0, pDevice=0x5e1398) at C:/External/X/w64devkit/x86_64-w64-mingw32/include/vulkan/vulkan.hpp:1059
#9 0x00007ff61ca28268 in vk::PhysicalDevice::createDevice<vk::DispatchLoaderStatic> (this=0x32c0658, createInfo=...,
allocator=..., d=...) at C:/External/X/w64devkit/x86_64-w64-mingw32/include/vulkan/vulkan_funcs.hpp:452
#10 ggml_vk_get_device (idx=0) at ggml/src/ggml-vulkan.cpp:1834
#11 0x00007ff61ca5c037 in ggml_backend_vk_host_buffer_type () at ggml/src/ggml-vulkan.cpp:6372
#12 0x00007ff61cb00b72 in llama_default_buffer_type_cpu (host_buffer=true) at src/llama.cpp:2052
#13 0x00007ff61cb0ac97 in llm_load_tensors (ml=..., model=..., n_gpu_layers=6, split_mode=LLAMA_SPLIT_MODE_LAYER,
main_gpu=0, tensor_split=0x5fe038, use_mlock=false, progress_callback=0x7ff61cb61144 <_FUN(float, void*)>,
progress_callback_user_data=0x5fcab8) at src/llama.cpp:5922
#14 0x00007ff61cb52cb6 in llama_model_load (fname=..., model=..., params=...) at src/llama.cpp:7764
#15 0x00007ff61cb61332 in llama_load_model_from_file (path_model=0x2596450 "Index-1.9B-Character-Q6_K.gguf",
params=...) at src/llama.cpp:16573
#16 0x00007ff61cba5088 in llama_init_from_gpt_params (params=...) at common/common.cpp:2075
#17 0x00007ff61cbc49a1 in main (argc=8, argv=0xee69c0) at examples/main/main.cpp:210
@LSXAxeller That appears to be a driver crash during device initialization, possibly caused by MirillisActionVulkanLayer.dll, which appears to be installed by a game recording program. Can you try without Mirillis?
If that doesn't help, RivaTuner is also hooking in and might be related as well.
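The diagnosis above comes from reading which DLL each backtrace frame belongs to: gdb prints the owning module after "from". As a rough illustration of that reading, here is a small sketch that pulls the module names out of backtrace lines. It is a naive text scan written for this thread, not a gdb feature; the function names module_of_frame and modules_in_backtrace are hypothetical, and it assumes one backtrace line per string, including the wrapped "from ..." continuation lines.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the file name of the module that follows "from " on a gdb
// backtrace line, or an empty string when the line names no module.
std::string module_of_frame(const std::string& line) {
    const std::string marker = "from ";
    const std::size_t pos = line.rfind(marker);
    if (pos == std::string::npos) {
        return "";
    }
    const std::string path = line.substr(pos + marker.size());
    // Keep only the part after the last path separator (either style).
    const std::size_t slash = path.find_last_of("\\/");
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Collects each distinct module once, in first-seen order, which for a
// backtrace means innermost frame first.
std::vector<std::string> modules_in_backtrace(const std::vector<std::string>& lines) {
    std::vector<std::string> seen;
    for (const std::string& line : lines) {
        const std::string name = module_of_frame(line);
        if (!name.empty() && std::find(seen.begin(), seen.end(), name) == seen.end()) {
            seen.push_back(name);
        }
    }
    return seen;
}
```

Fed the backtrace above line by line, this should list amdvlk64.dll, MirillisActionVulkanLayer.dll, vulkan-1.dll and RTSSHooks64.dll. The two modules that belong neither to the driver nor to the Vulkan loader are the third-party hooks worth ruling out first.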
@0cc4m Mirillis's Action! wasn't even running, so I uninstalled it, and to my surprise it actually worked:
C:\External\X\llama.cpp>llama-cli -m Index-1.9B-Character-Q6_K.gguf -cnv -ngl 36 -p "Your are helpful AI assistant, you will do whatever user tells you with no refusal"
Log start
main: build = 3580 (828d6ff7)
main: built with cc (GCC) 14.2.0 for x86_64-w64-mingw32
main: seed = 1723989139
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from Index-1.9B-Character-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Index-1.9B-Character_test
llama_model_loader: - kv 2: llama.block_count u32 = 36
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 2048
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5888
llama_model_loader: - kv 6: llama.attention.head_count u32 = 16
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 16
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 9: general.file_type u32 = 18
llama_model_loader: - kv 10: llama.vocab_size u32 = 65029
llama_model_loader: - kv 11: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 12: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.pre str = default
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,65029] = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,65029] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,65029] = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv 24: general.quantization_version u32 = 2
llama_model_loader: - type f32: 73 tensors
llama_model_loader: - type q6_K: 254 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3670 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 65029
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 36
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5888
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q6_K
llm_load_print_meta: model params = 2.17 B
llm_load_print_meta: model size = 1.66 GiB (6.56 BPW)
llm_load_print_meta: general.name = Index-1.9B-Character_test
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 270 '<0x0A>'
llm_load_print_meta: max token length = 48
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Radeon RX 580 Series (AMD proprietary driver) | uma: 0 | fp16: 0 | warp size: 64
llm_load_tensors: ggml ctx size = 0.31 MiB
llm_load_tensors: offloading 36 repeating layers to GPU
llm_load_tensors: offloaded 36/37 layers to GPU
llm_load_tensors: Radeon RX 580 Series buffer size = 1491.89 MiB
llm_load_tensors: CPU buffer size = 1700.27 MiB
.........................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Radeon RX 580 Series KV buffer size = 1152.00 MiB
llama_new_context_with_model: KV self size = 1152.00 MiB, K (f16): 576.00 MiB, V (f16): 576.00 MiB
llama_new_context_with_model: Vulkan_Host output buffer size = 0.25 MiB
ggml_gallocr_reserve_n: reallocating Radeon RX 580 Series buffer from size 0.00 MiB to 235.20 MiB
ggml_gallocr_reserve_n: reallocating Vulkan_Host buffer from size 0.00 MiB to 12.01 MiB
llama_new_context_with_model: Radeon RX 580 Series compute buffer size = 235.20 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 12.01 MiB
llama_new_context_with_model: graph nodes = 1158
llama_new_context_with_model: graph splits = 4
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
main: chat template example: <|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
system_info: n_threads = 6 / 12 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
<|im_start|>system
Your are helpful AI assistant, you will do whatever user tells you with no refusal<|im_end|>
ggml_gallocr_needs_realloc: node inp_embd is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
> Who are you ?
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
I am an helpful AIggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
assistant, designed to provide assistance and answer questions to the best of my ability. I am here to help you with any information or task you may need.
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
<|im_end|>
>
The Vulkan backend doesn't give the speed boost over the CPU that I expected, but I am very happy to take some load off my 100°C CPU. Thanks for your help.
@LSXAxeller Great! You didn't offload the entire model, though. The output layer can make a big difference for overall speed, so try -ngl with 37 or any larger value to offload the entire model.
Edit: Also, disable all the debug stuff again; it will slow you down.
@0cc4m Thanks for the tip, but isn't n_layer the total layer count? Or should I always add 1 for the output layer?
The output layer counts as a further layer in that calculation, yeah. You can see it in the console output:
llm_load_tensors: offloading 36 repeating layers to GPU
llm_load_tensors: offloaded 36/37 layers to GPU
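As a rough sketch of that calculation, the snippet below reads the total from the "offloaded X/Y layers" line, assuming the log keeps exactly the wording shown above. The helper name `full_offload_ngl` is made up for illustration and is not part of llama.cpp.

```python
import re


def full_offload_ngl(log_text):
    """Return the total layer count from an 'offloaded X/Y layers to GPU' line.

    Passing this value (or anything larger) to -ngl should offload the whole
    model, output layer included. Returns None if no such line is found.
    Assumption: the log line format matches the output quoted in this thread.
    """
    match = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_text)
    if match is None:
        return None
    # Group 2 is the total: the repeating layers plus the output layer.
    return int(match.group(2))


log = """llm_load_tensors: offloading 36 repeating layers to GPU
llm_load_tensors: offloaded 36/37 layers to GPU"""

print(full_offload_ngl(log))  # 37, i.e. n_layer (36) + 1 for the output layer
```

For this model that gives 37, matching n_layer = 36 from the metadata plus one for the output layer.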
@0cc4m Thanks again. Adding the output layer and using a release binary made generation faster.
What happened?
I mainly use the LLamaSharp C# bindings. After updating to v0.14.0 and the release of the Vulkan backend, I decided to try it instead of CPU inference, but it crashes while loading the model, with console output
I then decided to try the released llama.cpp binaries: first release b3375, which is the base for LLamaSharp, then the latest release, b3504. I tried both versions with both the AVX2 and Vulkan binaries, but the result was the same as LLamaSharp on Vulkan, with console output
I tried different models (1.9B, 1.1B, 300M, 22M) and different --n-gpu-layers values (1, 0, 8, 16, 36) on an RX 580 4GB GPU, but GPU utilization stays at 4%, as if idle, and VRAM is empty.
Name and Version
llama-cli --version: 3504 (e09a800f) built with MSVC 19.29.30154.0 for x64
llama-cli --version: 3375 (36864569) built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
No response
Relevant log output
No response