LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

[Feature Request? / Bug?] KoboldCpp can't compile on ARM32 due to k_quants #451

Closed: Crataco closed this issue 11 months ago

Crataco commented 12 months ago

Hello!

I've been trying to get llama.cpp running on my phone (32-bit ARM, 3GB RAM) via proot-distro Debian on Termux (since I've had my fair share of problems with native Termux).

When compilation reaches the k_quants files, llama.cpp fails to build (see here). But after running LLAMA_NO_K_QUANTS=1 make instead, it succeeds and just barely works, at roughly 6-7 seconds per token for OpenLLaMA v2 3B q4_0.

Now, KoboldCpp has always been my preferred frontend for old/weak devices, since it can run tiny non-Llama models, but it lacks the Makefile variable (if that's the right term?) that skips the k_quants step.
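
For reference, I believe llama.cpp's Makefile gates this behind a guard roughly like the sketch below (reconstructed from memory, so the exact upstream lines may differ):

```
# Sketch of how llama.cpp's Makefile skips k_quants when the variable is set.
# Running `LLAMA_NO_K_QUANTS=1 make` defines the variable, so this block is
# skipped and k_quants.c is never compiled or linked in.
ifndef LLAMA_NO_K_QUANTS
CFLAGS   += -DGGML_USE_K_QUANTS
CXXFLAGS += -DGGML_USE_K_QUANTS
OBJS     += k_quants.o
endif
```

KoboldCpp would need an equivalent guard around its own k_quants.o rule.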


I would really appreciate it if this were implemented in KoboldCpp for those of us who can't compile k_quants.

Thanks in advance!

LostRuins commented 11 months ago

Will be added in the next version.

Crataco commented 11 months ago

Hi,

I've cloned the development branch for KoboldCpp with:

git clone -b concedo_experimental https://github.com/LostRuins/koboldcpp.git koboldcpp-dev

but I'm still facing the same problem with LLAMA_NO_K_QUANTS=1 make.

When I compile KoboldCpp on my PC, it still produces k_quants.o in the project directory, and make clean reports removing it as one of the many files.

My phone also has the same error as before. Here's the compilation log, if that helps:


``` I llama.cpp build info: I UNAME_S: Linux I UNAME_P: unknown I UNAME_M: armv7l I CFLAGS: -I. -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations I CXXFLAGS: -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -Wno-multichar -Wno-write-strings -pthread I LDFLAGS: I CC: cc (Debian 12.2.0-14) 12.2.0 I CXX: g++ (Debian 12.2.0-14) 12.2.0 cc -I. -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c ggml.c -o ggml.o cc -I. -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c otherarch/ggml_v2.c -o ggml_v2.o cc -I. -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c otherarch/ggml_v1.c -o ggml_v1.o g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -Wno-multichar -Wno-write-strings -pthread -c expose.cpp -o expose.o g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -Wno-multichar -Wno-write-strings -pthread -c common/common.cpp -o common.o g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -Wno-multichar -Wno-write-strings -pthread -c gpttype_adapter.cpp -o gpttype_adapter.o In file included from ./otherarch/llama_v2-util.h:7, from ./otherarch/llama_v2.cpp:8, from gpttype_adapter.cpp:18: ./otherarch/llama-util.h:56:52: warning: ‘format_old’ attribute directive ignored [-Wattributes] 56 | static std::string format_old(const char * fmt, ...) 
{ | ^ ./otherarch/llama_v2-util.h:60:8: warning: attribute ignored in declaration of ‘struct llama_v2_file’ [-Wattributes] 60 | struct llama_v2_file { | ^~~~~~~~~~~~~ ./otherarch/llama_v2-util.h:60:8: note: attribute for ‘struct llama_v2_file’ must follow the ‘struct’ keyword In file included from gpttype_adapter.cpp:29: ./otherarch/rwkv_v3.cpp:230:21: warning: ‘rwkv_type_to_string’ initialized and declared ‘extern’ 230 | extern const char * rwkv_type_to_string[TYPE_COUNT + 1] = {"FP32", "FP16", "Q4_0", "Q4_1", "Q4_1_O", "Q4_2", "Q4_3", "Q5_0", "Q5_1", "Q8_0", "unknown"}; | ^~~~~~~~~~~~~~~~~~~ ./otherarch/rwkv_v3.cpp: In function ‘ggml_tensor* rwkv_exp(ggml_context*, ggml_tensor*)’: ./otherarch/rwkv_v3.cpp:470:30: warning: ‘ggml_tensor* ggml_map_unary_f32(ggml_context*, ggml_tensor*, ggml_unary_op_f32_t)’ is deprecated: use ggml_map_custom1 instead [-Wdeprecated-declarations] 470 | return ggml_map_unary_f32(ctx, x, rwkv_exp_impl); | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ In file included from ./llama.h:4, from ./common/common.h:5, from ./otherarch/utils.h:10, from ./otherarch/otherarch.h:14, from gpttype_adapter.cpp:13: ./ggml.h:1548:51: note: declared here 1548 | GGML_DEPRECATED(GGML_API struct ggml_tensor * ggml_map_unary_f32( | ^~~~~~~~~~~~~~~~~~ ./ggml.h:191:41: note: in definition of macro ‘GGML_DEPRECATED’ 191 | # define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint))) | ^~~~ ./otherarch/rwkv_v3.cpp: In function ‘ggml_tensor* rwkv_1_minus_x(ggml_context*, ggml_tensor*)’: ./otherarch/rwkv_v3.cpp:474:30: warning: ‘ggml_tensor* ggml_map_unary_f32(ggml_context*, ggml_tensor*, ggml_unary_op_f32_t)’ is deprecated: use ggml_map_custom1 instead [-Wdeprecated-declarations] 474 | return ggml_map_unary_f32(ctx, x, rwkv_1_minus_x_impl); | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./ggml.h:1548:51: note: declared here 1548 | GGML_DEPRECATED(GGML_API struct ggml_tensor * ggml_map_unary_f32( | ^~~~~~~~~~~~~~~~~~ ./ggml.h:191:41: note: in definition of macro ‘GGML_DEPRECATED’ 191 | # define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint))) | ^~~~ ./otherarch/rwkv_v3.cpp: In function ‘ggml_tensor* rwkv_sigmoid(ggml_context*, ggml_tensor*)’: ./otherarch/rwkv_v3.cpp:478:30: warning: ‘ggml_tensor* ggml_map_unary_f32(ggml_context*, ggml_tensor*, ggml_unary_op_f32_t)’ is deprecated: use ggml_map_custom1 instead [-Wdeprecated-declarations] 478 | return ggml_map_unary_f32(ctx, x, rwkv_sigmoid_impl); | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~ ./ggml.h:1548:51: note: declared here 1548 | GGML_DEPRECATED(GGML_API struct ggml_tensor * ggml_map_unary_f32( | ^~~~~~~~~~~~~~~~~~ ./ggml.h:191:41: note: in definition of macro ‘GGML_DEPRECATED’ 191 | # define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint))) | ^~~~ ./otherarch/rwkv_v3.cpp: In function ‘ggml_tensor* rwkv_max(ggml_context*, ggml_tensor*, ggml_tensor*)’: ./otherarch/rwkv_v3.cpp:482:31: warning: ‘ggml_tensor* ggml_map_binary_f32(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_binary_op_f32_t)’ is deprecated: use ggml_map_custom2 instead [-Wdeprecated-declarations] 482 | return ggml_map_binary_f32(ctx, x, y, rwkv_max_impl); | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~ ./ggml.h:1560:51: note: declared here 1560 | GGML_DEPRECATED(GGML_API struct ggml_tensor * ggml_map_binary_f32( | ^~~~~~~~~~~~~~~~~~~ ./ggml.h:191:41: note: in definition of macro ‘GGML_DEPRECATED’ 191 | # define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint))) | ^~~~ gpttype_adapter.cpp: In function ‘void 
sample_temperature(llama_token_data_array*, float)’: gpttype_adapter.cpp:383:33: warning: ‘void llama_sample_temperature(llama_context*, llama_token_data_array*, float)’ is deprecated: use llama_sample_temp instead [-Wdeprecated-declarations] 383 | llama_sample_temperature(nullptr, candidates_p, temp); | ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from gpttype_adapter.cpp:20: llama.cpp:5269:6: note: declared here 5269 | void llama_sample_temperature(struct llama_context * ctx, llama_token_data_array * candidates_p, float temp) { | ^~~~~~~~~~~~~~~~~~~~~~~~ gpttype_adapter.cpp:388:33: warning: ‘void llama_sample_temperature(llama_context*, llama_token_data_array*, float)’ is deprecated: use llama_sample_temp instead [-Wdeprecated-declarations] 388 | llama_sample_temperature(nullptr, candidates_p, temp); | ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:5269:6: note: declared here 5269 | void llama_sample_temperature(struct llama_context * ctx, llama_token_data_array * candidates_p, float temp) { | ^~~~~~~~~~~~~~~~~~~~~~~~ gpttype_adapter.cpp: In function ‘ModelLoadResult gpttype_load_model(load_model_inputs, FileFormat, FileFormatExtraMeta)’: gpttype_adapter.cpp:823:49: warning: ‘int llama_apply_lora_from_file(llama_context*, const char*, float, const char*, int)’ is deprecated: use llama_model_apply_lora_from_file instead [-Wdeprecated-declarations] 823 | int err = llama_apply_lora_from_file(llama_ctx_v4, | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ 824 | lora_filename.c_str(), | ~~~~~~~~~~~~~~~~~~~~~~ 825 | 1.0f, | ~~~~~ 826 | lora_base_arg, | ~~~~~~~~~~~~~~ 827 | n_threads); | ~~~~~~~~~~ llama.cpp:6942:5: note: declared here 6942 | int llama_apply_lora_from_file(struct llama_context * ctx, const char * path_lora, float scale, const char * path_base_model, int n_threads) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~ gpttype_adapter.cpp:839:29: warning: ‘int llama_eval(llama_context*, llama_token*, int32_t, int)’ is deprecated: use llama_decode() instead [-Wdeprecated-declarations] 839 | auto er = llama_eval(llama_ctx_v4, tmp.data(), tmp.size(), 0); | ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:7376:5: note: declared here 7376 | int llama_eval( | ^~~~~~~~~~ gpttype_adapter.cpp: In function ‘generation_outputs gpttype_generate(generation_inputs, generation_outputs&)’: gpttype_adapter.cpp:1512:38: warning: ‘int llama_eval(llama_context*, llama_token*, int32_t, int)’ is deprecated: use llama_decode() instead [-Wdeprecated-declarations] 1512 | evalres = (llama_eval(llama_ctx_v4, embd.data(), embdsize, n_past)==0); | ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:7376:5: note: declared here 7376 | int llama_eval( | ^~~~~~~~~~ In file included from /usr/include/c++/12/regex:55, from model_adapter.h:6, from gpttype_adapter.cpp:12: /usr/include/c++/12/bits/stl_vector.h: In function ‘std::vector<_Tp, _Alloc>::vector(std::initializer_list<_Tp>, const allocator_type&) [with _Tp = long long int; _Alloc = std::allocator]’: /usr/include/c++/12/bits/stl_vector.h:673:7: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 673 | vector(initializer_list __l, | ^~~~~~ In file included from /usr/include/c++/12/regex:57: /usr/include/c++/12/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_fill_insert(iterator, size_type, const value_type&) [with _Tp = long long int; _Alloc = std::allocator]’: /usr/include/c++/12/bits/vector.tcc:523:5: note: parameter passing for argument of type 
‘std::vector::iterator’ changed in GCC 7.1 523 | vector<_Tp, _Alloc>:: | ^~~~~~~~~~~~~~~~~~~ In member function ‘void std::vector<_Tp, _Alloc>::resize(size_type, const value_type&) [with _Tp = long long int; _Alloc = std::allocator]’, inlined from ‘llama_v2_model_quantize_internal(const std::string&, const std::string&, llama_v2_ftype, int)::’ at ./otherarch/llama_v2.cpp:2147:46, inlined from ‘constexpr _Res std::__invoke_impl(__invoke_other, _Fn&&, _Args&& ...) [with _Res = void; _Fn = llama_v2_model_quantize_internal(const std::string&, const std::string&, llama_v2_ftype, int)::; _Args = {}]’ at /usr/include/c++/12/bits/invoke.h:61:36, inlined from ‘constexpr typename std::__invoke_result<_Functor, _ArgTypes>::type std::__invoke(_Callable&&, _Args&& ...) [with _Callable = llama_v2_model_quantize_internal(const std::string&, const std::string&, llama_v2_ftype, int)::; _Args = {}]’ at /usr/include/c++/12/bits/invoke.h:96:40, inlined from ‘typename std::thread::_Invoker<_Tuple>::__result<_Tuple>::type std::thread::_Invoker<_Tuple>::_M_invoke(std::_Index_tuple<_Ind ...>) [with unsigned int ..._Ind = {0}; _Tuple = std::tuple >]’ at /usr/include/c++/12/bits/std_thread.h:252:26, inlined from ‘typename std::thread::_Invoker<_Tuple>::__result<_Tuple>::type std::thread::_Invoker<_Tuple>::operator()() [with _Tuple = std::tuple >]’ at /usr/include/c++/12/bits/std_thread.h:259:20, inlined from ‘void std::thread::_State_impl<_Callable>::_M_run() [with _Callable = std::thread::_Invoker > >]’ at /usr/include/c++/12/bits/std_thread.h:210:20: /usr/include/c++/12/bits/stl_vector.h:1032:25: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1 1032 | _M_fill_insert(end(), __new_size - size(), __x); | ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In member function ‘void std::vector<_Tp, _Alloc>::resize(size_type, const value_type&) [with _Tp = long long int; _Alloc = std::allocator]’, inlined from ‘llama_v3_model_quantize_internal(const std::string&, const std::string&, const llama_v3_model_quantize_params*)::’ at ./otherarch/llama_v3.cpp:3298:46, inlined from ‘constexpr _Res std::__invoke_impl(__invoke_other, _Fn&&, _Args&& ...) [with _Res = void; _Fn = llama_v3_model_quantize_internal(const std::string&, const std::string&, const llama_v3_model_quantize_params*)::; _Args = {}]’ at /usr/include/c++/12/bits/invoke.h:61:36, inlined from ‘constexpr typename std::__invoke_result<_Functor, _ArgTypes>::type std::__invoke(_Callable&&, _Args&& ...) 
[with _Callable = llama_v3_model_quantize_internal(const std::string&, const std::string&, const llama_v3_model_quantize_params*)::; _Args = {}]’ at /usr/include/c++/12/bits/invoke.h:96:40, inlined from ‘typename std::thread::_Invoker<_Tuple>::__result<_Tuple>::type std::thread::_Invoker<_Tuple>::_M_invoke(std::_Index_tuple<_Ind ...>) [with unsigned int ..._Ind = {0}; _Tuple = std::tuple >]’ at /usr/include/c++/12/bits/std_thread.h:252:26, inlined from ‘typename std::thread::_Invoker<_Tuple>::__result<_Tuple>::type std::thread::_Invoker<_Tuple>::operator()() [with _Tuple = std::tuple >]’ at /usr/include/c++/12/bits/std_thread.h:259:20, inlined from ‘void std::thread::_State_impl<_Callable>::_M_run() [with _Callable = std::thread::_Invoker > >]’ at /usr/include/c++/12/bits/std_thread.h:210:20: /usr/include/c++/12/bits/stl_vector.h:1032:25: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1 1032 | _M_fill_insert(end(), __new_size - size(), __x); | ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp: In function ‘void llm_load_tensors(llama_model_loader&, llama_model&, int, int, const float*, bool, llama_progress_callback, void*)’: llama.cpp:2156:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2156 | model.tok_embeddings = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2178:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2178 | model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2179:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2179 | model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2201:59: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2201 | layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2203:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2203 | layer.wq = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_Q, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2204:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2204 | layer.wk = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_K, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2205:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2205 | layer.wv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_V, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2206:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2206 | layer.wo = 
ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2208:58: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2208 | layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2210:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2210 | layer.w1 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2211:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2211 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2212:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2212 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2224:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2224 | model.tok_embeddings = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2244:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2244 | model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2245:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2245 | model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2267:59: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2267 | layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2269:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2269 | layer.wq = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_Q, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2270:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2270 | layer.wk = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_K, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2271:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2271 | layer.wv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_V, "weight", i), {n_embd, n_embd_gqa}, backend_split); | 
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2272:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2272 | layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2274:58: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2274 | layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2276:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2276 | layer.w1 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2277:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2277 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2278:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2278 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2292:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2292 | model.tok_embeddings = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2314:63: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2314 | model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2315:63: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2315 | model.output_norm_b = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "bias"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2316:63: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2316 | model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2339:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2339 | layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2340:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2340 | layer.attn_norm_b = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "bias", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2343:67: note: parameter 
passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2343 | layer.attn_norm_2 = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM_2, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2344:67: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2344 | layer.attn_norm_2_b = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM_2, "bias", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2352:54: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2352 | layer.wqkv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_QKV, "weight", i), {n_embd, n_embd + 2*n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2353:54: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2353 | layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2355:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2355 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2356:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2356 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2425:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2425 | layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2426:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2426 | layer.ffn_norm_b = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "bias", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2428:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2428 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), {n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2429:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2429 | layer.b2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "bias", i), {n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2431:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2431 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2432:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2432 | layer.b3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "bias", i), 
{n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In member function ‘void std::vector<_Tp, _Alloc>::resize(size_type, const value_type&) [with _Tp = long long int; _Alloc = std::allocator]’, inlined from ‘llama_v3_model_quantize_internal(const std::string&, const std::string&, const llama_v3_model_quantize_params*)::’ at ./otherarch/llama_v3.cpp:3298:46, inlined from ‘void llama_v3_model_quantize_internal(const std::string&, const std::string&, const llama_v3_model_quantize_params*)’ at ./otherarch/llama_v3.cpp:3309:24, inlined from ‘int llama_v3_model_quantize(const char*, const char*, const llama_v3_model_quantize_params*)’ at ./otherarch/llama_v3.cpp:3579:41: /usr/include/c++/12/bits/stl_vector.h:1032:25: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1 1032 | _M_fill_insert(end(), __new_size - size(), __x); | ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In member function ‘void std::vector<_Tp, _Alloc>::resize(size_type, const value_type&) [with _Tp = long long int; _Alloc = std::allocator]’, inlined from ‘llama_v2_model_quantize_internal(const std::string&, const std::string&, llama_v2_ftype, int)::’ at ./otherarch/llama_v2.cpp:2147:46, inlined from ‘void llama_v2_model_quantize_internal(const std::string&, const std::string&, llama_v2_ftype, int)’ at ./otherarch/llama_v2.cpp:2158:24, inlined from ‘int llama_v2_model_quantize(const char*, const char*, llama_v2_ftype, int)’ at ./otherarch/llama_v2.cpp:2286:41: /usr/include/c++/12/bits/stl_vector.h:1032:25: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1 1032 | _M_fill_insert(end(), __new_size - size(), __x); | ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/include/c++/12/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(iterator, _Args&& ...) 
[with _Args = {const double&}; _Tp = double; _Alloc = std::allocator]’: /usr/include/c++/12/bits/vector.tcc:439:7: note: parameter passing for argument of type ‘std::vector::iterator’ changed in GCC 7.1 439 | vector<_Tp, _Alloc>:: | ^~~~~~~~~~~~~~~~~~~ In member function ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = double; _Alloc = std::allocator]’, inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(const typename _Container::value_type&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:735:22, inlined from ‘_OutputIterator std::partial_sum(_InputIterator, _InputIterator, _OutputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator >; _OutputIterator = back_insert_iterator >]’ at /usr/include/c++/12/bits/stl_numeric.h:270:17, inlined from ‘void std::discrete_distribution<_IntType>::param_type::_M_initialize() [with _IntType = int]’ at /usr/include/c++/12/bits/random.tcc:2679:23: /usr/include/c++/12/bits/stl_vector.h:1287:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1 1287 | _M_realloc_insert(end(), __x); | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ In member function ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = double; _Alloc = std::allocator]’, inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(const typename _Container::value_type&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:735:22, inlined from ‘_OutputIterator std::partial_sum(_InputIterator, _InputIterator, _OutputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator >; _OutputIterator = back_insert_iterator >]’ at /usr/include/c++/12/bits/stl_numeric.h:274:16, inlined from ‘void std::discrete_distribution<_IntType>::param_type::_M_initialize() [with _IntType = int]’ at /usr/include/c++/12/bits/random.tcc:2679:23: /usr/include/c++/12/bits/stl_vector.h:1287:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1 1287 | _M_realloc_insert(end(), __x); | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ cc -I. -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -pthread -s -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c -o k_quants.o k_quants.c k_quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’: k_quants.c:1355:36: warning: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Wimplicit-function-declaration] 1355 | const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums); | ^~~~~~~~~~~~ | vld1q_s16 k_quants.c:1355:36: error: invalid initializer k_quants.c:1392:41: warning: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Wimplicit-function-declaration] 1392 | const uint8x16x2_t q2bits = vld1q_u8_x2(q2); q2 += 32; | ^~~~~~~~~~~ | vld1q_u32 k_quants.c:1392:41: error: invalid initializer k_quants.c:1394:35: warning: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? 
[-Wimplicit-function-declaration] 1394 | int8x16x2_t q8bytes = vld1q_s8_x2(q8); q8 += 32; | ^~~~~~~~~~~ | vld1q_s32 k_quants.c:1394:35: error: invalid initializer k_quants.c:1384:19: error: incompatible types when assigning to type ‘int8x16x2_t’ from type ‘int’ 1384 | q8bytes = vld1q_s8_x2(q8); q8 += 32;\ | ^~~~~~~~~~~ k_quants.c:1399:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’ 1399 | SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ k_quants.c:1384:19: error: incompatible types when assigning to type ‘int8x16x2_t’ from type ‘int’ 1384 | q8bytes = vld1q_s8_x2(q8); q8 += 32;\ | ^~~~~~~~~~~ k_quants.c:1401:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’ 1401 | SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ k_quants.c:1384:19: error: incompatible types when assigning to type ‘int8x16x2_t’ from type ‘int’ 1384 | q8bytes = vld1q_s8_x2(q8); q8 += 32;\ | ^~~~~~~~~~~ k_quants.c:1403:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’ 1403 | SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ k_quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’: k_quants.c:1887:31: error: invalid initializer 1887 | uint8x16x2_t qhbits = vld1q_u8_x2(qh); | ^~~~~~~~~~~ k_quants.c:1905:41: error: invalid initializer 1905 | const uint8x16x2_t q3bits = vld1q_u8_x2(q3); q3 += 32; | ^~~~~~~~~~~ k_quants.c:1906:43: warning: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Wimplicit-function-declaration] 1906 | const int8x16x4_t q8bytes_1 = vld1q_s8_x4(q8); q8 += 64; | ^~~~~~~~~~~ | vld1q_s64 k_quants.c:1906:43: error: invalid initializer k_quants.c:1907:43: error: invalid initializer 1907 | const int8x16x4_t q8bytes_2 = vld1q_s8_x4(q8); q8 += 64; | ^~~~~~~~~~~ k_quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’: k_quants.c:2634:41: error: invalid initializer 2634 | const uint8x16x2_t q4bits = vld1q_u8_x2(q4); q4 += 32; | ^~~~~~~~~~~ k_quants.c:2652:23: error: incompatible types when assigning to type ‘int8x16x2_t’ from type ‘int’ 2652 | q8bytes = vld1q_s8_x2(q8); q8 += 32; | ^~~~~~~~~~~ k_quants.c:2661:23: error: incompatible types when assigning to type ‘int8x16x2_t’ from type ‘int’ 2661 | q8bytes = vld1q_s8_x2(q8); q8 += 32; | ^~~~~~~~~~~ k_quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’: k_quants.c:3163:31: error: invalid initializer 3163 | uint8x16x2_t qhbits = vld1q_u8_x2(qh); | ^~~~~~~~~~~ k_quants.c:3171:41: error: invalid initializer 3171 | const uint8x16x2_t q5bits = vld1q_u8_x2(q5); q5 += 32; | ^~~~~~~~~~~ k_quants.c:3172:41: error: invalid initializer 3172 | const int8x16x4_t q8bytes = vld1q_s8_x4(q8); q8 += 64; | ^~~~~~~~~~~ k_quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’: k_quants.c:3715:36: error: invalid initializer 3715 | const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums); | ^~~~~~~~~~~~ k_quants.c:3729:35: error: invalid initializer 3729 | uint8x16x2_t qhbits = vld1q_u8_x2(qh); qh += 32; | ^~~~~~~~~~~ k_quants.c:3730:35: warning: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? 
[-Wimplicit-function-declaration] 3730 | uint8x16x4_t q6bits = vld1q_u8_x4(q6); q6 += 64; | ^~~~~~~~~~~ | vld1q_u64 k_quants.c:3730:35: error: invalid initializer k_quants.c:3731:35: error: invalid initializer 3731 | int8x16x4_t q8bytes = vld1q_s8_x4(q8); q8 += 64; | ^~~~~~~~~~~ k_quants.c:3774:23: error: incompatible types when assigning to type ‘int8x16x4_t’ from type ‘int’ 3774 | q8bytes = vld1q_s8_x4(q8); q8 += 64; | ^~~~~~~~~~~ make: *** [: k_quants.o] Error 1 ```
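
Looking at the tail of that log, the actual failure is in k_quants.o: GCC's 32-bit ARM backend has long been missing the vld1q_*_x2 / vld1q_*_x4 NEON multi-register load intrinsics (AArch64 GCC and clang provide them), which is why each use produces an "implicit declaration" warning followed by an "invalid initializer" error. If skipping k_quants entirely isn't desired, a compatibility shim is one possible workaround; here's a minimal sketch, with hypothetical *_compat names of my own (not from either codebase), that expands each missing intrinsic into plain vld1q loads:

```
#include <arm_neon.h>

/*
 * Hypothetical shims for 32-bit ARM GCC, which lacks the multi-register
 * load intrinsics used by k_quants.c. Each one expands a missing
 * vld1q_*_x2 / vld1q_*_x4 call into plain vld1q loads of consecutive
 * 128-bit chunks.
 */
static inline int16x8x2_t vld1q_s16_x2_compat(const int16_t * p) {
    int16x8x2_t v;
    v.val[0] = vld1q_s16(p);
    v.val[1] = vld1q_s16(p + 8);   /* 8 x int16 = 128 bits per register */
    return v;
}

static inline uint8x16x2_t vld1q_u8_x2_compat(const uint8_t * p) {
    uint8x16x2_t v;
    v.val[0] = vld1q_u8(p);
    v.val[1] = vld1q_u8(p + 16);   /* 16 x uint8 = 128 bits per register */
    return v;
}

static inline int8x16x4_t vld1q_s8_x4_compat(const int8_t * p) {
    int8x16x4_t v;
    v.val[0] = vld1q_s8(p);
    v.val[1] = vld1q_s8(p + 16);
    v.val[2] = vld1q_s8(p + 32);
    v.val[3] = vld1q_s8(p + 48);
    return v;
}
```

I believe some ports use wrappers along these lines for pre-AArch64 targets, but for a device like mine the simpler fix is still to skip k_quants entirely.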

And here's the llama.cpp build log for comparison:


``` I llama.cpp build info: I UNAME_S: Linux I UNAME_P: unknown I UNAME_M: armv7l I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi I NVCCFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi " I LDFLAGS: I CC: cc (Debian 12.2.0-14) 12.2.0 I CXX: g++ (Debian 12.2.0-14) 12.2.0 cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c ggml.c -o ggml.o g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi -c llama.cpp -o llama.o In file included from /usr/include/c++/12/vector:64, from llama.h:725, from llama.cpp:2: /usr/include/c++/12/bits/stl_vector.h: In function ‘std::vector<_Tp, _Alloc>::vector(std::initializer_list<_Tp>, const allocator_type&) [with _Tp = long long int; _Alloc = std::allocator]’: /usr/include/c++/12/bits/stl_vector.h:673:7: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 673 | vector(initializer_list __l, | ^~~~~~ llama.cpp: In function ‘void llm_load_tensors(llama_model_loader&, llama_model&, int, int, const float*, bool, llama_progress_callback, void*)’: llama.cpp:2150:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2150 | model.tok_embeddings = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2172:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2172 | model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2173:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2173 | model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); | 
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2195:59: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2195 | layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2197:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2197 | layer.wq = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_Q, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2198:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2198 | layer.wk = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_K, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2199:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2199 | layer.wv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_V, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2200:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2200 | layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2202:58: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2202 | layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2204:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2204 | layer.w1 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2205:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2205 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2206:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2206 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2218:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2218 | model.tok_embeddings = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2238:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2238 | model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
llama.cpp:2239:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2239 | model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2261:59: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2261 | layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2263:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2263 | layer.wq = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_Q, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2264:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2264 | layer.wk = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_K, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2265:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2265 | layer.wv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_V, "weight", i), {n_embd, n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2266:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2266 | layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2268:58: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2268 | layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2270:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2270 | layer.w1 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2271:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2271 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2272:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2272 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2286:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2286 | model.tok_embeddings = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2308:63: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 
2308 | model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2309:63: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2309 | model.output_norm_b = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "bias"), {n_embd}, backend_norm); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2310:63: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2310 | model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2333:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2333 | layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2334:61: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2334 | layer.attn_norm_b = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "bias", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2337:67: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2337 | layer.attn_norm_2 = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM_2, "weight", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2338:67: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2338 | layer.attn_norm_2_b = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM_2, "bias", i), {n_embd}, backend); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2346:54: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2346 | layer.wqkv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_QKV, "weight", i), {n_embd, n_embd + 2*n_embd_gqa}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2347:54: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2347 | layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2349:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2349 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2350:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2350 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ llama.cpp:2419:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1 2419 | layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); | 
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp:2420:60: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1
 2420 | layer.ffn_norm_b = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "bias", i), {n_embd}, backend);
      | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp:2422:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1
 2422 | layer.w2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), {n_ff, n_embd}, backend_split);
      | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp:2423:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1
 2423 | layer.b2 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "bias", i), {n_embd}, backend_split);
      | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp:2425:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1
 2425 | layer.w3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split);
      | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp:2426:52: note: parameter passing for argument of type ‘std::initializer_list’ changed in GCC 7.1
 2426 | layer.b3 = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "bias", i), {n_ff}, backend_split);
      | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/vector:70:
/usr/include/c++/12/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(iterator, _Args&& ...) [with _Args = {const double&}; _Tp = double; _Alloc = std::allocator]’:
/usr/include/c++/12/bits/vector.tcc:439:7: note: parameter passing for argument of type ‘std::vector::iterator’ changed in GCC 7.1
  439 | vector<_Tp, _Alloc>::
      | ^~~~~~~~~~~~~~~~~~~
In member function ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(const typename _Container::value_type&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:735:22,
    inlined from ‘_OutputIterator std::partial_sum(_InputIterator, _InputIterator, _OutputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator >; _OutputIterator = back_insert_iterator >]’ at /usr/include/c++/12/bits/stl_numeric.h:270:17,
    inlined from ‘void std::discrete_distribution<_IntType>::param_type::_M_initialize() [with _IntType = int]’ at /usr/include/c++/12/bits/random.tcc:2679:23:
/usr/include/c++/12/bits/stl_vector.h:1287:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
 1287 | _M_realloc_insert(end(), __x);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
In member function ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(const typename _Container::value_type&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:735:22,
    inlined from ‘_OutputIterator std::partial_sum(_InputIterator, _InputIterator, _OutputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator >; _OutputIterator = back_insert_iterator >]’ at /usr/include/c++/12/bits/stl_numeric.h:274:16,
    inlined from ‘void std::discrete_distribution<_IntType>::param_type::_M_initialize() [with _IntType = int]’ at /usr/include/c++/12/bits/random.tcc:2679:23:
/usr/include/c++/12/bits/stl_vector.h:1287:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
 1287 | _M_realloc_insert(end(), __x);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi -c common/common.cpp -o common.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi -c common/console.cpp -o console.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi -c common/grammar-parser.cpp -o grammar-parser.o
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c ggml-alloc.c -o ggml-alloc.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/main/main.cpp ggml.o llama.o common.o console.o grammar-parser.o ggml-alloc.o -o main

==== Run ./main -h for help. ====

g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/quantize/quantize.cpp ggml.o llama.o ggml-alloc.o -o quantize
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/quantize-stats/quantize-stats.cpp ggml.o llama.o ggml-alloc.o -o quantize-stats
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/perplexity/perplexity.cpp ggml.o llama.o common.o ggml-alloc.o -o perplexity
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/embedding/embedding.cpp ggml.o llama.o common.o ggml-alloc.o -o embedding
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi pocs/vdot/vdot.cpp ggml.o ggml-alloc.o -o vdot
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi pocs/vdot/q8dot.cpp ggml.o ggml-alloc.o -o q8dot
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi -c common/train.cpp -o train.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/train-text-from-scratch/train-text-from-scratch.cpp ggml.o llama.o common.o train.o ggml-alloc.o -o train-text-from-scratch
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp ggml.o llama.o ggml-alloc.o -o convert-llama2c-to-ggml
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/simple/simple.cpp ggml.o llama.o common.o ggml-alloc.o -o simple
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/batched/batched.cpp ggml.o llama.o common.o ggml-alloc.o -o batched
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/save-load-state/save-load-state.cpp ggml.o llama.o common.o ggml-alloc.o -o save-load-state
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iexamples/server examples/server/server.cpp ggml.o llama.o common.o grammar-parser.o ggml-alloc.o -o server
In file included from /usr/include/c++/12/map:60,
                 from common/grammar-parser.h:15,
                 from examples/server/server.cpp:4:
/usr/include/c++/12/bits/stl_tree.h: In function ‘std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_emplace_hint_unique(const_iterator, _Args&& ...) [with _Args = {const std::piecewise_construct_t&, std::tuple, std::allocator >&>, std::tuple<>}; _Key = std::__cxx11::basic_string; _Val = std::pair, nlohmann::json_abi_v3_11_2::basic_json<> >; _KeyOfValue = std::_Select1st, nlohmann::json_abi_v3_11_2::basic_json<> > >; _Compare = std::less >; _Alloc = std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > >]’:
/usr/include/c++/12/bits/stl_tree.h:2457:7: note: parameter passing for argument of type ‘std::_Rb_tree, std::pair, nlohmann::json_abi_v3_11_2::basic_json<> >, std::_Select1st, nlohmann::json_abi_v3_11_2::basic_json<> > >, std::less >, std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > > >::const_iterator’ changed in GCC 7.1
 2457 | _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/map:61:
In member function ‘std::map<_Key, _Tp, _Compare, _Alloc>::mapped_type& std::map<_Key, _Tp, _Compare, _Alloc>::operator[](const key_type&) [with _Key = std::__cxx11::basic_string; _Tp = nlohmann::json_abi_v3_11_2::basic_json<>; _Compare = std::less >; _Alloc = std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > >]’,
    inlined from ‘bool nlohmann::json_abi_v3_11_2::detail::json_sax_dom_callback_parser::key(string_t&) [with BasicJsonType = nlohmann::json_abi_v3_11_2::basic_json<>]’ at examples/server/json.hpp:7010:76:
/usr/include/c++/12/bits/stl_map.h:511:44: note: parameter passing for argument of type ‘std::_Rb_tree, std::pair, nlohmann::json_abi_v3_11_2::basic_json<> >, std::_Select1st, nlohmann::json_abi_v3_11_2::basic_json<> > >, std::less >, std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > > >::const_iterator’ changed in GCC 7.1
  511 | __i = _M_t._M_emplace_hint_unique(__i, std::piecewise_construct,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  512 | std::tuple(__k),
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  513 | std::tuple<>());
      | ~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/stl_map.h: In member function ‘std::map<_Key, _Tp, _Compare, _Alloc>::mapped_type& std::map<_Key, _Tp, _Compare, _Alloc>::operator[](const key_type&) [with _Key = std::__cxx11::basic_string; _Tp = nlohmann::json_abi_v3_11_2::basic_json<>; _Compare = std::less >; _Alloc = std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > >]’:
/usr/include/c++/12/bits/stl_map.h:511:44: note: parameter passing for argument of type ‘std::_Rb_tree, std::pair, nlohmann::json_abi_v3_11_2::basic_json<> >, std::_Select1st, nlohmann::json_abi_v3_11_2::basic_json<> > >, std::less >, std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > > >::const_iterator’ changed in GCC 7.1
  511 | __i = _M_t._M_emplace_hint_unique(__i, std::piecewise_construct,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  512 | std::tuple(__k),
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  513 | std::tuple<>());
      | ~~~~~~~~~~~~~~~
In member function ‘std::map<_Key, _Tp, _Compare, _Alloc>::mapped_type& std::map<_Key, _Tp, _Compare, _Alloc>::operator[](const key_type&) [with _Key = std::__cxx11::basic_string; _Tp = nlohmann::json_abi_v3_11_2::basic_json<>; _Compare = std::less >; _Alloc = std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > >]’,
    inlined from ‘bool nlohmann::json_abi_v3_11_2::detail::json_sax_dom_parser::key(string_t&) [with BasicJsonType = nlohmann::json_abi_v3_11_2::basic_json<>]’ at examples/server/json.hpp:6815:72,
    inlined from ‘bool nlohmann::json_abi_v3_11_2::detail::parser::sax_parse_internal(SAX*) [with SAX = nlohmann::json_abi_v3_11_2::detail::json_sax_dom_parser >; BasicJsonType = nlohmann::json_abi_v3_11_2::basic_json<>; InputAdapterType = nlohmann::json_abi_v3_11_2::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator > >]’ at examples/server/json.hpp:12516:21:
/usr/include/c++/12/bits/stl_map.h:511:44: note: parameter passing for argument of type ‘std::_Rb_tree, std::pair, nlohmann::json_abi_v3_11_2::basic_json<> >, std::_Select1st, nlohmann::json_abi_v3_11_2::basic_json<> > >, std::less >, std::allocator, nlohmann::json_abi_v3_11_2::basic_json<> > > >::const_iterator’ changed in GCC 7.1
  511 | __i = _M_t._M_emplace_hint_unique(__i, std::piecewise_construct,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  512 | std::tuple(__k),
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  513 | std::tuple<>());
      | ~~~~~~~~~~~~~~~
g++ --shared -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/embd-input/embd-input-lib.cpp ggml.o llama.o common.o ggml-alloc.o -o libembdinput.so
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/embd-input/embd-input-test.cpp ggml.o llama.o common.o ggml-alloc.o -o embd-input-test -L. -lembdinput
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/gguf/gguf.cpp ggml.o llama.o ggml-alloc.o -o gguf
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/llama-bench/llama-bench.cpp ggml.o llama.o common.o ggml-alloc.o -o llama-bench
In file included from /usr/include/c++/12/regex:57,
                 from examples/llama-bench/llama-bench.cpp:14:
/usr/include/c++/12/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(iterator, _Args&& ...) [with _Args = {const long long unsigned int&}; _Tp = long long unsigned int; _Alloc = std::allocator]’:
/usr/include/c++/12/bits/vector.tcc:439:7: note: parameter passing for argument of type ‘std::vector::iterator’ changed in GCC 7.1
  439 | vector<_Tp, _Alloc>::
      | ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(iterator, _Args&& ...) [with _Args = {double}; _Tp = double; _Alloc = std::allocator]’:
/usr/include/c++/12/bits/vector.tcc:439:7: note: parameter passing for argument of type ‘std::vector::iterator’ changed in GCC 7.1
In member function ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {double}; _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘void std::vector<_Tp, _Alloc>::push_back(value_type&&) [with _Tp = double; _Alloc = std::allocator]’ at /usr/include/c++/12/bits/stl_vector.h:1294:21,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(typename _Container::value_type&&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:743:22,
    inlined from ‘_OIter std::transform(_IIter, _IIter, _OIter, _UnaryOperation) [with _IIter = __gnu_cxx::__normal_iterator >; _OIter = back_insert_iterator >; _UnaryOperation = test::get_ts() const::]’ at /usr/include/c++/12/bits/stl_algo.h:4263:12,
    inlined from ‘std::vector test::get_ts() const’ at examples/llama-bench/llama-bench.cpp:538:23,
    inlined from ‘double test::avg_ts() const’ at examples/llama-bench/llama-bench.cpp:543:28,
    inlined from ‘std::vector > test::get_values() const’ at examples/llama-bench/llama-bench.cpp:632:5:
/usr/include/c++/12/bits/vector.tcc:123:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
  123 | _M_realloc_insert(end(), std::forward<_Args>(__args)...);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In member function ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {double}; _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘void std::vector<_Tp, _Alloc>::push_back(value_type&&) [with _Tp = double; _Alloc = std::allocator]’ at /usr/include/c++/12/bits/stl_vector.h:1294:21,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(typename _Container::value_type&&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:743:22,
    inlined from ‘_OIter std::transform(_IIter, _IIter, _OIter, _UnaryOperation) [with _IIter = __gnu_cxx::__normal_iterator >; _OIter = back_insert_iterator >; _UnaryOperation = test::get_ts() const::]’ at /usr/include/c++/12/bits/stl_algo.h:4263:12,
    inlined from ‘std::vector test::get_ts() const’ at examples/llama-bench/llama-bench.cpp:538:23,
    inlined from ‘double test::stdev_ts() const’ at examples/llama-bench/llama-bench.cpp:547:30,
    inlined from ‘std::vector > test::get_values() const’ at examples/llama-bench/llama-bench.cpp:632:5:
/usr/include/c++/12/bits/vector.tcc:123:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
  123 | _M_realloc_insert(end(), std::forward<_Args>(__args)...);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In member function ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {double}; _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘void std::vector<_Tp, _Alloc>::push_back(value_type&&) [with _Tp = double; _Alloc = std::allocator]’ at /usr/include/c++/12/bits/stl_vector.h:1294:21,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(typename _Container::value_type&&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:743:22,
    inlined from ‘_OIter std::transform(_IIter, _IIter, _OIter, _UnaryOperation) [with _IIter = __gnu_cxx::__normal_iterator >; _OIter = back_insert_iterator >; _UnaryOperation = test::get_ts() const::]’ at /usr/include/c++/12/bits/stl_algo.h:4263:12,
    inlined from ‘std::vector test::get_ts() const’ at examples/llama-bench/llama-bench.cpp:538:23,
    inlined from ‘virtual void json_printer::print_test(const test&)’ at examples/llama-bench/llama-bench.cpp:742:68:
/usr/include/c++/12/bits/vector.tcc:123:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
  123 | _M_realloc_insert(end(), std::forward<_Args>(__args)...);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In member function ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {double}; _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘void std::vector<_Tp, _Alloc>::push_back(value_type&&) [with _Tp = double; _Alloc = std::allocator]’ at /usr/include/c++/12/bits/stl_vector.h:1294:21,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(typename _Container::value_type&&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:743:22,
    inlined from ‘_OIter std::transform(_IIter, _IIter, _OIter, _UnaryOperation) [with _IIter = __gnu_cxx::__normal_iterator >; _OIter = back_insert_iterator >; _UnaryOperation = test::get_ts() const::]’ at /usr/include/c++/12/bits/stl_algo.h:4263:12,
    inlined from ‘std::vector test::get_ts() const’ at examples/llama-bench/llama-bench.cpp:538:23,
    inlined from ‘double test::avg_ts() const’ at examples/llama-bench/llama-bench.cpp:543:28,
    inlined from ‘virtual void markdown_printer::print_test(const test&)’ at examples/llama-bench/llama-bench.cpp:873:25:
/usr/include/c++/12/bits/vector.tcc:123:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
  123 | _M_realloc_insert(end(), std::forward<_Args>(__args)...);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In member function ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {double}; _Tp = double; _Alloc = std::allocator]’,
    inlined from ‘void std::vector<_Tp, _Alloc>::push_back(value_type&&) [with _Tp = double; _Alloc = std::allocator]’ at /usr/include/c++/12/bits/stl_vector.h:1294:21,
    inlined from ‘std::back_insert_iterator<_Container>& std::back_insert_iterator<_Container>::operator=(typename _Container::value_type&&) [with _Container = std::vector]’ at /usr/include/c++/12/bits/stl_iterator.h:743:22,
    inlined from ‘_OIter std::transform(_IIter, _IIter, _OIter, _UnaryOperation) [with _IIter = __gnu_cxx::__normal_iterator >; _OIter = back_insert_iterator >; _UnaryOperation = test::get_ts() const::]’ at /usr/include/c++/12/bits/stl_algo.h:4263:12,
    inlined from ‘std::vector test::get_ts() const’ at examples/llama-bench/llama-bench.cpp:538:23,
    inlined from ‘double test::stdev_ts() const’ at examples/llama-bench/llama-bench.cpp:547:30,
    inlined from ‘virtual void markdown_printer::print_test(const test&)’ at examples/llama-bench/llama-bench.cpp:873:25:
/usr/include/c++/12/bits/vector.tcc:123:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
  123 | _M_realloc_insert(end(), std::forward<_Args>(__args)...);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/regex:55:
In member function ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = long long unsigned int; _Alloc = std::allocator]’,
    inlined from ‘int main(int, char**)’ at examples/llama-bench/llama-bench.cpp:1061:35:
/usr/include/c++/12/bits/stl_vector.h:1287:28: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator >’ changed in GCC 7.1
 1287 | _M_realloc_insert(end(), __x);
      | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/baby-llama/baby-llama.cpp ggml.o llama.o common.o train.o ggml-alloc.o -o baby-llama
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/beam-search/beam-search.cpp ggml.o llama.o common.o ggml-alloc.o -o beam-search
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/speculative/speculative.cpp ggml.o llama.o common.o grammar-parser.o ggml-alloc.o -o speculative
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/benchmark/benchmark-matmult.cpp ggml.o ggml-alloc.o -o benchmark-matmult
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/parallel/parallel.cpp ggml.o llama.o common.o ggml-alloc.o -o parallel
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/finetune/finetune.cpp ggml.o llama.o common.o train.o ggml-alloc.o -o finetune
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -Wno-array-bounds -Wno-format-truncation -Wextra-semi examples/export-lora/export-lora.cpp ggml.o llama.o common.o ggml-alloc.o -o export-lora
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations -c tests/test-c.c -o tests/test-c.o
```

LostRuins commented 11 months ago

Sorry about that. Can you pull my experimental branch and try again?
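
For anyone following along, updating an existing checkout of the `concedo_experimental` branch and retrying the build looks roughly like this (a minimal sketch, assuming a `koboldcpp-dev` clone of that branch):

```sh
cd koboldcpp-dev
git pull                    # pick up the new Makefile commit on concedo_experimental
make clean                  # drop stale objects such as k_quants.o
LLAMA_NO_K_QUANTS=1 make    # rebuild with the k_quants step skipped
```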

Crataco commented 11 months ago

The recent Makefile commit seems to have fixed it!
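
A quick way to verify the flag is being honored is to check that no k_quants object is produced by a clean rebuild (a sketch, using the `k_quants.o` filename that `make clean` reports):

```sh
make clean
LLAMA_NO_K_QUANTS=1 make
# if the flag is honored, the object file should be absent:
ls k_quants.o 2>/dev/null || echo "k_quants.o not built, as expected"
```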

**Build results**

![image](https://github.com/LostRuins/koboldcpp/assets/55674863/ac1b22ba-5b3c-4749-bf82-2aabdfe52f4d)

**Test results and thoughts**

I tried out several models under 3GB of RAM. I made sure to disable as many background apps as I could:

- **[AI Dungeon 2 Classic](https://huggingface.co/Crataco/AI-Dungeon-2-Classic-GGML) q4_0:** started, but gave an OOM error on the first message and crashed on the second.
- **[Pygmalion 1.3B](https://huggingface.co/Crataco/Pygmalion-1.3B-GGML) q4_0:** worked well, though outputs were strange. I blame the quantization, as q5_1 felt better when I ran it on my PC.
- **[RWKV-4 World](https://huggingface.co/Crataco/RWKV-4-World-Series-GGML) q4_0:** worked at 2.4-2.6 t/s, processed the prompt at about 3.2 t/s, and reprocessed the whole chat history every message.
- **[RWKV-4 World](https://huggingface.co/Crataco/RWKV-4-World-Series-GGML) q5_1:** worked at 1.3 t/s, processed the prompt at 1.4 t/s. As expected of higher quants, I feel the results were slightly better.
- **[TinyLLaMA 1.1B Chat v0.2](https://huggingface.co/kirp/TinyLlama-1.1B-Chat-v0.2-gguf) q5_0:** crashed with a "floating point exception" error, but this might be related to upstream llama.cpp requiring a patch to run TinyLLaMA.

**Notes:**

- Compiling with OpenBLAS gave me a *"cannot enable executable stack as shared object requires: permission denied"* error when starting KoboldCpp. CLBlast depends on the GPU, and GPU acceleration isn't so straightforward in Termux, so I'll stick with the slow processing times. I don't have the energy to troubleshoot this further lol. (The launch sketch after this list shows the settings I'd start from.)
- While KoboldCpp (RWKV-4-World q5_1) was running, I created a separate proot-distro container for SillyTavern using Alpine. But Termux crashes when I go to my home screen, so I probably don't have enough RAM to juggle both.

***
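
For anyone repeating this on a similar low-RAM device, a minimal launch sketch is below. Every flag name here is an assumption from memory of contemporaneous KoboldCpp builds, so verify each one against `python koboldcpp.py --help` before relying on it:

```sh
# Hypothetical low-RAM launch; check each flag against your build's --help.
python koboldcpp.py /path/to/model.q4_0.gguf \
  --threads 4 \
  --contextsize 512 \
  --smartcontext \
  --noblas
# --smartcontext reduces how much chat history is reprocessed each turn;
# --noblas skips loading OpenBLAS, which hit the executable-stack error above.
```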

Thank you so much for working on this issue. The k-quant issue is now resolved, and I'm happy to run LLMs on my 32-bit ARM Android devices (it's honestly a dream come true).

I'll close this issue; if compiling without k-quants breaks again in the future, I'll reopen it.

LostRuins commented 11 months ago

Glad you got it working.