Closed: yassinebennani closed this issue 1 month ago
From your code I can see you're using b3902:
var native_directory = Path.Combine(directory, "llama-b3902-bin-win-llvm-arm64");
This is the wrong version of llama.cpp.
llama.cpp doesn't offer a stable API; there is no compatibility from one version to the next. If you're compiling your own binaries, you must use exactly the llama.cpp version that matches your LLamaSharp release.
The versions are documented in the table at the bottom of the README.
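If it helps, here's a minimal sketch of pinning the directory to the matching tag. The b3616 tag, the directory naming, and the `WithSearchDirectory` call are assumptions based on your log and recent LLamaSharp versions; check the table for whatever package version you actually have installed:

```csharp
using System.IO;
using LLama.Native;

// Sketch only: pin the native binaries to the llama.cpp tag that the
// installed LLamaSharp package was built against (b3616 in this case,
// per the README table). NativeLibraryConfig must be configured before
// any other LLamaSharp call, otherwise the default 'llama' library is
// resolved first.
const string pinnedTag = "b3616"; // must match the LLamaSharp <-> llama.cpp table

// 'directory' is the base folder from the snippet above.
var native_directory = Path.Combine(directory, $"llama-{pinnedTag}-bin-win-llvm-arm64");
NativeLibraryConfig.All.WithSearchDirectory(native_directory);
```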
Hello,
Thank you very much for your answer. I'm now using the correct version (b3616) and it's working.
Description
Hello guys, I'm trying to run a sample like the one in the documentation, without success. I'm sharing the output of my console application below; I can see in the logs that all the model parameters in my code are ignored. Can you please help?
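The code follows the documented sample, roughly like this (a reconstruction from the stack trace below; the parameter values shown are placeholders, not my exact settings):

```csharp
using LLama;
using LLama.Common;

// Reconstruction of the documented sample (placeholder values).
var parameters = new ModelParams(@"C:\Users\yassinebennani\source\repos\GptLike\GptLike\phi-2.Q8_0.gguf")
{
    ContextSize = 1024,  // appears to be ignored in the log below (n_ctx = 0)
    GpuLayerCount = 0
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// The crash happens here, inside the InteractiveExecutor constructor,
// when it reads the context size back from the native handle.
var executor = new InteractiveExecutor(context);
```

Console output: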
Debug: Loading library: 'llama'
Info: Detected OS Platform: 'WINDOWS'
Debug: Detected OS string: 'win-x64'
Debug: Detected extension string: '.dll'
Debug: Detected prefix string: ''
Info: NativeLibraryConfig Description:
SearchDirectories and Priorities: { ./ }
Debug: Got relative library path 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll' from local with , trying to load it...
Debug: Found full path file 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll' for relative path 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll'
Info: Successfully loaded 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll'
1: llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from C:\Users\yassinebennani\source\repos\GptLike\GptLike\phi-2.Q8_0.gguf (version GGUF V3 (latest))
1: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
1: llama_model_loader: - kv 0: general.architecture str = phi2
1: llama_model_loader: - kv 1: general.name str = Phi2
1: llama_model_loader: - kv 2: phi2.context_length u32 = 2048
1: llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560
1: llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240
1: llama_model_loader: - kv 5: phi2.block_count u32 = 32
1: llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
1: llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
1: llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
1: llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
1: llama_model_loader: - kv 10: general.file_type u32 = 7
1: llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
1: llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
1: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
1: llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
1: llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e", ...
1: llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
1: llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
1: llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
1: llama_model_loader: - kv 19: general.quantization_version u32 = 2
1: llama_model_loader: - type f32: 195 tensors
1: llama_model_loader: - type q8_0: 130 tensors
Error: llm_load_vocab: missing pre-tokenizer type, using: 'default'
Error: llm_load_vocab:
Error: llm_load_vocab: ****
Error: llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
Error: llm_load_vocab: CONSIDER REGENERATING THE MODEL
Error: llm_load_vocab: ****
Error: llm_load_vocab:
1: llm_load_vocab: special tokens cache size = 944
1: llm_load_vocab: token to piece cache size = 0.3151 MB
1: llm_load_print_meta: format = GGUF V3 (latest)
1: llm_load_print_meta: arch = phi2
1: llm_load_print_meta: vocab type = BPE
1: llm_load_print_meta: n_vocab = 51200
1: llm_load_print_meta: n_merges = 50000
1: llm_load_print_meta: vocab_only = 0
1: llm_load_print_meta: n_ctx_train = 2048
1: llm_load_print_meta: n_embd = 2560
1: llm_load_print_meta: n_layer = 32
1: llm_load_print_meta: n_head = 32
1: llm_load_print_meta: n_head_kv = 32
1: llm_load_print_meta: n_rot = 32
1: llm_load_print_meta: n_swa = 0
1: llm_load_print_meta: n_embd_head_k = 80
1: llm_load_print_meta: n_embd_head_v = 80
1: llm_load_print_meta: n_gqa = 1
1: llm_load_print_meta: n_embd_k_gqa = 2560
1: llm_load_print_meta: n_embd_v_gqa = 2560
1: llm_load_print_meta: f_norm_eps = 1.0e-05
1: llm_load_print_meta: f_norm_rms_eps = 0.0e+00
1: llm_load_print_meta: f_clamp_kqv = 0.0e+00
1: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
1: llm_load_print_meta: f_logit_scale = 0.0e+00
1: llm_load_print_meta: n_ff = 10240
1: llm_load_print_meta: n_expert = 0
1: llm_load_print_meta: n_expert_used = 0
1: llm_load_print_meta: causal attn = 1
1: llm_load_print_meta: pooling type = 0
1: llm_load_print_meta: rope type = 2
1: llm_load_print_meta: rope scaling = linear
1: llm_load_print_meta: freq_base_train = 10000.0
1: llm_load_print_meta: freq_scale_train = 1
1: llm_load_print_meta: n_ctx_orig_yarn = 2048
1: llm_load_print_meta: rope_finetuned = unknown
1: llm_load_print_meta: ssm_d_conv = 0
1: llm_load_print_meta: ssm_d_inner = 0
1: llm_load_print_meta: ssm_d_state = 0
1: llm_load_print_meta: ssm_dt_rank = 0
1: llm_load_print_meta: ssm_dt_b_c_rms = 0
1: llm_load_print_meta: model type = 3B
1: llm_load_print_meta: model ftype = Q8_0
1: llm_load_print_meta: model params = 2.78 B
1: llm_load_print_meta: model size = 2.75 GiB (8.51 BPW)
1: llm_load_print_meta: general.name = Phi2
1: llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
1: llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
1: llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
1: llm_load_print_meta: LF token = 128 'Ä'
1: llm_load_print_meta: EOT token = 50256 '<|endoftext|>'
1: llm_load_print_meta: EOG token = 50256 '<|endoftext|>'
1: llm_load_print_meta: max token length = 256
1: llm_load_tensors: ggml ctx size = 0.15 MiB
1: llm_load_tensors: CPU buffer size = 2819.28 MiB
Debug: ...............................................................................................
Error: llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
1: llama_new_context_with_model: n_ctx = 0
1: llama_new_context_with_model: n_batch = 32
1: llama_new_context_with_model: n_ubatch = 32
1: llama_new_context_with_model: flash_attn = 0
1: llama_new_context_with_model: freq_base = -nan
1: llama_new_context_with_model: freq_scale = 1
Warning: llama_kv_cache_init: failed to allocate buffer for kv cache
Warning: llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
at LLama.Native.SafeLLamaContextHandle.llama_n_ctx(LLama.Native.SafeLLamaContextHandle)
at LLama.Native.SafeLLamaContextHandle.get_ContextSize()
at LLama.LLamaContext.get_ContextSize()
at LLama.StatefulExecutorBase..ctor(LLama.LLamaContext, Microsoft.Extensions.Logging.ILogger)
at LLama.InteractiveExecutor..ctor(LLama.LLamaContext, Microsoft.Extensions.Logging.ILogger)
at GptLike.Program+<Main>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at GptLike.Program.Main(System.String[])
at GptLike.Program.<Main>(System.String[])
C:\Users\yassinebennani\source\repos\GptLike\GptLike\bin\Debug\net7.0\GptLike.exe (process 19268) exited with code -1073741819 (0xc0000005).
Press any key to close this window . . .
Reproduction Steps
Environment & Configuration
Known Workarounds
No response