Closed: yassinebennani closed this issue 1 month ago
From your code I can see you're using b3902:
var native_directory = Path.Combine(directory, "llama-b3902-bin-win-llvm-arm64");
This is the wrong version of llama.cpp.
llama.cpp doesn't offer a stable API; there is no compatibility from one version to the next. If you're compiling your own binaries, you must use exactly the llama.cpp version that matches your LLamaSharp release.
The versions are documented in the table at the bottom of the README.
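If it helps, here's a minimal sketch of pinning the directory to the matching tag. The b3616 tag, the directory naming, and the `WithSearchDirectory` call are assumptions based on your log and recent LLamaSharp versions; check the table for whatever package version you actually have installed:

```csharp
using System.IO;
using LLama.Native;

// Sketch only: pin the native binaries to the llama.cpp tag that the
// installed LLamaSharp package was built against (b3616 in this case,
// per the README table). NativeLibraryConfig must be configured before
// any other LLamaSharp call, otherwise the default 'llama' library is
// resolved first.
const string pinnedTag = "b3616"; // must match the LLamaSharp <-> llama.cpp table

// 'directory' is the base folder from the snippet above.
var native_directory = Path.Combine(directory, $"llama-{pinnedTag}-bin-win-llvm-arm64");
NativeLibraryConfig.All.WithSearchDirectory(native_directory);
```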
Hello,
Thank you very much for your answer. I'm now using the correct version (b3616) and it's working.
Description
Hello guys, I'm trying to run a sample like the one in the documentation, without success. I'm sharing the output of my console application below; I can see in the logs that all the model parameters in my code are ignored. Can you please help?
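The code follows the documented sample, roughly like this (a reconstruction from the stack trace below; the parameter values shown are placeholders, not my exact settings):

```csharp
using LLama;
using LLama.Common;

// Reconstruction of the documented sample (placeholder values).
var parameters = new ModelParams(@"C:\Users\yassinebennani\source\repos\GptLike\GptLike\phi-2.Q8_0.gguf")
{
    ContextSize = 1024,  // appears to be ignored in the log below (n_ctx = 0)
    GpuLayerCount = 0
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// The crash happens here, inside the InteractiveExecutor constructor,
// when it reads the context size back from the native handle.
var executor = new InteractiveExecutor(context);
```

Console output: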
Debug: Loading library: 'llama'
Info: Detected OS Platform: 'WINDOWS'
Debug: Detected OS string: 'win-x64'
Debug: Detected extension string: '.dll'
Debug: Detected prefix string: ''
Info: NativeLibraryConfig Description:
SearchDirectories and Priorities: { ./ }
Debug: Got relative library path 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll' from local with , trying to load it...
Debug: Found full path file 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll' for relative path 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll'
Info: Successfully loaded 'C:\Users\yassinebennani\source\repos\GptLike\GptLike\llama-b3902-bin-win-llvm-arm64\llama.dll'
1: llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from C:\Users\yassinebennani\source\repos\GptLike\GptLike\phi-2.Q8_0.gguf (version GGUF V3 (latest))
1: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
1: llama_model_loader: - kv 0: general.architecture str = phi2
1: llama_model_loader: - kv 1: general.name str = Phi2
1: llama_model_loader: - kv 2: phi2.context_length u32 = 2048
1: llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560
1: llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240
1: llama_model_loader: - kv 5: phi2.block_count u32 = 32
1: llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
1: llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
1: llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
1: llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
1: llama_model_loader: - kv 10: general.file_type u32 = 7
1: llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
1: llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
1: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
1: llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
1: llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e", ...
1: llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
1: llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
1: llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
1: llama_model_loader: - kv 19: general.quantization_version u32 = 2
1: llama_model_loader: - type f32: 195 tensors
1: llama_model_loader: - type q8_0: 130 tensors
Error: llm_load_vocab: missing pre-tokenizer type, using: 'default'
Error: llm_load_vocab:
Error: llm_load_vocab: ****
Error: llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
Error: llm_load_vocab: CONSIDER REGENERATING THE MODEL
Error: llm_load_vocab: ****
Error: llm_load_vocab:
1: llm_load_vocab: special tokens cache size = 944
1: llm_load_vocab: token to piece cache size = 0.3151 MB
1: llm_load_print_meta: format = GGUF V3 (latest)
1: llm_load_print_meta: arch = phi2
1: llm_load_print_meta: vocab type = BPE
1: llm_load_print_meta: n_vocab = 51200
1: llm_load_print_meta: n_merges = 50000
1: llm_load_print_meta: vocab_only = 0
1: llm_load_print_meta: n_ctx_train = 2048
1: llm_load_print_meta: n_embd = 2560
1: llm_load_print_meta: n_layer = 32
1: llm_load_print_meta: n_head = 32
1: llm_load_print_meta: n_head_kv = 32
1: llm_load_print_meta: n_rot = 32
1: llm_load_print_meta: n_swa = 0
1: llm_load_print_meta: n_embd_head_k = 80
1: llm_load_print_meta: n_embd_head_v = 80
1: llm_load_print_meta: n_gqa = 1
1: llm_load_print_meta: n_embd_k_gqa = 2560
1: llm_load_print_meta: n_embd_v_gqa = 2560
1: llm_load_print_meta: f_norm_eps = 1.0e-05
1: llm_load_print_meta: f_norm_rms_eps = 0.0e+00
1: llm_load_print_meta: f_clamp_kqv = 0.0e+00
1: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
1: llm_load_print_meta: f_logit_scale = 0.0e+00
1: llm_load_print_meta: n_ff = 10240
1: llm_load_print_meta: n_expert = 0
1: llm_load_print_meta: n_expert_used = 0
1: llm_load_print_meta: causal attn = 1
1: llm_load_print_meta: pooling type = 0
1: llm_load_print_meta: rope type = 2
1: llm_load_print_meta: rope scaling = linear
1: llm_load_print_meta: freq_base_train = 10000.0
1: llm_load_print_meta: freq_scale_train = 1
1: llm_load_print_meta: n_ctx_orig_yarn = 2048
1: llm_load_print_meta: rope_finetuned = unknown
1: llm_load_print_meta: ssm_d_conv = 0
1: llm_load_print_meta: ssm_d_inner = 0
1: llm_load_print_meta: ssm_d_state = 0
1: llm_load_print_meta: ssm_dt_rank = 0
1: llm_load_print_meta: ssm_dt_b_c_rms = 0
1: llm_load_print_meta: model type = 3B
1: llm_load_print_meta: model ftype = Q8_0
1: llm_load_print_meta: model params = 2.78 B
1: llm_load_print_meta: model size = 2.75 GiB (8.51 BPW)
1: llm_load_print_meta: general.name = Phi2
1: llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
1: llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
1: llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
1: llm_load_print_meta: LF token = 128 'Ä'
1: llm_load_print_meta: EOT token = 50256 '<|endoftext|>'
1: llm_load_print_meta: EOG token = 50256 '<|endoftext|>'
1: llm_load_print_meta: max token length = 256
1: llm_load_tensors: ggml ctx size = 0.15 MiB
1: llm_load_tensors: CPU buffer size = 2819.28 MiB
Debug: ...............................................................................................
Error: llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
1: llama_new_context_with_model: n_ctx = 0
1: llama_new_context_with_model: n_batch = 32
1: llama_new_context_with_model: n_ubatch = 32
1: llama_new_context_with_model: flash_attn = 0
1: llama_new_context_with_model: freq_base = -nan
1: llama_new_context_with_model: freq_scale = 1
Warning: llama_kv_cache_init: failed to allocate buffer for kv cache
Warning: llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
at LLama.Native.SafeLLamaContextHandle.llama_n_ctx(LLama.Native.SafeLLamaContextHandle)
at LLama.Native.SafeLLamaContextHandle.get_ContextSize()
at LLama.LLamaContext.get_ContextSize()
at LLama.StatefulExecutorBase..ctor(LLama.LLamaContext, Microsoft.Extensions.Logging.ILogger)
at LLama.InteractiveExecutor..ctor(LLama.LLamaContext, Microsoft.Extensions.Logging.ILogger)
at GptLike.Program+<Main>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef)
at GptLike.Program.Main(System.String[])
at GptLike.Program.<Main>(System.String[])
C:\Users\yassinebennani\source\repos\GptLike\GptLike\bin\Debug\net7.0\GptLike.exe (process 19268) exited with code -1073741819 (0xc0000005).
Press any key to close this window . . .
Reproduction Steps
Environment & Configuration
Known Workarounds
No response