Closed cm4ker closed 5 months ago
Hi! I'm trying to run the KernelMemorySaveAndLoad example and ran into an issue. This is the console log output:
C:/projects/blazor-server-side/Test/SemanticKernelTest/bin/Debug/net8.0/SemanticKernelTest.exe
This program uses the Microsoft.KernelMemory package to ingest documents and store the embeddings as local files so they can be quickly recalled when this application is launched again.
Please input your model path (or ENTER for default): (C:\test\llm\all-MiniLM-L12-v2.Q8_0.gguf): C:\test\llm\all-MiniLM-L12-v2.Q8_0.gguf
Kernel memory folder: C:\projects\blazor-server-side\Test\SemanticKernelTest\bin\Debug\net8.0\storage-KernelMemorySaveAndLoad
llama_model_loader: loaded meta data with 24 key-value pairs and 197 tensors from C:\test\llm\all-MiniLM-L12-v2.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bert
llama_model_loader: - kv 1: general.name str = all-MiniLM-L12-v2
llama_model_loader: - kv 2: bert.block_count u32 = 12
llama_model_loader: - kv 3: bert.context_length u32 = 512
llama_model_loader: - kv 4: bert.embedding_length u32 = 384
llama_model_loader: - kv 5: bert.feed_forward_length u32 = 1536
llama_model_loader: - kv 6: bert.attention.head_count u32 = 12
llama_model_loader: - kv 7: bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 8: general.file_type u32 = 7
llama_model_loader: - kv 9: bert.attention.causal bool = false
llama_model_loader: - kv 10: bert.pooling_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32 = 101
llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32 = 102
llama_model_loader: - kv 14: tokenizer.ggml.model str = bert
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 19: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 22: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 123 tensors
llama_model_loader: - type f16: 1 tensors
llama_model_loader: - type q8_0: 73 tensors
llm_load_vocab: mismatch in special tokens definition ( 7104/30522 vs 5/30522 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = bert
llm_load_print_meta: vocab type = WPM
llm_load_print_meta: n_vocab = 30522
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 512
llm_load_print_meta: n_embd = 384
llm_load_print_meta: n_head = 12
llm_load_print_meta: n_head_kv = 12
llm_load_print_meta: n_layer = 12
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 32
llm_load_print_meta: n_embd_head_v = 32
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 384
llm_load_print_meta: n_embd_v_gqa = 384
llm_load_print_meta: f_norm_eps = 1.0e-12
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 1536
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 0
llm_load_print_meta: pooling type = 1
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 512
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 33M
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 33.21 M
llm_load_print_meta: model size = 34.00 MiB (8.59 BPW)
llm_load_print_meta: general.name = all-MiniLM-L12-v2
llm_load_print_meta: BOS token = 101 '[CLS]'
llm_load_print_meta: EOS token = 102 '[SEP]'
llm_load_print_meta: UNK token = 100 '[UNK]'
llm_load_print_meta: SEP token = 102 '[SEP]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: CLS token = 101 '[CLS]'
llm_load_print_meta: MASK token = 103 '[MASK]'
llm_load_print_meta: LF token = 0 '[PAD]'
llm_load_tensors: ggml ctx size = 0.09 MiB
llm_load_tensors: CPU buffer size = 34.00 MiB
.................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 36.00 MiB
llama_new_context_with_model: KV self size = 36.00 MiB, K (f16): 18.00 MiB, V (f16): 18.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.00 MiB
llama_new_context_with_model: CPU compute buffer size = 17.00 MiB
llama_new_context_with_model: graph nodes = 431
llama_new_context_with_model: graph splits = 1
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 36.00 MiB
llama_new_context_with_model: KV self size = 36.00 MiB, K (f16): 18.00 MiB, V (f16): 18.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.00 MiB
llama_new_context_with_model: CPU compute buffer size = 17.00 MiB
llama_new_context_with_model: graph nodes = 431
llama_new_context_with_model: graph splits = 1
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 36.00 MiB
llama_new_context_with_model: KV self size = 36.00 MiB, K (f16): 18.00 MiB, V (f16): 18.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.00 MiB
llama_new_context_with_model: CPU compute buffer size = 17.00 MiB
llama_new_context_with_model: graph nodes = 431
llama_new_context_with_model: graph splits = 1
Existing kernel memory was not found.
Documents will be analyzed (slow) and information saved to disk.
Analysis will not be required the next time this program is run.
Press ENTER to proceed...
Importing 1 of 2: C:\projects\blazor-server-side\Test\SemanticKernelTest\bin\Debug\net8.0\Assets\sample-SK-Readme.pdf
Completed in 00:00:01.7413797
Importing 2 of 2: C:\projects\blazor-server-side\Test\SemanticKernelTest\bin\Debug\net8.0\Assets\sample-KM-Readme.pdf
Completed in 00:00:00.4631057
Question: What formats does KM support?
Generating answer...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 36.00 MiB
llama_new_context_with_model: KV self size = 36.00 MiB, K (f16): 18.00 MiB, V (f16): 18.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.00 MiB
llama_new_context_with_model: CPU compute buffer size = 17.00 MiB
llama_new_context_with_model: graph nodes = 431
llama_new_context_with_model: graph splits = 1
llama_get_logits_ith: invalid logits id 324, reason: no logits
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at LLama.LLamaContext.ApplyPenalty(Int32 logits_i, IEnumerable`1 lastTokens, Dictionary`2 logitBias, Int32 repeatLastTokensCount, Single repeatPenalty, Single alphaFrequency, Single alphaPresence, Boolean penalizeNL)
   at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext()
   at LLama.StatelessExecutor.InferAsync(String prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
   at SemanticKernelTest.KernelMemorySaveAndLoad.ShowAnswer(IKernelMemory memory, String question) in C:\projects\blazor-server-side\Test\SemanticKernelTest\KernelMemorySaveAndLoad.cs:line 150
   at SemanticKernelTest.KernelMemorySaveAndLoad.AskSingleQuestion(IKernelMemory memory, String question) in C:\projects\blazor-server-side\Test\SemanticKernelTest\KernelMemorySaveAndLoad.cs:line 109
   at SemanticKernelTest.KernelMemorySaveAndLoad.Run() in C:\projects\blazor-server-side\Test\SemanticKernelTest\KernelMemorySaveAndLoad.cs:line 57
   at SemanticKernelTest.Program.Main() in C:\projects\blazor-server-side\Test\SemanticKernelTest\Program.cs:line 8
   at SemanticKernelTest.Program.<Main>()
Process finished with exit code -532,462,766.
Installed packages:
<PackageReference Include="LLamaSharp" Version="0.12.0"/>
<PackageReference Include="LLamaSharp.Backend.Cpu" Version="0.12.0"/>
<PackageReference Include="LLamaSharp.kernel-memory" Version="0.12.0"/>
<PackageReference Include="LLamaSharp.semantic-kernel" Version="0.12.0"/>
<PackageReference Include="Microsoft.KernelMemory.Core" Version="0.61.240524.1"/>
<PackageReference Include="Microsoft.SemanticKernel" Version="1.13.0"/>
<PackageReference Include="Microsoft.SemanticKernel.Plugins.Memory" Version="1.6.2-alpha"/>
<PackageReference Include="Spectre.Console" Version="0.49.1"/>
Solved by changing the model. I think the model I was using is an embedding model (it only produces embeddings), not a model that can generate text.
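For anyone hitting the same crash, the loader log above already hints at the cause: it reports `general.architecture str = bert` and `causal attn = 0`, followed by `invalid logits id 324, reason: no logits` once generation starts. Below is a rough sketch of a check that scans a captured loader log for those two hints before using the model for text generation. The function name `looks_embedding_only` and the `EMBEDDING_ONLY_ARCHS` set are hypothetical, made up for illustration; they are not part of LLamaSharp or llama.cpp, and the architecture list is an assumption, not an exhaustive one.

```python
import re

# Architectures assumed here to be encoder-only embedding models.
# Illustrative assumption, not an exhaustive or authoritative list.
EMBEDDING_ONLY_ARCHS = {"bert", "nomic-bert"}


def looks_embedding_only(loader_log: str) -> bool:
    """Return True if the loader log suggests an embedding-only model.

    Two hints are checked, both taken from the log format shown above:
    the 'general.architecture' metadata key and the 'causal attn' line.
    """
    arch = re.search(r"general\.architecture\s+str\s*=\s*(\S+)", loader_log)
    if arch and arch.group(1).lower() in EMBEDDING_ONLY_ARCHS:
        return True
    causal = re.search(r"causal attn\s*=\s*(\d+)", loader_log)
    return bool(causal and causal.group(1) == "0")


if __name__ == "__main__":
    # Two lines copied from the log in this issue.
    sample = (
        "llama_model_loader: - kv 0: general.architecture str = bert\n"
        "llm_load_print_meta: causal attn = 0\n"
    )
    print(looks_embedding_only(sample))  # True
```

If the check returns True, the model is probably fine for embeddings but will likely fail when asked to generate an answer, so a separate text-generation model is needed for that part.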