SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp

Crash on KernelMemory Gathering Information / References with LLamaSharp Embedder #407

Open LSXAxeller opened 9 months ago

LSXAxeller commented 9 months ago

I tried using Kernel Memory to store documents (1.3 MB), but it keeps crashing after `memory.SaveReferenceAsync`, or, with the latest LLamaSharp commit and the Kernel Memory LLamaSharp connector, after `GenerationMemory.ImportDocumentAsync`.

LLamaSharp 0.8.1 & LLamaSharp.KernelMemory 0.8.1

```csharp
try
{
    var GenerationModelParameters = new ModelParams("THE_PATH_TO_MODEL")
    {
        ContextSize = 4096,
        GpuLayerCount = 0
    };
    var GenerationModel = LLamaWeights.LoadFromFile(GenerationModelParameters);
    var GenerationModelEmbedder = new LLamaEmbedder(GenerationModel, GenerationModelParameters);
    var MemoryContext = GenerationModelEmbedder.Context;

    var memory = new Microsoft.SemanticKernel.Memory.MemoryBuilder()
        .WithTextEmbeddingGeneration(new LLamaSharp.SemanticKernel.TextEmbedding.LLamaSharpEmbeddingGeneration(GenerationModelEmbedder))
        .WithMemoryStore(new Microsoft.SemanticKernel.Memory.VolatileMemoryStore())
        .Build();

    // Read every text file into memory before indexing.
    Dictionary<string, string> fileContents = new();
    string[] files = Directory.GetFiles("twi", "*.txt");
    foreach (string file in files)
    {
        string fileName = Path.GetFileName(file);
        string content = File.ReadAllText(file);
        fileContents.Add(fileName, content);
    }

    // Store each file as a reference; this is where the crash happens.
    foreach (var entry in fileContents)
    {
        var result = await memory.SaveReferenceAsync(
            collection: "Twilight",
            externalSourceName: "FanFiction",
            externalId: entry.Key,
            text: entry.Value);
    }

    var qs = "Who is the Volturi's greatest enemy?";
    var memories = memory.SearchAsync("Twilight", qs, limit: 10, minRelevanceScore: 0.5);
    var stringBuilder = new StringBuilder();
    await foreach (var result in memories)
    {
        stringBuilder.AppendLine("  Path      : " + result.Metadata.Id);
        stringBuilder.AppendLine("  Result    : " + result.Metadata.Text);
        stringBuilder.AppendLine("  Relevance : " + result.Relevance);
        stringBuilder.AppendLine();
    }

    File.WriteAllText("res.txt", stringBuilder.ToString());
}
catch (OperationCanceledException ex)
{
    File.WriteAllText("res01.txt", ex.Message);
}
catch (Exception ex)
{
    File.WriteAllText("res01.txt", ex.Message);
}
```

Latest LLamaSharp commit & Microsoft.KernelMemory.AI.LlamaSharp 0.24.231228.5 (latest)

```csharp
try
{
    var GenerationModelParameters = new ModelParams("THE_PATH_TO_MODEL")
    {
        ContextSize = 4096,
        GpuLayerCount = 0
    };
    var GenerationModel = LLamaWeights.LoadFromFile(GenerationModelParameters);
    var GenerationModelEmbedder = new LLamaEmbedder(GenerationModel, GenerationModelParameters);
    var MemoryContext = GenerationModelEmbedder.Context;

    var GenerationMemory = new KernelMemoryBuilder()
        .WithSearchClientConfig(new SearchClientConfig { MaxMatchesCount = 2, AnswerTokens = 100 })
        .WithLLamaSharpTextGeneration(new LlamaSharpTextGenerator(GenerationModel, MemoryContext))
        .WithLLamaSharpTextEmbeddingGeneration(new LLamaSharpTextEmbeddingGenerator(GenerationModelEmbedder))
        .WithSimpleFileStorage(new SimpleFileStorageConfig { StorageType = FileSystemTypes.Disk })
        .Build<MemoryServerless>();

    // Import the PDF; this is where the second variant crashes.
    await GenerationMemory.ImportDocumentAsync(new Document("twilight")
        .AddFiles(new[] {
            "tl.pdf"
        }));

    var qs = "Who is the Volturi's greatest enemy?";
    var answer = await GenerationMemory.AskAsync(qs);
    File.WriteAllText("res.txt", answer.Result);
}
catch (OperationCanceledException ex)
{
    File.WriteAllText("res01.txt", ex.Message);
}
catch (Exception ex)
{
    File.WriteAllText("res01.txt", ex.Message);
}
```

Both code samples fail with the same error:

The target process exited with code -2146233082 (0x80131506) while evaluating the function `System.SpanDebugView<T>.SpanDebugView`.

The failure occurs on `return embeddings.ToArray();` in:

```csharp
public float[] GetEmbeddings(string text, bool addBos)
{
    var embed_inp_array = Context.Tokenize(text, addBos);

    // TODO(Rinne): deal with log of prompt

    if (embed_inp_array.Length > 0)
        Context.Eval(embed_inp_array, 0);

    var embeddings = NativeApi.llama_get_embeddings(Context.NativeHandle);
    if (embeddings == null)
        return Array.Empty<float>();

    return embeddings.ToArray();
}
```
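To separate the embedder from the memory stores, the failing call can also be reproduced directly. This is only a diagnostic sketch, using the `GenerationModelEmbedder` from the snippets above, the `GetEmbeddings(string, bool)` overload quoted here, and a placeholder file path:

```csharp
// Diagnostic sketch only: feed one document straight into the embedder,
// bypassing Semantic Kernel memory and Kernel Memory entirely.
var text = File.ReadAllText("twi/sample.txt");                        // placeholder path
float[] vector = GenerationModelEmbedder.GetEmbeddings(text, true);   // same overload as above
Console.WriteLine($"Embedded {text.Length} chars into {vector.Length} values");
```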
LSXAxeller commented 8 months ago

After updating to 0.9.1, the document import now runs forever, and both code samples still have no luck, even when asking about the passages that produced tl.pdf.partition.0.txt.LLamaSharp.KernelMemory.LLamaSharpTextEmbeddingGenerator.TODO.text_embedding.
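For comparison, the smallest setup I can think of to test against lets the connector wire up the generator and embedder itself. This assumes the `WithLLamaSharpDefaults` extension and `LLamaSharpConfig` type shipped with LLamaSharp.KernelMemory, and is a sketch rather than something I have verified:

```csharp
// Sketch: let LLamaSharp.KernelMemory create the text generator and embedder from one config.
// WithLLamaSharpDefaults / LLamaSharpConfig are assumed to come from the connector package.
var config = new LLamaSharpConfig("THE_PATH_TO_MODEL")
{
    ContextSize = 4096,
    GpuLayerCount = 0
};

var kernelMemory = new KernelMemoryBuilder()
    .WithSearchClientConfig(new SearchClientConfig { MaxMatchesCount = 2, AnswerTokens = 100 })
    .WithLLamaSharpDefaults(config)
    .Build<MemoryServerless>();

await kernelMemory.ImportDocumentAsync(new Document("twilight").AddFiles(new[] { "tl.pdf" }));
var answer = await kernelMemory.AskAsync("Who is the Volturi's greatest enemy?");
File.WriteAllText("res.txt", answer.Result);
```

If the import still loops forever with this minimal path, shrinking the partition sizes (Microsoft.KernelMemory's TextPartitioningOptions) would be the next thing I would try.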