SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License
2.63k stars 342 forks

How to use embedding correctly #547

Open xuzeyu91 opened 8 months ago

xuzeyu91 commented 8 months ago

What kind of model should be used for embedding? When I use nomic-embed-text-v1.5.f32.gguf, it throws a protected-memory (access violation) error, while tinyllama-1.1b-chat.gguf runs normally. However, I feel that the returned float array is not correct: when I match the same text against itself, the similarity is only 0.42.
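For reference, this is how the similarity number is usually computed. A minimal cosine-similarity helper in C# (a hypothetical utility, not part of LLamaSharp); two embeddings of identical text from a working embedder should score close to 1.0:

```csharp
using System;

static class EmbeddingMath
{
    // Cosine similarity between two equal-length embedding vectors.
    // Returns a value in [-1, 1]; identical inputs through the same
    // deterministic embedder should score ~1.0.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

A similarity of 0.42 for the same text against itself would indicate either non-deterministic embeddings or a bug in how the vectors are produced.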

martindevans commented 8 months ago

I'm not familiar with nomic, but if it's based on the BERT architecture it isn't supported in LLamaSharp yet. BERT support was only added to llama.cpp a couple of weeks ago (https://github.com/ggerganov/llama.cpp/pull/5423), and we haven't updated our binaries yet.

> However, I feel that the returned float array is not correct. When I use the same text for vector matching, the similarity is only 0.42

Do you mean you literally fed the same text in twice and the embeddings weren't identical? If so, that's definitely a bug!
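A quick way to check is to embed the same string twice and compare the vectors directly. This is only a sketch: the API surface (ModelParams, EmbeddingMode, LLamaEmbedder.GetEmbeddings) is assumed from LLamaSharp versions around this time and may differ in yours, and the model path is a placeholder:

```csharp
using LLama;
using LLama.Common;

// Assumption: model path is a placeholder; EmbeddingMode must be
// enabled for the context to produce embeddings.
var modelParams = new ModelParams("path/to/model.gguf") { EmbeddingMode = true };
using var weights = LLamaWeights.LoadFromFile(modelParams);
var embedder = new LLamaEmbedder(weights, modelParams);

float[] a = await embedder.GetEmbeddings("hello world");
float[] b = await embedder.GetEmbeddings("hello world");

// A deterministic embedder should return identical vectors here,
// so element-wise comparison should find no mismatches.
bool identical = a.Length == b.Length;
for (int i = 0; identical && i < a.Length; i++)
    identical = a[i] == b[i];

Console.WriteLine(identical
    ? "Embeddings are identical, as expected."
    : "Embeddings differ for the same input: likely a bug.");
```

If the vectors differ for identical input, that points at the embedder rather than at the similarity calculation downstream.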

ladeak commented 8 months ago

I have the same issue using the phi-2 and llama models through the Semantic Kernel integration. The values returned from the 'memory' seem to be completely independent of the search value. I see the same problem even when I use an exact match as the search text.

AshD commented 8 months ago

I experienced the same poor similarity matching with Semantic Kernel. Once LLamaSharp updates its binaries to support the BERT models, this issue should go away.

ladeak commented 8 months ago

Why will support for BERT models help? @AshD, could you expand on why the issue should go away?