Dicklesworthstone / swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

Small question: average hidden states for sentence embedding or last token embedding? #7

Closed: waterluck closed this issue 11 months ago

waterluck commented 11 months ago

Hi, thanks for the great work! I checked the code but couldn't find where the sentence embedding is extracted from the LLaMA (or Llama 2) model. I'm curious about two things: Q1. How do you extract the sentence embedding? Q2. Do you average the hidden states or take the last token's embedding in the process?

To extract the sentence embedding myself, I'm using output = model(**inputs, output_hidden_states=True) and then either sentence_embedding = output.hidden_states[-1].mean(dim=1) or output.hidden_states[-1][:, -1, :], but I don't understand the difference between these two.
I'd appreciate it if you could share some knowledge on this!
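
For context, a minimal sketch of the two options being compared, assuming a standard Hugging Face transformers causal LM (the model name and variable names below are illustrative only, not taken from this repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any causal LM with accessible hidden states works similarly.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    output = model(**inputs, output_hidden_states=True)

last_hidden = output.hidden_states[-1]         # shape: (batch, seq_len, hidden_dim)

# Option 1: mean-pool over all token positions (mask out padding when batching)
mask = inputs["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
mean_embedding = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Option 2: take only the final token's hidden state
last_token_embedding = last_hidden[:, -1, :]   # (batch, hidden_dim)
```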

Dicklesworthstone commented 11 months ago

See the README: you just use the endpoints for getting embedding vectors. To use Llama 2 as the model, you would first add the relevant .gguf model file using the endpoint for that.


Dicklesworthstone commented 11 months ago

This endpoint:

POST /get_embedding_vector_for_string/: Retrieve Embedding Vector for a Given Text String. Retrieves the embedding vector for a given input text string using the specified model.
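
For example, a minimal sketch of calling that endpoint with Python's requests library. The base URL, port, and the field names "text" and "llm_model_name" are assumptions here; check the service's auto-generated FastAPI /docs page for the exact request schema.

```python
import requests

BASE_URL = "http://localhost:8089"  # assumed host/port; adjust to your deployment

# Assumed request body; verify the field names against the /docs Swagger UI.
payload = {
    "text": "This is the sentence I want an embedding for.",
    "llm_model_name": "llama-2-7b-chat.Q5_K_M",
}

response = requests.post(f"{BASE_URL}/get_embedding_vector_for_string/", json=payload)
response.raise_for_status()
result = response.json()  # should contain the embedding vector for the input text
print(str(result)[:200])
```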

And you could add Llama 2 7B Chat by downloading this model file:

https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/raw/main/llama-2-7b-chat.Q5_K_M.gguf

with this endpoint:

POST /add_new_model/: Add New Model by URL. Submit a new model URL for download and use. The model must be in .gguf format and larger than 100 MB to ensure it's a valid model file (you can directly paste in the Huggingface URL)
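
A minimal sketch of submitting that URL to the endpoint, again with requests. The parameter name "model_url" and its placement as a query parameter are assumptions; confirm them against the /docs page.

```python
import requests

BASE_URL = "http://localhost:8089"  # assumed host/port; adjust to your deployment

# The Hugging Face URL from the comment above.
model_url = (
    "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/"
    "raw/main/llama-2-7b-chat.Q5_K_M.gguf"
)

response = requests.post(
    f"{BASE_URL}/add_new_model/",
    params={"model_url": model_url},  # assumed query parameter name
)
response.raise_for_status()
print(response.json())  # e.g. a status message once the download has been accepted
```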