Open shizidushu opened 5 months ago
I think 428 is the number of tokens and len(res[0]) is the embedding dimension. The length of res depends on the number of tokens you enter, while the length of each element of res depends on the embedding dimension.
@yentur Following your comment, that would mean one embedding per token. But I would expect the 428 token embeddings to be pooled into a single embedding. (Refer to https://www.mongodb.com/developer/products/atlas/choose-embedding-model-rag/#choosing-the-right-embedding-model-for-your-rag-application)
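For instance, mean pooling over the token vectors would produce one fixed-size embedding. A minimal sketch, where `res` is a random stand-in for the token-level output:

```python
import numpy as np

# Stand-in for the token-level output: 428 token vectors of dimension 4096
res = np.random.rand(428, 4096).astype(np.float32)

# Mean pooling: average over tokens to get one 4096-dim sequence embedding
emb = res.mean(axis=0)
print(emb.shape)  # (4096,)
```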
It looks as though this particular model uses last-token pooling, which isn't currently supported by llama.cpp. It would be super easy to add, since we already have first-token pooling; it just hasn't come up with other models yet. The other option is to get the token-level embeddings and pick off the last token's embedding manually.
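Picking off the last token's embedding manually might look like this (a sketch; `res` is a random stand-in for the token-level output):

```python
import numpy as np

# Stand-in for the token-level output: shape (n_tokens, n_embd), e.g. (428, 4096)
res = np.random.rand(428, 4096).astype(np.float32)

# Last-token pooling: use the final token's vector as the sequence embedding
emb = res[-1]

# L2-normalize, as is typical for cosine-similarity retrieval
emb = emb / np.linalg.norm(emb)
print(emb.shape)  # (4096,)
```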
Here is the code to get the embedding.
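A minimal sketch of such a call, assuming llama-cpp-python and a GGUF conversion of the model (the file name and input text are placeholders):

```python
from llama_cpp import Llama

# Placeholder path to a GGUF conversion of gte-Qwen1.5-7B-instruct
llm = Llama(model_path="gte-Qwen1.5-7B-instruct.gguf", embedding=True)

res = llm.embed("a long input text with a few hundred tokens ...")

print(len(res))     # 428  -> one vector per input token, not a pooled embedding
print(len(res[0]))  # 4096 -> the model's hidden size / embedding dimension
```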
Here is some info I got about the output (print results shown in the comments):
For the model https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct, I think I should get an embedding of length 4096, but I got length 428 instead.