SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

[QUESTION] How to use prompt C when using through HuggingFace embeddings loader #35

Closed kairoswealth closed 10 months ago

kairoswealth commented 10 months ago

I am using LlamaIndex to index documents into ChromaDB, and for that I use the HuggingFaceEmbedding abstraction like this:

```python
embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")
```

However, I read that one needs to specify prompt C in order to optimize the embeddings for retrieval.

1. Is the prompt used only during retrieval, i.e. for the question embedding, or also for document indexing?
2. Any idea whether this setting is supported through the HuggingFace/LlamaIndex abstractions, and how?
3. If the prompt C argument is not supported, would the resulting vectors perform significantly worse in retrieval use cases?

SeanLee97 commented 10 months ago

For your questions:

1. Yes, use it only for the query texts; do not use it for document indexing.

2 & 3. Sorry, I haven't used LlamaIndex. Maybe you can manually apply the prompt to the query text as follows:

```python
from angle_emb import Prompts

query_text = 'this is a query'
# Wrap only the query in Prompt C before embedding it.
query_text = Prompts.C.format(text=query_text)

embeddings = embed_model.get_text_embedding(query_text)
...
```
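To make the asymmetric usage concrete, here is a minimal standalone sketch of the pattern: documents are embedded as-is, while queries get the prompt prepended. The `PROMPT_C` template below is an assumption mirroring `angle_emb`'s `Prompts.C`; verify it against your installed version, and the `prepare_*` helper names are hypothetical:

```python
# Hypothetical sketch of the query/document asymmetry described above.
# PROMPT_C mirrors angle_emb's Prompts.C template (an assumption here;
# check Prompts.C in your installed angle_emb version).
PROMPT_C = 'Represent this sentence for searching relevant passages: {text}'

def prepare_for_indexing(doc: str) -> str:
    # Documents are embedded raw, with no prompt applied.
    return doc

def prepare_for_retrieval(query: str) -> str:
    # Queries are wrapped in Prompt C before being embedded.
    return PROMPT_C.format(text=query)

docs = [prepare_for_indexing(d) for d in ['doc one', 'doc two']]
query = prepare_for_retrieval('this is a query')
```

The strings returned by these helpers are what you would pass to `embed_model.get_text_embedding(...)`; the only difference between indexing and retrieval is the prompt wrapping.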
kairoswealth commented 10 months ago

Awesome, that is very clear now. I'll apply the prompt manually on retrieval. Thanks a lot!