SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

[QUESTION] How to use prompt C when using through HuggingFace embeddings loader #35

Closed kairoswealth closed 8 months ago

kairoswealth commented 8 months ago

I am using LlamaIndex to index documents into chromadb, and for that I use the HuggingFaceEmbedding abstraction like this:

embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")

However, I read that one needs to specify Prompt C in order to optimize the embeddings for retrieval. 1) Is the prompt only used during retrieval, i.e. for the query embedding, or also for document indexing? 2) Any idea whether that setting is supported through the HuggingFace/LlamaIndex abstractions, and how? 3) In the event that the Prompt C argument is not supported, would the resulting vectors perform significantly worse in retrieval use cases?

SeanLee97 commented 8 months ago

For your questions:

  1. Yes, use it only for the query texts; do not use it for document indexing.

2 & 3. Sorry, I haven't used LlamaIndex. You can manually apply the prompt to the query text as follows:

from angle_emb import Prompts

# Wrap only the query in Prompt C; index documents without any prompt.
query_text = 'this is a query'
query_text = Prompts.C.format(text=query_text)

embeddings = embed_model.get_text_embedding(query_text)
...
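For reference, the formatting step itself is a plain string substitution; the sketch below uses a hypothetical `PROMPT_C` string standing in for `angle_emb`'s actual `Prompts.C` (verify the exact template in your installed version):

```python
# Hypothetical stand-in for angle_emb's Prompts.C retrieval template;
# check angle_emb.Prompts.C in your installed version for the real wording.
PROMPT_C = 'Represent this sentence for searching relevant passages: {text}'

def apply_prompt(query: str) -> str:
    # Prepend the retrieval instruction to the raw query text.
    return PROMPT_C.format(text=query)

prompted = apply_prompt('this is a query')
print(prompted)
```

The resulting string, not the raw query, is what gets passed to `embed_model.get_text_embedding` at query time; documents are embedded as-is.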
kairoswealth commented 8 months ago

Awesome, that's very clear now. I'll apply the prompt manually at retrieval time. Thanks a lot!