McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

Usage for multiple Contexts #98

Closed harshg99 closed 3 months ago

harshg99 commented 3 months ago

Hi, I am trying to run an experiment where we parse multiple contexts, receive an embedding for each context, and evaluate how useful each context is for decision-making. Currently, my implementation merges the contexts into strings and receives an embedding for each merged string. E.g., if I have 3 contexts - c1, c2, c3 - I have to encode the c1+c2 and c1+c2+c3 strings as individual sentences. Is there a way to efficiently merge an llm2vec embedding with a new context or string? For instance, if I have already computed the embedding of c1+c2, I would only need to evaluate it with c3 to receive the embedding for c1+c2+c3. A sketch of my current approach is below.
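
A minimal sketch of the concatenation approach described above, assuming the `LLM2Vec.from_pretrained` / `encode` interface shown in the repo README; the checkpoint names are the README's examples and the contexts are placeholders:

```python
import torch
from llm2vec import LLM2Vec

# Load an LLM2Vec model (example checkpoint from the repo README).
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

c1, c2, c3 = "context one ...", "context two ...", "context three ..."

# Current approach: each merged context is encoded as one concatenated string,
# so adding a new context requires re-encoding the entire accumulated prefix.
merged_embeddings = l2v.encode([c1 + " " + c2, c1 + " " + c2 + " " + c3])
```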

vaibhavad commented 3 months ago

Hi @harshg99,

Thanks for your interest in our work. Unfortunately, if you are going with string concatenation, then there is no way to compute the embeddings of sub-strings separately as the embeddings depend on the entire input.

However, if you are merging at the embedding level, then you can compute the c1, c2, and c3 embeddings separately and combine them accordingly, although I am not sure whether that is suitable for your application.
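
A minimal sketch of that embedding-level alternative, again assuming the README's `encode` interface and example checkpoint; the mean combination here is only an illustration of "add them accordingly", not a recommendation:

```python
import torch
from llm2vec import LLM2Vec

# Same example checkpoint as in the sketch above; any LLM2Vec model should work.
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

c1, c2, c3 = "context one ...", "context two ...", "context three ..."

# Encode each context once and reuse the vectors for every combination.
ctx_embeddings = l2v.encode([c1, c2, c3])   # shape: (3, hidden_dim)

emb_c1_c2 = ctx_embeddings[:2].mean(dim=0)  # stand-in for the "c1 + c2" embedding
emb_c1_c2_c3 = ctx_embeddings.mean(dim=0)   # stand-in for the "c1 + c2 + c3" embedding

# Adding a new context later only requires encoding that one context,
# not re-encoding the accumulated prefix.
```

Note that these combined vectors are not the same as the embeddings of the concatenated strings, which is why this may or may not work for your downstream evaluation.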

Let me know if you have any further questions.

vaibhavad commented 3 months ago

Closing as it is stale. Feel free to re-open if you have any more questions.