Closed Georgepitt closed 1 month ago
@Georgepitt
Thanks for your interest in our work.
The LLM can be used directly with the following code.
```python
import torch
from llm2vec import LLM2Vec

if __name__ == "__main__":
    l2v = LLM2Vec.from_pretrained(
        "meta-llama/Meta-Llama-3-8B-Instruct",
        enable_bidirectional=False,
        device_map="cuda" if torch.cuda.is_available() else "cpu",
        torch_dtype=torch.bfloat16,
        pooling_mode="mean",
    )
```
The encoding steps are the same as those mentioned in the README.
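To make the `pooling_mode="mean"` setting above concrete, here is a minimal sketch of what mean pooling does: it averages the token hidden states while masking out padding positions. This is an illustrative re-implementation in plain PyTorch under my own assumptions, not the library's actual internal code.

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over valid (non-padding) positions.

    hidden_states: (batch, seq_len, dim) token representations
    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # sum of valid tokens
    counts = mask.sum(dim=1).clamp(min=1)                        # number of valid tokens
    return summed / counts

# Toy example: two valid tokens, one padded token that must be ignored.
h = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
m = torch.tensor([[1, 1, 0]])
print(mean_pool(h, m))  # -> tensor([[2., 3.]])
```

The padded token's large values do not leak into the sentence embedding because the mask zeroes them out before summing.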
Enabling bidirectional connections is handled by the llm2vec library. For more details on how it is implemented, you can check out our tutorial. Currently, only the Llama and Mistral model families are supported; any other model family will need to be implemented separately. You can check #81 and #70 for related discussion.
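Conceptually, the `enable_bidirectional` flag controls which positions each token may attend to. A decoder-only LLM uses a causal (lower-triangular) attention mask, while llm2vec removes that constraint so every token sees the full sequence. The sketch below only illustrates the two mask shapes; the library itself achieves this by patching the model's attention classes, not by building masks like this.

```python
import torch

def attention_mask(seq_len: int, bidirectional: bool) -> torch.Tensor:
    """Return a boolean (seq_len, seq_len) mask; True means attention is allowed."""
    if bidirectional:
        # Every token attends to every position, including future ones.
        return torch.ones(seq_len, seq_len, dtype=torch.bool)
    # Causal: token i attends only to positions <= i (lower triangle).
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(attention_mask(3, bidirectional=False))
print(attention_mask(3, bidirectional=True))
```

With the causal mask, position 0 cannot attend to position 2; with the bidirectional mask it can, which is what makes the representations useful for embedding tasks.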
Thank you for your help! It's really convenient! (●′ω`●)
No problem! Happy to help!
In your article, you directly use an LLM for the embedding task as your baseline. I am curious how this is done; can you show me your script? There is also the "Enabling bidirectional attention (Bi)" section, which sounds like the model needs to be modified. Does using the mntp files in this repository automatically enable bidirectional attention, or does the model need to be modified manually? Thank you for your help!