McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

Possible to train Llama 3.1? #133

Open mosh98 opened 3 months ago

mosh98 commented 3 months ago

Hi,

I tried training Llama 3.1 with run_mntp.py but get an obscure error:

AttributeError: 'LlamaBiModel' object has no attribute 'rotary_emb'

What is that about?

bzantium commented 3 months ago

you can check this: https://github.com/McGill-NLP/llm2vec/pull/127/commits/03382c358494a4e2f07222455b366fb75d625ab7
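
In short (paraphrasing the commit from memory, so double-check against the actual diff): `LlamaBiModel` builds its submodules itself rather than going through `LlamaModel.__init__`, and newer transformers versions create the rotary position embedding once on the model instead of inside each attention layer, so the bidirectional subclass has to add that attribute in its own `__init__`, roughly:

```python
from transformers.models.llama.modeling_llama import LlamaModel, LlamaRotaryEmbedding


class LlamaBiModel(LlamaModel):
    def __init__(self, config):
        # ... existing LLM2Vec init (token embeddings, the modified
        # bidirectional-attention decoder layers, final norm) ...

        # Newer transformers releases expect this attribute on the model;
        # it is exactly what the AttributeError above complains about.
        # LlamaRotaryEmbedding accepts a config object in recent releases.
        self.rotary_emb = LlamaRotaryEmbedding(config=config)
```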

mosh98 commented 3 months ago

Hmm, I'm still not sure what to do...

stefanhgm commented 3 months ago

Hi everyone,

@bzantium thanks for pointing us to the commit. I added the respective lines and used a more recent version of transformers to make it work. MNTP training for Llama 3.1 now seems to work for me. However, I have not yet managed to run the MTEB evaluation locally; see #123.
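
For anyone trying to reproduce this: I essentially followed the repo's Llama 3 MNTP recipe, i.e. something like `python experiments/run_mntp.py train_configs/mntp/MetaLlama3.json`, with `model_name_or_path` in the config pointed at the Llama 3.1 checkpoint (paths and field names taken from my checkout, so double-check against yours). The newer transformers version matters because Llama 3.1's `rope_scaling` config format is only understood by recent releases (around 4.43 and later, if I remember correctly).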

Did you make any progress in training Llama 3.1 for LLM2Vec?

mosh98 commented 3 months ago

@bzantium Thanks, I was able to get the embeddings after adding in the lines. I haven't been able to train it through MNTP yet, but I'll keep trying.

andupotorac commented 3 months ago

@stefanhgm Once Llama 3.1 (I presume the 8B-parameter model) is trained, can you use it for generating images the way ELLA uses T5, with better prompt adherence?

stefanhgm commented 3 months ago

@andupotorac I am not familiar with the ELLA project, but you could use the model to create embeddings just as with the other LLM2Vec models.
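
For instance, it would look the same as the standard LLM2Vec usage; the model id and adapter path below are placeholders, not released checkpoints, so swap in whatever you trained:

```python
import torch
from llm2vec import LLM2Vec

# Placeholder paths: the Llama 3.1 base model plus the MNTP adapter you
# trained yourself; the released McGill-NLP checkpoints follow this pattern.
l2v = LLM2Vec.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    peft_model_name_or_path="path/to/your-llama3.1-mntp-adapter",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode a few sentences into fixed-size embedding vectors.
embeddings = l2v.encode(["LLM2Vec turns a decoder into a text encoder."])
print(embeddings.shape)
```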

However, the MTEB evaluation currently hangs; see #135.

andupotorac commented 3 months ago

Thanks, I will keep an eye on it as well.