McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

For Llama models, bidirectional connections are not enabled when the batch size is 1 or the batch contains no padding token #74

Closed · vaibhavad closed this 4 months ago

vaibhavad commented 4 months ago

Discovered while inspecting #68: this is caused by a check introduced in transformers > 4.40 that passes None as the attention mask when the batch contains no padding, which later falls back to a causal attention mask.
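A minimal sketch of the failure mode and one possible workaround, assuming transformers >= 4.40 with the SDPA attention implementation. The model name and the explicit-4D-mask workaround below are illustrative assumptions, not the fix adopted in llm2vec; `LlamaModel._update_causal_mask` and `AttentionMaskConverter._ignore_causal_mask_sdpa` are the transformers internals involved.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # hypothetical; any Llama checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, attn_implementation="sdpa")

# Batch size 1, no padding: attention_mask is all ones. Inside
# LlamaModel._update_causal_mask, transformers > 4.40 replaces such a mask
# with None so SDPA can use its fused causal path, so a model patched to be
# bidirectional silently falls back to causal attention.
inputs = tokenizer("a single unpadded sentence", return_tensors="pt")

# Workaround sketch (an assumption, not the repo's fix): a 4D additive mask
# of zeros (attend everywhere) is not eligible for that shortcut and is
# forwarded to the attention layers unchanged.
bsz, seq_len = inputs["input_ids"].shape
bidirectional_mask = torch.zeros(bsz, 1, seq_len, seq_len, dtype=model.dtype)
outputs = model(input_ids=inputs["input_ids"], attention_mask=bidirectional_mask)
```

The 4D mask works because `_ignore_causal_mask_sdpa` only drops 2D all-ones masks; overriding `_update_causal_mask` in the bidirectional model subclass would be another way to keep the mask from being dropped.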