McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

Prevent splitting of ModifiedMistralDecoderLayers #92

Closed · hatzel closed this 3 weeks ago

hatzel commented 4 weeks ago

Previously, the layers could be split across devices, e.g. when using device_map='auto'. This would result in errors like this one:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
```
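
For context, the failure can be triggered by a multi-GPU load along these lines (a minimal sketch; the checkpoint name is illustrative rather than taken from this PR):

```python
import torch
from transformers import AutoModel

# On a multi-GPU machine, device_map="auto" lets Accelerate shard the model
# across devices. Without _no_split_modules, a modified decoder layer can end
# up with its sublayers on different GPUs, producing the mismatch error above.
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",  # illustrative checkpoint
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```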

The remote code I got from Hugging Face uses a different name, MistralEncoderModel rather than MistralBiModel, but I assume this is the correct file to change. I tested this locally by editing the remote files in my ~/.cache/ directory.
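
The fix itself is a one-line class attribute. A minimal sketch of the pattern, assuming the class is named MistralBiModel as in the llm2vec source (the remote-code copy may call it MistralEncoderModel, as noted above):

```python
from transformers.models.mistral.modeling_mistral import MistralPreTrainedModel

class MistralBiModel(MistralPreTrainedModel):
    # Accelerate's device-map planner treats modules listed here as atomic,
    # so a ModifiedMistralDecoderLayer is never split across GPUs.
    _no_split_modules = ["ModifiedMistralDecoderLayer"]
```

Transformers forwards this attribute to Accelerate as no_split_module_classes when computing the "auto" device map, which is why a single class attribute is enough to fix the placement.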

hatzel commented 4 weeks ago

I suppose this may also need to be fixed for the Llama model, but I have not checked.

vaibhavad commented 4 weeks ago

Thanks @hatzel for spotting this! Can you also add changes for Llama, so that everything is in the same PR?

```python
_no_split_modules = ["ModifiedLlamaDecoderLayer"]
```

This will need to be added here
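
For reference, the analogous Llama-side change would look like this sketch (LlamaBiModel is the assumed name of llm2vec's bidirectional Llama class):

```python
from transformers.models.llama.modeling_llama import LlamaPreTrainedModel

class LlamaBiModel(LlamaPreTrainedModel):
    # Same pattern as the Mistral fix: keep each modified decoder layer
    # whole on a single device when device_map="auto" plans placement.
    _no_split_modules = ["ModifiedLlamaDecoderLayer"]
```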

hatzel commented 3 weeks ago

Done!

vaibhavad commented 3 weeks ago

Thanks a lot @hatzel!