McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

How are the base model weights loaded into llm2vec encoder model? #63

Closed xiaoyuqian2 closed 1 month ago

xiaoyuqian2 commented 1 month ago

When running the code snippet below,

import torch
from transformers import AutoConfig, AutoModel

# config as in the repo README: AutoConfig for the same checkpoint
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
)

the original model (in this case, Llama-2-7b) will be downloaded automatically. I'm trying to demystify the automation here.

It looks like the _name_or_path parameter in config.json is not used anywhere in modeling_llama_encoder.py. The Llama weights seem to be loaded when running self.post_init(). Is my understanding correct? I'm not sure exactly how the weights are loaded into LlamaEncoderModel, though. I'm guessing it's based on weight names? I would appreciate it a lot if you could help me dig deeper and understand it better. Thank you!
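
To make my guess concrete, here is a minimal PyTorch sketch of name-based loading (plain load_state_dict on toy modules, not the transformers internals):

import torch
import torch.nn as nn

src = nn.Linear(8, 8)  # stands in for the checkpoint's module
dst = nn.Linear(8, 8)  # stands in for LlamaEncoderModel

# load_state_dict matches entries to parameters purely by name
# ("weight", "bias" here), so identically named modules interchange weights.
dst.load_state_dict(src.state_dict())
assert torch.equal(src.weight, dst.weight)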

vaibhavad commented 1 month ago

Hi @xiaoyuqian2,

Thanks for your interest in our work. I am not fully familiar with all the details of Hugging Face model loading, but I'll try to explain it to the best of my understanding.

LlamaEncoderModel is subclassed from LlamaModel, hence they share the model-loading code. Your understanding is correct that weight loading happens when running self.post_init(). Here, it calls the post_init of LlamaModel, as that is the parent class of LlamaEncoderModel and LlamaEncoderModel itself does not implement this method.
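
As a simplified sketch of that inheritance (not the actual llm2vec source, which also swaps in bidirectional attention):

from transformers.models.llama.modeling_llama import LlamaModel

class LlamaEncoderModel(LlamaModel):
    # No post_init override here, so any self.post_init() call resolves up
    # the method-resolution order to the implementation that LlamaModel
    # inherits from PreTrainedModel.
    def __init__(self, config):
        super().__init__(config)  # same submodules, hence identical weight names

print("post_init" in LlamaEncoderModel.__dict__)  # False: inherited, not overridden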

Since LlamaEncoderModel shares all of its weight names with LlamaModel, name-based loading works as expected. This can be verified by printing a few weight values from LlamaModel and LlamaEncoderModel.
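
A sketch of that check, assuming the checkpoint from this thread can be loaded by both classes (the encoder class comes in via trust_remote_code):

import torch
from transformers import AutoModel, LlamaModel

ckpt = "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp"
encoder = AutoModel.from_pretrained(ckpt, trust_remote_code=True, torch_dtype=torch.bfloat16)
base = LlamaModel.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

# Identical key sets, and an identical tensor for a spot-checked parameter,
# show that the name-based loading filled both classes the same way.
assert set(encoder.state_dict()) == set(base.state_dict())
print(torch.equal(encoder.state_dict()["embed_tokens.weight"],
                  base.state_dict()["embed_tokens.weight"]))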

I tried to deep-dive into the transformers library code to find where exactly the weights are loaded, but so far I haven't been successful. Please let me know if you are able to find the exact code snippet.
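
One way to start tracing it (the internal helper names vary across versions, so this only locates the file to search in):

import inspect
from transformers import PreTrainedModel

# from_pretrained is defined in transformers/modeling_utils.py; printing its
# source file gives the place to search for the name-based state-dict copy.
print(inspect.getsourcefile(PreTrainedModel.from_pretrained))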

vaibhavad commented 1 month ago

Closing as it is stale. Feel free to re-open if you have any more questions.