Closed xiaoyuqian2 closed 1 month ago
Hi @xiaoyuqian2,
Thanks for your interest in our work. I am not fully familiar with all the details of Hugging Face model loading, but I'll explain to the best of my understanding.
`LlamaEncoderModel` is subclassed from `LlamaModel`, hence they share the model-loading code. Your understanding is correct that weight loading happens when running `self.post_init()`. Here, it calls the `post_init` of `LlamaModel`, since that is the parent class of `LlamaEncoderModel` and `LlamaEncoderModel` itself does not implement this method.

As `LlamaEncoderModel` shares all of its weight names with `LlamaModel`, loading based on weight names works as expected. This can be verified by printing a few weight values of `LlamaModel` and `LlamaEncoderModel`.
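The name-based matching described above can be illustrated with a minimal sketch. This is a plain-Python stand-in, not the actual `transformers` implementation: the `Toy*` class names and the two parameter names are hypothetical, and real checkpoints hold tensors rather than lists.

```python
# A checkpoint is essentially a mapping from parameter names to tensors.
# Loading copies each value into whichever model exposes a parameter
# with the same name -- the model's class is irrelevant to the match.

# Hypothetical "checkpoint" saved from the parent model.
parent_state_dict = {
    "embed_tokens.weight": [0.1, 0.2],
    "layers.0.self_attn.q_proj.weight": [0.3, 0.4],
}

class ToyLlamaModel:
    """Stand-in for LlamaModel: declares the parameter names."""
    def __init__(self):
        self.params = {name: None for name in parent_state_dict}

    def load_state_dict(self, state_dict):
        # Name-based matching, analogous to torch's load_state_dict:
        # copy each checkpoint entry into the parameter of the same name.
        for name, value in state_dict.items():
            if name in self.params:
                self.params[name] = value

class ToyLlamaEncoderModel(ToyLlamaModel):
    """Stand-in for LlamaEncoderModel: it inherits the exact same
    parameter names, so the parent's checkpoint loads unchanged."""
    pass

encoder = ToyLlamaEncoderModel()
encoder.load_state_dict(parent_state_dict)
print(encoder.params["embed_tokens.weight"])  # [0.1, 0.2]
```

Because the subclass adds no parameters of its own, every checkpoint entry finds a matching name, which is why the weight values printed from both classes come out identical.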
I tried to deep-dive into the `transformers` library code to find where exactly the weights are loaded, but so far I haven't been successful. Please let me know if you are able to find the exact code snippet.
Closing as it is stale. Feel free to re-open if you have any more questions.
When running the code snippet below, the original model (in this case, Llama-2-7b) is downloaded automatically. I'm trying to demystify the automation here. It looks like the `_name_or_path` parameter in `config.json` is not used anywhere in `modeling_llama_encoder.py`. The Llama weights seem to be loaded when running `self.post_init()`. Is my understanding correct? I'm not sure how exactly the weights are loaded into `LlamaEncoderModel`, though. I'm guessing it's based on weight names? Would appreciate it a lot if you could help me dive deep and understand it better. Thank you!