EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics

Model Initialization Question #129

Closed · yanlai00 closed this issue 8 months ago

yanlai00 commented 8 months ago

What is the difference between the step 0 model weights you provide and model weights randomly initialized with Hugging Face (by calling the two functions below)?

import transformers

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)

I've been seeing very different behavior between these two initializations. (For example, your initialization consistently trains much faster on my custom task.)

What do I need to do to get an initialization more similar to yours?

haileyschoelkopf commented 8 months ago

Hi! For more information about the initialization we used, please check out the paper, as well as v1.0 of the gpt-neox library, which contains the code used to train these models (and pairs with the config files we provide for the neox library). Depending on the model component, we use the "wang_init" or "small_init" function, defined here: https://github.com/EleutherAI/gpt-neox/blob/71df4d5017f9f4919566a11454fe3a507ffdc632/megatron/model/init_functions.py#L112
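For anyone who wants to approximate this on top of the Hugging Face model class, here is a minimal sketch of those two schemes. The standard deviations follow the linked init_functions.py (small_init: std = sqrt(2 / (5 * hidden_size)); wang_init: std = 2 / (num_layers * sqrt(hidden_size))), but the mapping of which HF GPTNeoX submodules receive which init is my assumption rather than something confirmed against the training code:

import math
import torch
import transformers

def small_init_(tensor, dim):
    # "small init" (Nguyen & Salazar, 2019): std = sqrt(2 / (5 * d))
    torch.nn.init.normal_(tensor, mean=0.0, std=math.sqrt(2 / (5 * dim)))

def wang_init_(tensor, n_layers, dim):
    # Wang init (used in GPT-J): std = 2 / (n_layers * sqrt(d))
    torch.nn.init.normal_(tensor, mean=0.0, std=2 / (n_layers * math.sqrt(dim)))

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)
dim, n_layers = config.hidden_size, config.num_hidden_layers

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        # Assumption: output projections get wang_init and everything else gets
        # small_init, mirroring init_method / output_layer_init_method in the
        # neox config files.
        if name.endswith(("attention.dense", "mlp.dense_4h_to_h")):
            wang_init_(module.weight, n_layers, dim)
        else:
            small_init_(module.weight, dim)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)
    elif isinstance(module, torch.nn.Embedding):
        small_init_(module.weight, dim)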

Hugging Face's transformers is not optimized for training from scratch, so its default random initializations are less likely to be well-tested or tuned for that purpose.
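To give a sense of the gap (my own back-of-the-envelope numbers, assuming transformers' default normal(0, initializer_range) scheme with initializer_range = 0.02): for pythia-1b (hidden_size 2048, 16 layers), small_init gives std = sqrt(2 / (5 * 2048)) ≈ 0.014 and wang_init gives std = 2 / (16 * sqrt(2048)) ≈ 0.0028, so the weight scales on the output projections can differ from the HF default by nearly an order of magnitude.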

Hope this helps!