EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics

Model Initialization Question #129

Closed · yanlai00 closed this issue 8 months ago

yanlai00 commented 8 months ago

What is the difference between the step 0 model weights you provide and model weights randomly initialized with Hugging Face (by calling the two functions below)?

import transformers

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)

I've been seeing very different behavior between these two initializations. (For example, your initialization consistently trains much faster on my custom task.)

What do I need to do to get an initialization more similar to yours?

haileyschoelkopf commented 8 months ago

Hi! For more information about the initialization we used, please check out the paper, as well as v1.0 of the gpt-neox library, which contains the code used to train these models (and pairs with the config files we provide for the neox library). Depending on the model component, we use the "wang_init" or "small_init" function, defined here: https://github.com/EleutherAI/gpt-neox/blob/71df4d5017f9f4919566a11454fe3a507ffdc632/megatron/model/init_functions.py#L112
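For anyone who wants to approximate this on top of the Hugging Face model class, here is a minimal sketch of those two schemes. The standard deviations follow the linked init_functions.py (small_init: std = sqrt(2 / (5 * hidden_size)); wang_init: std = 2 / (num_layers * sqrt(hidden_size))), but the mapping of which HF GPTNeoX submodules receive which init is my assumption rather than something confirmed against the training code:

import math
import torch
import transformers

def small_init_(tensor, dim):
    # "small init" (Nguyen & Salazar, 2019): std = sqrt(2 / (5 * d))
    torch.nn.init.normal_(tensor, mean=0.0, std=math.sqrt(2 / (5 * dim)))

def wang_init_(tensor, n_layers, dim):
    # Wang init (used in GPT-J): std = 2 / (n_layers * sqrt(d))
    torch.nn.init.normal_(tensor, mean=0.0, std=2 / (n_layers * math.sqrt(dim)))

config = transformers.AutoConfig.from_pretrained("EleutherAI/pythia-1b")
model = transformers.AutoModelForCausalLM.from_config(config)
dim, n_layers = config.hidden_size, config.num_hidden_layers

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        # Assumption: output projections get wang_init and everything else gets
        # small_init, mirroring init_method / output_layer_init_method in the
        # neox config files.
        if name.endswith(("attention.dense", "mlp.dense_4h_to_h")):
            wang_init_(module.weight, n_layers, dim)
        else:
            small_init_(module.weight, dim)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)
    elif isinstance(module, torch.nn.Embedding):
        small_init_(module.weight, dim)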

Hugging Face's transformers is not optimized for training from scratch, so its default random initializations are less likely to be well-tested or tuned for that purpose.
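To give a sense of the gap (my own back-of-the-envelope numbers, assuming transformers' default normal(0, initializer_range) scheme with initializer_range = 0.02): for pythia-1b (hidden_size 2048, 16 layers), small_init gives std = sqrt(2 / (5 * 2048)) ≈ 0.014 and wang_init gives std = 2 / (16 * sqrt(2048)) ≈ 0.0028, so the weight scales on the output projections can differ from the HF default by nearly an order of magnitude.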

Hope this helps!