AnswerDotAI / bert24

Add OLMo Model Initialization Options #61

Closed · warner-benjamin closed this 2 weeks ago

warner-benjamin commented 3 weeks ago

This PR adds multiple standard model weight initialization options, including defaults such as normal and Kaiming fan-in, as well as Megatron and Mitchell. Mitchell init is used by OLMo, and Megatron init is used by models like Llama 2 and (I believe) Pythia.
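For readers unfamiliar with Megatron init, here is a minimal sketch of the idea, assuming a PyTorch model whose residual output projections end in names like `attn.out_proj` and `mlp.down_proj` (those names are illustrative, not this repo's actual ones):

```python
import math
import torch.nn as nn

def megatron_style_init(model: nn.Module, num_layers: int, std: float = 0.02):
    # Megatron-style init: draw all weights from N(0, std), but scale the
    # residual output projections down by sqrt(2 * num_layers) so activations
    # don't grow with depth.
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Embedding)):
            if name.endswith(("attn.out_proj", "mlp.down_proj")):  # assumed names
                nn.init.normal_(module.weight, mean=0.0, std=std / math.sqrt(2 * num_layers))
            else:
                nn.init.normal_(module.weight, mean=0.0, std=std)
            if isinstance(module, nn.Linear) and module.bias is not None:
                nn.init.zeros_(module.bias)
```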

I also added support for RWKV small embedding initialization, which looks like a useful way to speed up early training of the embedding weights. (It is not compatible with Megatron init.)
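The RWKV trick itself is small; a sketch, assuming the published RWKV recipe (the `1e-4` scale comes from RWKV's notes, which also place a LayerNorm directly after the embedding):

```python
import torch.nn as nn

def rwkv_small_embedding_init(embedding: nn.Embedding, scale: float = 1e-4):
    # Start the embedding table near zero so early gradient updates move it
    # quickly away from its (nearly uninformative) initial state.
    nn.init.uniform_(embedding.weight, a=-scale, b=scale)
```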

Also, model.py apparently wasn't ruff-formatted before, but it is now.

warner-benjamin commented 2 weeks ago

I set the default FlexBert initialization method to the PyTorch default initialization so that it matches our current training tests.
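For reference, "PyTorch default" here just means whatever each module's own `reset_parameters()` does at construction time; the dimensions below are illustrative:

```python
import torch.nn as nn

# PyTorch's built-in defaults, applied automatically at construction:
linear = nn.Linear(768, 768)          # weight: kaiming_uniform_ with a=sqrt(5); bias: uniform
embedding = nn.Embedding(30522, 768)  # weight: normal_ with mean=0.0, std=1.0
```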