SeanNaren / min-LLM

Minimal code to train a Large Language Model (LLM).
MIT License
164 stars 8 forks

Fix model initialisation #15

Open SeanNaren opened 2 years ago

SeanNaren commented 2 years ago

We're currently relying on the minGPT/microGPT initialization; however, this might need to be modified, especially considering we're using ZeRO Stage 3.

Some investigation will be required to understand what the initialization should look like.
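
As a starting point for that investigation, here is a rough sketch of the per-parameter standard deviations used by a minGPT-style scheme, as I understand it from current minGPT: weights are drawn from a normal distribution with std 0.02, and residual output projections are scaled down by `sqrt(2 * n_layer)`. The helper name `mingpt_style_std` and the `c_proj.weight` suffix are assumptions for illustration; the parameter names in this repo may differ, and nothing here is ZeRO-aware. Under ZeRO Stage 3 the parameters are partitioned across ranks, so whatever rule we settle on would probably need to be applied before partitioning or on gathered parameters.

```python
import math


def mingpt_style_std(param_name: str, n_layer: int, base_std: float = 0.02) -> float:
    """Return the normal-init std for a weight under a minGPT-style scheme.

    Assumption: residual output projections are identified by the
    'c_proj.weight' suffix, as in minGPT. Adjust for this repo's names.
    """
    if param_name.endswith("c_proj.weight"):
        # Scale down residual projections so the variance of the residual
        # stream does not grow with depth (two residual adds per block).
        return base_std / math.sqrt(2 * n_layer)
    # All other Linear/Embedding weights use the base std.
    return base_std


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    for name in ("h.0.attn.c_attn.weight", "h.0.attn.c_proj.weight"):
        print(name, mingpt_style_std(name, n_layer=12))
```

In that scheme, biases are zero-initialized and LayerNorm weights are set to one, so only the weight std varies by parameter.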