This PR introduces the components for weight initialisation and is based on PR #161.
In PR #161, the different initialisation methods plain, scaled and scaled_embed (see https://arxiv.org/abs/2312.16903) were implemented and added to the abstract NNModel class.
Due to some design concerns (e.g., some GPT2 internals were called from the parent), we decided to introduce a weight initialisation component that modifies the model weights in place.
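To illustrate the idea of a standalone component that modifies weights in place, here is a minimal sketch. It is not the code from this PR: the names `WeightInitialiser`, `PlainInitialiser`, `initialise_in_place` and the dict-based toy model are hypothetical, and the standard library's `random` stands in for a tensor library so the example runs without dependencies.

```python
import random
from typing import Dict, List, Protocol

# Toy stand-in for a model: parameter name -> flat list of weights.
# (Assumption for illustration; a real model would hold tensors.)
ToyModel = Dict[str, List[float]]


class WeightInitialiser(Protocol):
    """Hypothetical interface: the component receives a model and mutates it."""

    def initialise_in_place(self, model: ToyModel) -> None: ...


class PlainInitialiser:
    """Draws every weight from a normal distribution N(mean, std)."""

    def __init__(self, mean: float = 0.0, std: float = 0.02, seed: int = 0) -> None:
        self.mean = mean
        self.std = std
        self.rng = random.Random(seed)

    def initialise_in_place(self, model: ToyModel) -> None:
        # Overwrite the existing lists element by element, so no new
        # parameter containers are created and the model class itself
        # needs no initialisation logic.
        for values in model.values():
            for i in range(len(values)):
                values[i] = self.rng.gauss(self.mean, self.std)


model: ToyModel = {"wte.weight": [1.0] * 8, "h.0.attn.c_proj.weight": [1.0] * 8}
weights_before = model["wte.weight"]
PlainInitialiser().initialise_in_place(model)
```

Because the initialiser only depends on the parameters it is handed, the parent model class no longer has to know about the internals of any concrete architecture.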
General changes
Components and factories for plain, scaled and scaled_embed initialisation.
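How the three modes could differ is sketched below. This is an assumption-laden illustration, not the PR's implementation: `std_for` and `make_initialiser` are hypothetical names, the parameter-name patterns (`c_proj.weight`, `wte.`) follow GPT2-style naming and are assumed, the `1 / sqrt(2 * num_layers)` scaling of residual projections is the commonly used GPT2-style rule, and `embed_std` is left as a required argument because the value used in the PR is not stated here.

```python
import math
import random
from typing import Callable, Dict, List

ToyModel = Dict[str, List[float]]  # toy stand-in: name -> flat weight list

MODES = ("plain", "scaled", "scaled_embed")


def std_for(name: str, mode: str, std: float, num_layers: int, embed_std: float) -> float:
    """Return the standard deviation to use for one named parameter."""
    if mode not in MODES:
        raise ValueError(f"unknown initialisation mode: {mode}")
    # scaled / scaled_embed: shrink the std of residual output projections.
    if mode != "plain" and name.endswith("c_proj.weight"):
        return std / math.sqrt(2 * num_layers)
    # scaled_embed: additionally use a separate std for the embedding.
    if mode == "scaled_embed" and name.startswith("wte."):
        return embed_std
    return std


def make_initialiser(
    mode: str, std: float, num_layers: int, embed_std: float, seed: int = 0
) -> Callable[[ToyModel], None]:
    """Factory returning a function that initialises a toy model in place."""
    if mode not in MODES:
        raise ValueError(f"unknown initialisation mode: {mode}")
    rng = random.Random(seed)

    def initialise(model: ToyModel) -> None:
        for name, values in model.items():
            sigma = std_for(name, mode, std, num_layers, embed_std)
            for i in range(len(values)):
                values[i] = rng.gauss(0.0, sigma)

    return initialise


toy: ToyModel = {"wte.weight": [1.0] * 4, "h.0.mlp.c_proj.weight": [1.0] * 4}
make_initialiser("scaled_embed", std=0.02, num_layers=2, embed_std=0.5)(toy)
```

Keeping the per-parameter rule in a small pure function makes each mode easy to test separately from the in-place update.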
Breaking Changes
The raw model (i.e., the model with random weights) must be initialised with a weight initialiser, as shown here.
Checklist before submitting final PR
[x] My PR is minimal and addresses one issue / enhancement in isolation
[x] I have merged the target branch into this feature branch
[x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
[x] I have run a sample config for model training
[x] I have fixed all failing tests (python tests/tests.py)