Modalities / modalities

Modalities, a PyTorch-native framework for distributed and reproducible foundation model training.
MIT License

Draft: Feat/initialization component #168

Closed le1nux closed 3 months ago

le1nux commented 3 months ago

What does this PR do?

This PR introduces components for weight initialization and builds on PR #161. PR #161 implemented the initialization methods plain, scaled and scaled_embed (see https://arxiv.org/abs/2312.16903) and added them to the abstract NNModel class. Because of design concerns (e.g., some GPT2 internals were called from the parent class), we decided to introduce a dedicated weight initialization component that modifies the model weights in place.
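To illustrate the idea of an initializer that lives outside the model class and mutates its weights in place, here is a minimal sketch. It is not the component from this PR: the class name `PlainWeightInitializer`, the method `initialize_in_place`, the regex-based parameter selection, and the zeroing of biases are all assumptions made for illustration. Only the "plain" case is sketched, taken here to mean drawing every matching weight from a normal distribution with one fixed standard deviation.

```python
import re

import torch
import torch.nn as nn


class PlainWeightInitializer:
    """Hypothetical stand-alone initializer (not the Modalities API).

    Re-initializes the parameters of an existing model in place, so the
    model class does not need to know anything about the init scheme.
    """

    def __init__(self, std: float, weight_pattern: str = r"(.*\.)?weight"):
        self.std = std
        # Assumption: parameters are selected by matching their names.
        self.weight_pattern = re.compile(weight_pattern)

    @torch.no_grad()
    def initialize_in_place(self, model: nn.Module) -> None:
        for name, param in model.named_parameters():
            if self.weight_pattern.fullmatch(name):
                # Overwrites the existing tensor; no new Parameter is created.
                nn.init.normal_(param, mean=0.0, std=self.std)
            elif name.endswith("bias"):
                # Assumption: biases are set to zero.
                nn.init.zeros_(param)


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(256, 256), nn.Linear(256, 8))
    PlainWeightInitializer(std=0.02).initialize_in_place(model)
    print(model[0].weight.std())
```

Because the initializer only relies on `named_parameters()`, it can be applied to any `nn.Module` after construction. The scaled and scaled_embed variants would presumably apply a different standard deviation to selected parameter groups; their exact rules are not stated in this description, so they are left out of the sketch.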

General changes

Breaking Changes

Checklist before submitting final PR