Modalities / modalities

Modalities, a PyTorch-native framework for distributed and reproducible foundation model training.
MIT License

Implementation of optimizer parameter groups with and without weight decay #139

Closed by flxst 3 months ago

flxst commented 4 months ago

This PR makes it possible to exclude certain groups of parameters (e.g. embeddings, layer norms) from weight decay. The parameter groups are model-dependent and defined as part of the respective model class. Exclusion can be enabled in the training config file.
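
For illustration only, here is a minimal sketch of the general technique, not the code from this PR. It splits named parameters into a decayed and a non-decayed group by matching parameter names against patterns. The function name `build_param_groups`, the patterns in `NO_DECAY_PATTERNS`, and the demo parameter names are all hypothetical; in this PR the groups are defined by each model class.

```python
import re
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Hypothetical name patterns; a real model class would define its own groups.
NO_DECAY_PATTERNS: Tuple[str, ...] = (r"embedding", r"norm")


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    weight_decay: float,
    no_decay_patterns: Sequence[str] = NO_DECAY_PATTERNS,
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into decayed and non-decayed groups."""
    decay: List[Any] = []
    no_decay: List[Any] = []
    for name, param in named_params:
        # A parameter is excluded from weight decay if any pattern matches its name.
        if any(re.search(pattern, name) for pattern in no_decay_patterns):
            no_decay.append(param)
        else:
            decay.append(param)
    groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    # Drop empty groups so the optimizer only receives groups with parameters.
    return [group for group in groups if group["params"]]


# Stand-in strings instead of tensors keep the demo dependency-free.
demo_params = [
    ("embedding.weight", "E"),
    ("attn.q_proj.weight", "W"),
    ("layer_norm.weight", "G"),
]
groups = build_param_groups(demo_params, weight_decay=0.1)
```

With a real model, `model.named_parameters()` could be passed as `named_params`, and the returned list has the per-group dictionary shape that PyTorch optimizers such as `torch.optim.AdamW` accept in place of a flat parameter list.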