Modalities / modalities

A framework for training multimodal foundation models.
MIT License

SwiGLU naming of projection matrices #170

Closed le1nux closed 2 days ago

le1nux commented 1 week ago

The current SwiGLU implementation names its projection matrices differently from the original paper (https://arxiv.org/pdf/2002.05202). We should stick to the paper's W, V, W_2 names. The c_proj projection in SwiGLU shares its name with a projection in GeLU, which has already led to side effects in weight initialisation (see the comments in PR #168).

https://github.com/Modalities/modalities/blob/f810fcce978e2f4fc577edf337835b6f4afa8aa9/src/modalities/models/model.py#L30C6-L45C10

class SwiGLU(nn.Module):
    def __init__(self, n_embd: int, bias: bool):
        super().__init__()

        hidden_dim = SwiGLU._get_hidden_dim(n_embd)

        self.c_fc = nn.Linear(
            in_features=n_embd,
            out_features=hidden_dim,
            bias=bias,
        )
        self.silu = nn.SiLU()
        self.c_proj = nn.Linear(
            in_features=n_embd,
            out_features=hidden_dim,
            bias=bias,
        )
        self.out_proj = nn.Linear(
            in_features=hidden_dim,
            out_features=n_embd,
            bias=bias,
        )
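
For illustration, the paper defines FFN_SwiGLU(x, W, V, W_2) = (Swish_1(xW) ⊗ xV) W_2, where Swish_1 is the same function as SiLU. A minimal sketch of a module that uses those names directly might look like the following. It is not the code merged in the repository: `hidden_dim` is passed in explicitly here as an assumption, because the `_get_hidden_dim` helper is not shown in the excerpt above, and the comments on each layer follow the paper's formula rather than the existing attribute names.

```python
import torch
from torch import nn


class SwiGLU(nn.Module):
    """Sketch of SwiGLU with paper-style names: (SiLU(x W) * (x V)) W_2."""

    def __init__(self, n_embd: int, hidden_dim: int, bias: bool = False):
        super().__init__()
        # W: projection whose output goes through the SiLU (Swish_1) gate.
        self.W = nn.Linear(in_features=n_embd, out_features=hidden_dim, bias=bias)
        self.silu = nn.SiLU()
        # V: plain linear projection, multiplied element-wise with the gate.
        self.V = nn.Linear(in_features=n_embd, out_features=hidden_dim, bias=bias)
        # W_2: projection back to the embedding dimension.
        self.W_2 = nn.Linear(in_features=hidden_dim, out_features=n_embd, bias=bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.W_2(self.silu(self.W(x)) * self.V(x))
```

With these names, the module's parameters are `W.weight`, `V.weight` and `W_2.weight` (plus the matching biases when `bias=True`), so none of them can be confused with a `c_proj` layer elsewhere in the model.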

le1nux commented 2 days ago

We also need to add the SwiGLU layers to the parameter name filters for weight initialization: https://github.com/Modalities/modalities/blob/5a2727fe3004c1e0739d23a733254f67c8ffdbd4/src/modalities/nn/model_initialization/parameter_name_filters.py#L36

https://github.com/Modalities/modalities/blob/5a2727fe3004c1e0739d23a733254f67c8ffdbd4/src/modalities/nn/model_initialization/parameter_name_filters.py#L59
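
As a rough sketch of what such a filter could look like, the snippet below selects SwiGLU weights by regular expression over parameter names. The pattern list, the helper `matches_any`, and the example parameter names (including the `mlp` prefix) are all hypothetical and only illustrate the idea; the actual filters live in the linked `parameter_name_filters.py`.

```python
import re

# Hypothetical pattern for SwiGLU projection weights named W, V, W_2.
# Not the repository's actual filter definition.
SWIGLU_WEIGHT_PATTERNS = [
    r"mlp\.(W|V|W_2)\.weight$",
]


def matches_any(name: str, patterns: list[str]) -> bool:
    """Return True if the parameter name matches at least one pattern."""
    return any(re.search(pattern, name) for pattern in patterns)


# Example parameter names, made up for illustration.
param_names = [
    "transformer.h.0.mlp.W.weight",
    "transformer.h.0.mlp.V.weight",
    "transformer.h.0.mlp.W_2.weight",
    "transformer.h.0.mlp.W_2.bias",
    "transformer.h.0.attn.c_proj.weight",
]

selected = [name for name in param_names if matches_any(name, SWIGLU_WEIGHT_PATTERNS)]
print(selected)
```

Because the SwiGLU layers no longer share the `c_proj` name, a pattern like this picks up only the three SwiGLU weight matrices and leaves the attention `c_proj` weight and the biases to other filters.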

le1nux commented 2 days ago

Fixed in commit https://github.com/Modalities/modalities/pull/141/commits/2de5ab4e0ab2b3d107be52ac08a94b1589d71746 of PR #141.