RMS norm implementation #67

Closed le1nux closed 5 months ago

le1nux commented 6 months ago

Originally, each attention block instantiated its own layer norm internally: https://github.com/Modalities/modalities/blob/dd0db07bbe631e9dc30f35912076d26603f4a6b7/src/modalities/models/gpt2/gpt2_model.py#L193

For every new layer norm type, we would have had to add an if-clause to select which layer norm to instantiate. As a workaround, we now pass the layer norm object into the GPT2 model from outside and copy it in every attention block. Note that we override the copy function in the layer norm implementations, as sketched below.
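
For illustration, here is a minimal sketch of what such a copyable norm and the per-block copying could look like. `RMSNorm`, `AttentionBlock`, and `GPT2Model` below are simplified stand-ins for this issue, not the actual modalities classes:

```python
import copy

import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMS norm. Overriding __copy__/__deepcopy__ lets the model
    create a fresh, independently parameterized instance per block."""

    def __init__(self, ndim: int, epsilon: float = 1e-5):
        super().__init__()
        self.ndim = ndim
        self.epsilon = epsilon
        self.weight = nn.Parameter(torch.ones(ndim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x / RMS(x), scaled by a learned per-feature weight.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.epsilon)
        return x / rms * self.weight

    def __copy__(self) -> "RMSNorm":
        # Re-instantiate instead of duplicating tensor state, so every copy
        # starts from a fresh initialization and shares no parameters.
        return RMSNorm(ndim=self.ndim, epsilon=self.epsilon)

    def __deepcopy__(self, memo) -> "RMSNorm":
        return self.__copy__()


class AttentionBlock(nn.Module):
    def __init__(self, norm: nn.Module):
        super().__init__()
        # Copy the externally provided norm so blocks do not share parameters.
        self.norm = copy.deepcopy(norm)


class GPT2Model(nn.Module):
    def __init__(self, norm: nn.Module, n_layers: int):
        super().__init__()
        self.blocks = nn.ModuleList(AttentionBlock(norm) for _ in range(n_layers))


# The norm is configured once and copied into each of the 12 blocks.
model = GPT2Model(norm=RMSNorm(ndim=768), n_layers=12)
```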

In the future, it would make sense to support instantiating lists of components. For instance, a GPTModel would have a dependency on a list of attention blocks. We would specify a single attention block and instantiate it n times (see num_instances in the YAML below). Each attention block would then have its own dependency on a layer norm, which would no longer need to be copied internally.

Here is an example configuration:

```yaml
model:
  component_key: model
  variant_key: gpt2
  config:
    [...]
    attention_blocks:
      component_key: attention_block
      variant_key: gpt2_attention_block
      num_instances: 12
      config:
        n_embd: 768
        dropout: 0.0
        scaling_factor: 3
        [...]
```
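
A rough sketch of how a component registry could expand `num_instances` into a list. `REGISTRY` and `build_component` are hypothetical names used for illustration here, not the existing modalities resolver, and the registered builder is a dummy stand-in:

```python
from typing import Any, Callable

# Hypothetical registry mapping (component_key, variant_key) to a builder
# callable; the real modalities component resolver may look different.
REGISTRY: dict[tuple[str, str], Callable[..., Any]] = {
    # Dummy builder that just records its config, standing in for a real
    # attention block constructor.
    ("attention_block", "gpt2_attention_block"): lambda **cfg: dict(cfg),
}


def build_component(spec: dict[str, Any]) -> Any:
    """Instantiate a component spec. If num_instances is given, return a
    list of independently constructed instances instead of a single one."""
    builder = REGISTRY[(spec["component_key"], spec["variant_key"])]
    config = spec.get("config", {})
    if "num_instances" in spec:
        # Build each instance separately, so nested dependencies (e.g. the
        # layer norm) are constructed fresh per block -- no copying needed.
        return [builder(**config) for _ in range(spec["num_instances"])]
    return builder(**config)


blocks = build_component(
    {
        "component_key": "attention_block",
        "variant_key": "gpt2_attention_block",
        "num_instances": 12,
        "config": {"n_embd": 768, "dropout": 0.0, "scaling_factor": 3},
    }
)
assert len(blocks) == 12
```

Because every instance is built from the spec rather than copied from a prototype, the overridden copy functions in the layer norm implementations would become unnecessary.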