The layer norm was originally instantiated individually for every attention block internally. https://github.com/Modalities/modalities/blob/dd0db07bbe631e9dc30f35912076d26603f4a6b7/src/modalities/models/gpt2/gpt2_model.py#L193
For every new layer norm type, we would have had to add an if-clause to check which layer norm to instantiate. As a workaround, we now pass the layer norm object into the GPT2 model from outside and copy it in every attention block. Note that we override the copy function in the layer norm implementations.
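The workaround can be sketched roughly as follows. This is a minimal, torch-free illustration, not the actual Modalities code: the class names, constructor parameters, and the `__deepcopy__` override are stand-ins for the real implementations.

```python
import copy


class CustomLayerNorm:
    """Stand-in for a layer norm implementation (hypothetical).

    Overrides the copy function so that every attention block
    receives an independent, freshly configured instance.
    """

    def __init__(self, ndim: int, bias: bool = True):
        self.ndim = ndim
        self.bias = bias

    def __deepcopy__(self, memo):
        # Return a fresh instance with the same configuration
        # instead of sharing state between blocks.
        return CustomLayerNorm(self.ndim, self.bias)


class AttentionBlock:
    def __init__(self, norm: CustomLayerNorm):
        self.norm = norm


class GPT2Model:
    def __init__(self, norm: CustomLayerNorm, n_layers: int):
        # The layer norm object is passed in from outside and
        # copied into every attention block.
        self.blocks = [
            AttentionBlock(copy.deepcopy(norm)) for _ in range(n_layers)
        ]


norm = CustomLayerNorm(ndim=768)
model = GPT2Model(norm, n_layers=2)

# Each block holds its own norm instance with identical configuration.
assert model.blocks[0].norm is not model.blocks[1].norm
assert model.blocks[0].norm.ndim == model.blocks[1].norm.ndim == 768
```

The key point is that the model no longer needs type-specific if-clauses: any norm object that supports copying can be injected from the outside.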
For the future, it would make sense to support instantiating lists of components. For instance, a GPTModel would have a dependency on a list of attention blocks. We would specify a single attention block and instantiate it n times (see `num_instances` in the YAML below). Each attention block would then have its own layer norm dependency and would no longer have to be copied internally. This is an example: