center-for-humans-and-machines / transformer-heads

Toolkit for attaching, training, saving and loading of new heads for transformer models
https://transformer-heads.readthedocs.io/en/latest/
MIT License
236 stars · 21 forks

Question regarding the MLP Regression head architecture diagram provided #4

Closed · ArchchanaKugathasan closed this issue 1 month ago

ArchchanaKugathasan commented 1 month ago
[Image: MLP regression head architecture diagram from the README, showing two linear layers (4096 x 128 and 128 x 5)]
  1. The above image of the MLP regression head shows two linear layers inside the head. Is this architecture predefined, or can we modify the number of linear layers? If it can be changed, how do we specify the number of layers in the head_configs?

  2. In the first linear layer's 4096 x 128, does 4096 refer to a hidden size that is predefined, or is it the hidden size we define in the head_configs?

  3. In the second linear layer's 128 x 5, does 5 mean the number of outputs from the regression head? If we expect just one output, should it be 128 x 1?

yannikkellerde commented 1 month ago
  1. This is completely custom. The image on the README only shows one example of how you could design your heads; everything is defined via the head config. Specify the depth of your head via num_layers (1 for a plain linear head) and its width via hidden_size. See the docs here: https://github.com/center-for-humans-and-machines/transformer-heads/blob/2515241b9420360ae5843c96b05d947bb8a3d1fb/transformer_heads/config.py#L19-L40. There is an example config sketch at the end of this reply.
  2. The 4096 depends on the base transformer architecture: in this case, the transformer used has a hidden size of 4096 per token. You can find the hidden size of your preferred LLM in its config (e.g. here for Llama 3); see the snippet right after this list. The hidden size of the head is 128 and is specified in the HeadConfig.
  3. Yes. Specify the number of outputs using num_outputs.
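
As an aside, here is a minimal sketch (not part of transformer-heads itself) for looking up the hidden size of a base model via the Hugging Face transformers library; the model name is just an example:

```python
from transformers import AutoConfig

# Read the base model's configuration without downloading the weights.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(config.hidden_size)  # 4096 for this model -> the input size of the first head layer
```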

Also check this notebook to get a better understanding of HeadConfigs.
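
For illustration, a head config matching the diagram (4096 -> 128 -> 5) could look roughly like the sketch below. Apart from num_layers, hidden_size and num_outputs, the field names and values here are assumptions based on the README/notebook examples, so double-check them against config.py and the notebook above:

```python
from transformer_heads.config import HeadConfig

# Sketch only: a regression head reproducing the diagram, 4096 -> 128 -> 5.
regression_head = HeadConfig(
    name="my_regression_head",   # arbitrary example name
    layer_hook=-1,               # attach the head to the last hidden layer of the base model
    in_size=4096,                # hidden size of the base transformer (4096 in the diagram)
    num_layers=2,                # two linear layers, as in the diagram (1 = a single linear layer)
    hidden_size=128,             # width of the head's intermediate layer
    num_outputs=5,               # five regression targets; set to 1 for a single output
    is_regression=True,          # regression rather than classification
    loss_fct="mse",              # mean squared error loss for regression
    output_activation="linear",
)
```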

ArchchanaKugathasan commented 2 weeks ago

Thank you for the detailed explanation :)