In the above image of the MLP regression head, two linear layers are shown inside the regression head. Is this predefined, or can we modify the number of linear layers? If the number of linear layers can be changed, how do we specify it in the head_configs?
Regarding the value 4096 x 128 in the first linear layer: does 4096 refer to a predefined hidden size, or is it the hidden size we define in head_configs?
Regarding the value 128 x 5 in the second linear layer: does 5 mean the number of outputs from the regression head? If we expect just one output, does that need to be 128 x 1?
The 4096 depends on the base transformer architecture. In this case, the transformer used has a hidden size of 4096 per token (you can find the hidden size of your preferred LLM in its config, e.g. here for Llama 3). The hidden size of the head is 128 and is specified in the HeadConfig.
Yes. Specify the number of outputs using num_outputs.
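As a sketch of what those two layers amount to, here is the head rebuilt in plain PyTorch. The variable names below (`base_hidden_size`, `head_hidden_size`, `num_outputs`) are illustrative and not necessarily the exact field names in the HeadConfig API; only the shapes from the diagram (4096 x 128 and 128 x 5) are taken from the discussion above.

```python
import torch
from torch import nn

# Illustrative values; names are assumptions, not the library's exact config keys.
base_hidden_size = 4096  # fixed by the base transformer (e.g. Llama 3's hidden size)
head_hidden_size = 128   # the head's hidden size, chosen in the HeadConfig
num_outputs = 5          # regression outputs; set to 1 for a single target

# The two linear layers from the diagram: 4096 x 128 followed by 128 x 5.
head = nn.Sequential(
    nn.Linear(base_hidden_size, head_hidden_size),
    nn.Linear(head_hidden_size, num_outputs),
)

# Feed in per-token hidden states of shape (batch, seq_len, hidden_size).
hidden_states = torch.randn(2, 10, base_hidden_size)
out = head(hidden_states)
print(out.shape)  # torch.Size([2, 10, 5])
```

With `num_outputs = 1`, the second layer becomes 128 x 1, matching the single-output case asked about above.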
Also check this notebook to get a better understanding of HeadConfigs.