

VQ VAE structure decoder embedding dimension and heads #48

Closed nbrosse closed 3 months ago

nbrosse commented 4 months ago

In the paper, we can read:

In the first stage, a smaller decoder trunk consisting of 8 Transformer blocks with width 1024, rotary positional embeddings, and MLPs is trained to only predict backbone coordinates. In the second stage, the decoder weights are re-initialized and the network size is expanded to 30 layers, each with an embedding dimension of 1280 (∼600M parameters) to predict all atom coordinates.

The embedding dimension (width) must be divisible by the number of attention heads. In the config linked below, the large decoder with an embedding dimension of 1280 uses 20 heads.

https://github.com/evolutionaryscale/esm/blob/95e3c5be8acda407414810ff3aa7d27dbb6e30d3/esm/pretrained.py#L55

What is the number of heads for the small decoder (width 1024)? Is it 16?
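
For illustration, here is a minimal sketch of the divisibility constraint being asked about; `per_head_dim` is a hypothetical helper, not part of the esm package:

```python
def per_head_dim(d_model: int, n_heads: int) -> int:
    """Per-head dimension for multi-head attention; d_model must split evenly across heads."""
    if d_model % n_heads != 0:
        raise ValueError(f"embedding dim {d_model} is not divisible by {n_heads} heads")
    return d_model // n_heads

# Large all-atom decoder from the linked pretrained.py config: width 1280, 20 heads.
print(per_head_dim(1280, 20))  # 64
```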

thayes427 commented 3 months ago

Thank you for your question! You are correct that the small decoder with width 1024 has 16 heads.
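
As a quick sanity check (arithmetic only, not code from the repository), 16 heads gives the small decoder the same per-head dimension as the large one:

```python
# Both decoder configurations split the embedding into 64-dimensional heads.
for d_model, n_heads in [(1024, 16), (1280, 20)]:
    assert d_model % n_heads == 0
    print(f"width={d_model}, heads={n_heads}, per-head dim={d_model // n_heads}")
# width=1024, heads=16, per-head dim=64
# width=1280, heads=20, per-head dim=64
```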