I think this might be better posted as an issue?
Originally posted by **jasperhyp** December 6, 2022
Hi! Please correct me if I am wrong, but I saw [here](https://github.com/facebookresearch/esm/blob/4e0ebb7a7b875ef40178cbb11e830eb5859b4180/esm/model/esm2.py#L71) that `lm_head` is (480, 33). However, when loading the model with `model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()`, `model.lm_head` actually has a dense layer of shape (480, 480). Why the discrepancy?
Discussed in https://github.com/facebookresearch/esm/discussions/416
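For reference, a minimal sketch to reproduce the shapes in question (assuming the `RobertaLMHead` structure in the linked source, where `dense` maps embed_dim → embed_dim and the final projection reuses the tied token-embedding weight of shape (alphabet_size, embed_dim)):

```python
import esm

# Load the 12-layer, 35M-parameter ESM-2 model and its alphabet.
model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()

# The LM head's intermediate dense layer maps embed_dim -> embed_dim.
print(model.lm_head.dense.weight.shape)  # expected: torch.Size([480, 480])

# The final projection uses the tied token-embedding weight,
# which maps embed_dim -> alphabet_size.
print(model.lm_head.weight.shape)        # expected: torch.Size([33, 480])
print(len(alphabet))                     # expected: 33
```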