I think this might be better posted as an issue?
Originally posted by **jasperhyp** December 6, 2022
Hi! Please correct me if I am wrong, but I saw [here](https://github.com/facebookresearch/esm/blob/4e0ebb7a7b875ef40178cbb11e830eb5859b4180/esm/model/esm2.py#L71) that `lm_head` is (480, 33). However, when loading the model with `model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()`, `model.lm_head` actually has a dense layer of shape (480, 480). Why the discrepancy?
Discussed in https://github.com/facebookresearch/esm/discussions/416
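For reference, a minimal sketch to reproduce the shapes in question (assuming the `RobertaLMHead` structure in the linked source, where `dense` maps embed_dim → embed_dim and the final projection reuses the tied token-embedding weight of shape (alphabet_size, embed_dim)):

```python
import esm

# Load the 12-layer, 35M-parameter ESM-2 model and its alphabet.
model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()

# The LM head's intermediate dense layer maps embed_dim -> embed_dim.
print(model.lm_head.dense.weight.shape)  # expected: torch.Size([480, 480])

# The final projection uses the tied token-embedding weight,
# which maps embed_dim -> alphabet_size.
print(model.lm_head.weight.shape)        # expected: torch.Size([33, 480])
print(len(alphabet))                     # expected: 33
```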