LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

About the computation of logits after decoder #20

Open liuqk3 opened 1 year ago

liuqk3 commented 1 year ago

Hi @LTH14 ,

Great work! I have a question about the computation of the logits after the decoder. I see that an MlmLayer is used: the decoder output is mapped by an fc layer, then the dot product between the mapped features and the word embeddings is computed, and a bias is added to produce the logits.
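For reference, the pattern I'm describing looks roughly like this (a minimal sketch of a BERT-style MLM head; layer names and dimensions are my assumptions, not copied from the repo):

```python
import torch
import torch.nn as nn

class MlmLayer(nn.Module):
    """BERT-style MLM head: project decoder features into the word-embedding
    space, score them against the token embedding matrix, and add a bias."""
    def __init__(self, feat_dim, emb_dim, vocab_size):
        super().__init__()
        self.fc = nn.Linear(feat_dim, emb_dim)  # map decoder output to embedding space
        self.act = nn.GELU()
        self.norm = nn.LayerNorm(emb_dim)
        self.bias = nn.Parameter(torch.zeros(1, 1, vocab_size))  # per-token output bias

    def forward(self, x, word_embeddings):
        # x: (B, N, feat_dim); word_embeddings: (vocab_size, emb_dim)
        h = self.norm(self.act(self.fc(x)))
        # dot product with every word embedding -> (B, N, vocab_size)
        logits = torch.matmul(h, word_embeddings.t())
        return logits + self.bias

# Example usage with made-up sizes:
head = MlmLayer(feat_dim=512, emb_dim=256, vocab_size=1024)
tokens = nn.Embedding(1024, 256)
logits = head(torch.randn(2, 196, 512), tokens.weight)  # (2, 196, 1024)
```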

Have you tried computing the logits directly with an fc layer on top of the decoder's output features? What is the main difference between these two types of logits, and which one do you think works better?

Thanks.

LTH14 commented 1 year ago

We follow BERT for this design. I haven't tried using logits directly from an fc layer.
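For comparison, the untied alternative asked about would be a single linear classifier over the vocabulary; a minimal sketch, with dimensions assumed:

```python
import torch
import torch.nn as nn

decoder_dim, vocab_size = 512, 1024  # assumed sizes for illustration
direct_head = nn.Linear(decoder_dim, vocab_size)

decoder_output = torch.randn(2, 196, decoder_dim)
logits = direct_head(decoder_output)  # (B, N, vocab_size)
```

The BERT-style head ties the output projection to the token embedding matrix, so the classifier reuses the embedding weights (plus a learned bias) rather than learning an independent vocab_size x dim matrix; the untied fc head adds those parameters but is otherwise a drop-in replacement.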