LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
MIT License

Why encoder-decoder architecture? #31

Open bsxxdw opened 1 year ago

bsxxdw commented 1 year ago

Hi @LTH14! Congrats on your nice work being accepted by CVPR. As the title says, I'm confused about why you chose an encoder-decoder architecture like MAE. Have you ever tried an encoder-only architecture like BEiT?

LTH14 commented 1 year ago

We haven't tried an encoder-only structure like BEiT. The reason we chose the MAE-style encoder-decoder structure is simply that it was the state-of-the-art approach at the time. Also, an encoder-decoder structure lets us decouple representation learning (encoder) from generation (decoder).
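To illustrate what that decoupling looks like in practice, here is a minimal PyTorch sketch (not the actual MAGE code; the class name `TinyMaskedEncDec` and all sizes are hypothetical, and real MAGE works on VQGAN tokens with a variable masking ratio). The encoder only processes the visible tokens, so its output can be used directly as a representation, while a separate, typically lighter decoder receives mask tokens and handles reconstruction/generation.

```python
import torch
import torch.nn as nn


class TinyMaskedEncDec(nn.Module):
    """Illustrative MAE/MAGE-style encoder-decoder (a sketch, not the repo's model).

    - Encoder: sees only the unmasked tokens -> representation learning.
    - Decoder: gets encoder outputs plus mask tokens -> predicts the masked tokens.
    """

    def __init__(self, num_tokens=1024, seq_len=256, enc_dim=768, dec_dim=512):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, enc_dim)
        self.pos = nn.Parameter(torch.zeros(1, seq_len, enc_dim))
        enc_layer = nn.TransformerEncoderLayer(enc_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        # project into the (usually narrower) decoder width
        self.enc_to_dec = nn.Linear(enc_dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.dec_pos = nn.Parameter(torch.zeros(1, seq_len, dec_dim))
        dec_layer = nn.TransformerEncoderLayer(dec_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)
        self.head = nn.Linear(dec_dim, num_tokens)  # logits over the token vocabulary

    def forward(self, tokens, keep_idx):
        # tokens: (B, L) discrete token ids; keep_idx: (B, K) indices of visible tokens
        x = self.embed(tokens) + self.pos
        visible = torch.gather(x, 1, keep_idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        latent = self.encoder(visible)  # representation: built from visible tokens only

        # Decoder path: scatter encoded tokens back into place, fill the rest with mask tokens.
        B, L, _ = x.shape
        dec_in = self.mask_token.expand(B, L, -1).clone()
        dec_in.scatter_(1,
                        keep_idx.unsqueeze(-1).expand(-1, -1, dec_in.size(-1)),
                        self.enc_to_dec(latent))
        logits = self.head(self.decoder(dec_in + self.dec_pos))  # (B, L, num_tokens)
        return latent, logits


if __name__ == "__main__":
    B, L, K = 2, 256, 64
    tokens = torch.randint(0, 1024, (B, L))
    keep_idx = torch.stack([torch.randperm(L)[:K] for _ in range(B)])
    latent, logits = TinyMaskedEncDec()(tokens, keep_idx)
    print(latent.shape, logits.shape)  # (2, 64, 768) and (2, 256, 1024)
```

The point of the split is that `latent` (the encoder output) can be pooled and used for downstream recognition without ever touching mask tokens, while the decoder absorbs the reconstruction workload; an encoder-only design would have to serve both roles with the same set of weights.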