LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Padding on the original image or not? #3

Closed. Zyriix closed this issue 1 year ago.

Zyriix commented 1 year ago

Hey guys, your work is so cool. But I'm wondering: when you do inpainting/outpainting, do you pad the masked pixels with zeros or something else? If you don't, the encoder may "see" some contextual information, which causes an information leak. That means the reconstruction result may look good, but it uses information it shouldn't know.

Thank you!

LTH14 commented 1 year ago

Thanks for your interest! For inpainting/outpainting, we first pad the raw pixels in the masked region with zeros. Then we mask out the corresponding tokens. We observe that in the token space, if we only mask the tokens corresponding to the original pixel mask, the tokens bordering the input mask still record the "mask" as part of the ground truth. Therefore, we mask one additional token around the input mask.

More specifically, say the original pixel mask covers pixels 64-191 (a 128x128 pixel mask) in the original 256x256 image. The token mask corresponding to the original mask would then cover tokens 4-11 (an 8x8 token mask) in the 16x16 token space. However, instead of masking tokens 4-11, we use a token mask covering tokens 3-12 (a 10x10 token mask), so that the remaining unmasked tokens do not record the "mask" in the input.
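For anyone trying to reproduce this, here is a minimal sketch of the pixel-mask-to-token-mask conversion with the extra one-token dilation described above. It is not the repo's actual code; the function name, the patch size of 16, and the use of max pooling for the dilation are my own assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pixel_mask_to_token_mask(pixel_mask: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Hypothetical helper: map a binary pixel mask (H, W) to a binary token mask
    (H // patch_size, W // patch_size), then dilate it by one token in every direction."""
    # A token is masked if any pixel inside its patch is masked.
    token_mask = F.max_pool2d(pixel_mask[None, None].float(),
                              kernel_size=patch_size, stride=patch_size)
    # Dilate by one token so the tokens bordering the pixel mask, which would
    # otherwise record the zeroed-out region, are masked as well.
    token_mask = F.max_pool2d(token_mask, kernel_size=3, stride=1, padding=1)
    return token_mask[0, 0].bool()

# Example matching the numbers above: a 128x128 pixel mask over pixels 64-191
# of a 256x256 image.
pixel_mask = torch.zeros(256, 256)
pixel_mask[64:192, 64:192] = 1.0
# The masked pixels themselves would be zeroed in the input image, e.g.
# image = image * (1.0 - pixel_mask)
token_mask = pixel_mask_to_token_mask(pixel_mask)
print(token_mask.sum().item())  # 100 -> a 10x10 token mask (tokens 3-12)
```

Without the second max pooling step, the example yields an 8x8 (64-token) mask over tokens 4-11, i.e. the version that leaks the mask boundary into the ground-truth tokens.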

We will also release a colab for generation and image editing soon (possibly in March).

Zyriix commented 1 year ago

Thanks for your answer, it helps me understand your work better! Looking forward to your future work!