LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

MIT License

More information about VQGAN #6

Closed gaopengpjlab closed 1 year ago

gaopengpjlab commented 1 year ago

Can you release a pretrained VQGAN with more parameters and a higher resolution? By the way, can you share the FID score of your pretrained VQGAN?

LTH14 commented 1 year ago

We do not use VQGANs with different architectures in our paper, and the VQGAN is trained on ImageNet 256x256. You can get the pre-trained VQGAN here. The FID of the VQGAN's reconstructed images is around 3.

gaopengpjlab commented 1 year ago

Can you release a stronger tokenizer and a high-resolution tokenizer?

gaopengpjlab commented 1 year ago

Such as a VQGAN trained on ImageNet 512x512?

LTH14 commented 1 year ago

The MAGE paper only contains results on ImageNet 256x256. We do not train a tokenizer with a resolution of 512x512.

For the "stronger" tokenizer, can you specify which one you refer to? We only have two tokenizers. One is trained with "strong augmentation", which is the tokenizer we used in this repo. The other one is trained with "weak augmentation". That one is in JAX and you can get the pre-trained weights here.

gaopengpjlab commented 1 year ago

By "stronger" I mean a tokenizer with a resolution of 512x512. I am planning to scale MAGE to a larger resolution, namely 512x512.

LTH14 commented 1 year ago

I see. Unfortunately we do not train a tokenizer on 512x512. You can also check MaskGIT. In the MaskGIT repo, they release a tokenizer trained on ImageNet 512x512.
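
As a rough sizing sketch for the move from 256x256 to 512x512: the number of tokens per image grows with the square of the resolution. The snippet below assumes a tokenizer that downsamples each side by a factor of 16; that factor and the helper name `num_tokens` are assumptions for illustration, so check the config of the tokenizer you actually load.

```python
def num_tokens(image_size: int, downsample_factor: int = 16) -> int:
    """Number of discrete tokens for a square image.

    downsample_factor=16 is an assumed value, not read from any checkpoint.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    side = image_size // downsample_factor  # tokens along one side of the grid
    return side * side


for size in (256, 512):
    print(f"{size}x{size} image -> {num_tokens(size)} tokens")
```

Under that assumption, a 512x512 image yields four times as many tokens as a 256x256 image, so the transformer's sequence length (and attention cost) grows accordingly.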

gaopengpjlab commented 1 year ago

Thank you so much for your kind reply.

gaopengpjlab commented 1 year ago

The FID of the VQGAN's reconstructed images is around 3.

Is the FID score reported on the ImageNet train split or the val split?

LTH14 commented 1 year ago

All FID scores in the paper are reported on the ImageNet val split.
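
To illustrate why the reference split matters: FID is the Fréchet distance between two Gaussians fitted to feature statistics, so changing the reference set (train vs. val) changes the reference mean and covariance and therefore the score. The following is a minimal NumPy sketch of that distance, not the evaluation code used for the paper. In practice the features are Inception activations; here random vectors stand in for them, and the function names are hypothetical.

```python
import numpy as np


def gaussian_stats(features):
    """Mean and covariance of a (num_samples, dim) feature matrix."""
    features = np.asarray(features, dtype=np.float64)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1 = np.asarray(mu1, dtype=np.float64)
    mu2 = np.asarray(mu2, dtype=np.float64)
    sigma1 = np.asarray(sigma1, dtype=np.float64)
    sigma2 = np.asarray(sigma2, dtype=np.float64)
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^(1/2)) is the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative
    # when both covariances are positive semi-definite.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


# Stand-in features (assumption): real FID uses Inception features of images.
rng = np.random.default_rng(0)
reference = rng.normal(size=(2000, 8))   # e.g. features of the reference split
reconstructed = reference + 0.1          # e.g. features of reconstructions

mu_ref, sigma_ref = gaussian_stats(reference)
mu_rec, sigma_rec = gaussian_stats(reconstructed)
fid = frechet_distance(mu_ref, sigma_ref, mu_rec, sigma_rec)
print(f"Frechet distance: {fid:.4f}")
```

Because `mu_ref` and `sigma_ref` come from whichever split is used as the reference, scores computed against the train split and the val split are not directly comparable.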

gaopengpjlab commented 1 year ago

Thank you very much. I asked this question because the original VQGAN paper reports both train and val FID scores.

LTH14 commented 1 year ago

No worries