PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0

256x256 PixArt-Sigma model only gets an FID-30k of ~49 on COCO 2014 val, is that normal? #46

Open Gus-Guo opened 2 months ago

Gus-Guo commented 2 months ago

FID computation uses the code from https://github.com/mseitzer/pytorch-fid/tree/master. The 30k prompts are randomly sampled from the COCO 2014 captions val split.

After generating images from these randomly sampled prompts, I compute FID between the generated images and the images in the COCO 2014 val split. The model config is the default: 20 steps, DPM-Solver, cfg_scale=4.5, checkpoint PixArt-Sigma-XL-2-256x256.pth. But the FID result is nearly 49, which is kind of confusing... is that normal? The PixArt-alpha 256x256 model is reported to obtain an FID of about 7.
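For reference, here is a minimal sketch of the evaluation loop described above, assuming the standard COCO 2014 captions_val2014.json annotation file and the pytorch-fid CLI from the linked repo; the file paths and the generated/ and val2014/ directory names are placeholders, not the authors' exact script:

```python
import json
import random

# Load the COCO 2014 val caption annotations (path is a placeholder).
with open("annotations/captions_val2014.json") as f:
    annotations = json.load(f)["annotations"]

# Randomly sample 30k caption entries; each entry carries the caption text
# and the image_id of the image it describes.
random.seed(0)
sampled = random.sample(annotations, 30_000)
prompts = [a["caption"] for a in sampled]

# ... generate one 256x256 image per prompt with PixArt-Sigma
# (20-step DPM-Solver, cfg_scale=4.5) and save them under generated/ ...

# FID between the generated images and the COCO 2014 val images,
# using the pytorch-fid CLI linked above:
#   python -m pytorch_fid generated/ val2014/
```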

lawrence-cj commented 2 months ago

It's normal, since the MSCOCO dataset is quite different from our high-aesthetic fine-tuning dataset. If you want to reach the ~7 FID, you should use this checkpoint: https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-XL-2-256x256-MSCOCO-FID732.pth, which is fine-tuned on a much smaller, ImageNet-like dataset.

youngwanLEE commented 2 months ago

@lawrence-cj I also have a question. Then, in your new PixArt-Sigma paper, are the FID scores in Table 3 from the model fine-tuned on the much smaller ImageNet-like dataset?

lawrence-cj commented 2 months ago

In the PixArt-Sigma paper, the FID is measured on a new high-quality evaluation dataset we collected. It will be released in the near future.

youngwanLEE commented 2 months ago

@lawrence-cj Awesome!!

It would be very helpful for the community to test your new eval dataset.

So, will you also release the FID evaluation code for the new eval dataset?

Can't wait!

Gus-Guo commented 2 months ago

> It's normal, since the MSCOCO dataset is quite different from our high-aesthetic fine-tuning dataset. If you want to reach the ~7 FID, you should use this checkpoint: https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-XL-2-256x256-MSCOCO-FID732.pth, which is fine-tuned on a much smaller, ImageNet-like dataset.

@lawrence-cj Hi Lawrence, may I ask about the details of the COCO FID evaluation? What is the strategy for selecting the 30k prompts, given that in the COCO captions val split one image can have more than one caption? And do you use all images of COCO val as reference images, or only the images corresponding to the selected 30k prompts?

lawrence-cj commented 2 months ago

We randomly select 30K. @Gus-Guo

lawrence-cj commented 2 months ago

The code will be released together with the evaluation dataset. @youngwanLEE

Gus-Guo commented 2 months ago

> We randomly select 30K. @Gus-Guo

@lawrence-cj Thanks for your reply. Do you mean you randomly select 30k prompts? So it is possible that the selected prompts include more than one caption describing the same image?
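For what it's worth, under a naive random sample over the full caption pool (one plausible reading of the answer above, not necessarily the authors' exact protocol), two selected captions can indeed describe the same image. Continuing from the `sampled` list in the earlier sketch:

```python
from collections import Counter

# Count how many images are described by more than one of the 30k sampled captions.
image_counts = Counter(a["image_id"] for a in sampled)
duplicated = sum(1 for c in image_counts.values() if c > 1)
print(f"{duplicated} images appear with more than one sampled caption")
```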

TinyTigerPan commented 1 month ago

> In the PixArt-Sigma paper, the FID is measured on a new high-quality evaluation dataset we collected. It will be released in the near future.

Hi, thanks for your wonderful work. Has the new high-quality evaluation dataset been released?