Gus-Guo opened this issue 2 months ago
It's normal: the MSCOCO dataset is very different from our high-aesthetic fine-tuning dataset. If you want to reproduce the FID of ~7, you should use this checkpoint: https://huggingface.co/PixArt-alpha/PixArt-alpha/blob/main/PixArt-XL-2-256x256-MSCOCO-FID732.pth, which is fine-tuned on a much smaller, ImageNet-like dataset.
@lawrence-cj I also have a question. In your new PixArt-Sigma paper, are the FID scores in Table 3 also from a model fine-tuned on the much smaller ImageNet-like dataset?
In the PixArt-Sigma paper, the FID is computed on a new high-quality evaluation dataset we collected. It will be released in the near future.
@lawrence-cj Awesome!!
It would be very helpful for the community to test your new eval dataset.
So, would you release the FID eval code on the new eval dataset?
Can't wait!
@lawrence-cj Hi Lawrence, may I ask about the details of the COCO FID evaluation? What is the strategy for selecting the 30k prompts, given that in the COCO captions val split one image can have more than one caption? And do you use all images of the COCO val split as the reference images, or just the images corresponding to the selected 30k prompts?
Randomly select 30K. @Gus-Guo
The code will be released with the evaluation dataset. @youngwanLEE
@lawrence-cj Thanks for your reply. Do you mean randomly selecting 30k prompts? If so, is it possible that the selected prompts include more than one caption describing the same image?
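For reference, a minimal sketch of plain random caption sampling under the COCO annotation format (the function name and structure are my own illustration, not the authors' code). Note that sampling over captions, rather than over images, can indeed return several captions for the same image, since COCO has roughly five captions per image:

```python
import random

def sample_coco_prompts(annotations, k=30000, seed=0):
    """Randomly sample k caption entries from a COCO-style annotation list.

    Each entry is a dict like {"image_id": int, "caption": str}. Plain
    sampling over captions may pick multiple captions for one image.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible prompt set
    picked = rng.sample(annotations, k)
    return [(a["image_id"], a["caption"]) for a in picked]
```

Deduplicating by `image_id` before sampling would be the alternative strategy if one caption per image is desired; the thread does not say which variant the authors used.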
Hi, thanks for your wonderful work. Has the new high-quality evaluation dataset been released?
FID computation uses the code from https://github.com/mseitzer/pytorch-fid/tree/master. The 30k prompts are randomly sampled from the COCO 2014 captions val split.
After generating images from these randomly sampled prompts, I compute FID between the generated images and the images in the COCO 2014 val split. The model config is the default: 20 steps, DPM-Solver, cfg_scale=4.5, with the PixArt-Sigma-XL-2-256x256.pth checkpoint. But the resulting FID is nearly 49, which is confusing... is that normal? The PixArt-alpha 256x256 model is reported to reach an FID of about 7.
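For context on what the linked pytorch-fid code computes: FID is the Fréchet distance between Gaussian fits of Inception-v3 features of the two image sets. The core formula can be sketched in numpy/scipy (an illustrative reimplementation, not the repo's exact code):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID core: ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*sqrt(sigma1 @ sigma2)).

    mu*: mean feature vectors; sigma*: feature covariance matrices.
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; discard tiny imaginary
    # parts that arise from numerical error.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

In the actual evaluation, the means and covariances come from Inception pool features of the reference (COCO val) and generated image sets; a score near 49 vs. 7 therefore reflects a large statistical gap between the two feature distributions, consistent with the fine-tuning-dataset mismatch mentioned above.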