Hi @drboog,

Congrats on your paper's acceptance to CVPR, and thanks for sharing your wonderful work!
I have some questions about the COCO evaluation:
1. What dataset is used for evaluation on COCO? Is it the COCO 2017 val set or the COCO 2014 val set? The COCO 2017 val set has only 5,000 images, yet your paper says you randomly sampled 30,000 images. How do you sample them?
2. Are the 30K samples obtained by randomly combining differently designed prompts with the captions?
3. Could you please share your code for zero-shot COCO image generation? Zero-shot COCO generation is a commonly used benchmark, yet no paper has released code for this evaluation, and many details remain unclear. For example, some papers use COCO 2017 (e.g., Stable Diffusion) while others use COCO 2014. Why is that?

Best,
Runpei
1. Similar to other works, the zero-shot results are evaluated on the COCO 2014 split;
2. Captions from the validation set are randomly sampled and used as the inputs to the model; no additional human-designed template/text/prompt is used during evaluation (see the sketch after this list);
3. I don't know why Stable Diffusion reports results on COCO 2017, but the LDM paper (which SD is based on) evaluates on COCO 2014, and almost all published papers report zero-shot evaluation on the 2014 split. COCO 2017 images are more often used in downstream evaluation (after fine-tuning, thus not zero-shot), along with Localized Narratives text.
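
For what it's worth, the protocol described above usually boils down to something like the sketch below. This is not the paper's released script: `pycocotools`, the `clean-fid` package, the file paths, and the `model.generate(caption)` call are all assumptions standing in for whatever code you actually use.

```python
# A minimal sketch of the zero-shot COCO FID-30K protocol described above --
# NOT the paper's released code. Assumptions: pycocotools for the annotations,
# the clean-fid package for FID, placeholder paths, and a hypothetical
# `model.generate(caption)` call standing in for the text-to-image sampler.
import os
import random

from cleanfid import fid
from pycocotools.coco import COCO

random.seed(0)  # fix the seed so the 30K-caption subset is reproducible

# Load all caption annotations from the COCO 2014 validation split.
coco = COCO("annotations/captions_val2014.json")
anns = coco.loadAnns(coco.getAnnIds())

# Randomly sample 30,000 raw captions; no template or extra prompt is added.
captions = [a["caption"] for a in random.sample(anns, 30_000)]

# Generate one image per caption with the model under evaluation.
os.makedirs("generated", exist_ok=True)
for i, caption in enumerate(captions):
    image = model.generate(caption)  # hypothetical sampling call (PIL image)
    image.save(f"generated/{i:05d}.png")

# FID-30K between the generated images and the real COCO val2014 images.
score = fid.compute_fid("generated", "coco_val2014_images")
print(f"Zero-shot FID-30K: {score:.2f}")
```

One practical note: the random seed (and hence the particular 30K caption subset) can shift FID slightly, so fixing it makes runs comparable.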