Poor qualitative results with finetuned text-to-image generation model

Hello,

I am testing the provided finetuned BASE text-to-image generation model, and the quality of the generated images is very poor compared to those in the paper. I get similar (but slightly better) results with the large model. It would be great if the authors could point out the problem or provide a script with the same hyperparameters used to generate the images in the paper.

I am using the provided image_gen_example.py.

Here are some results using the queries from Fig. 3 in the paper:

- A street scene with a double-decker bus on the road.
- Cattle grazing on grass near a lake surrounded by mountain.
- A brown horse in the street

Thanks in advance.

OFA-Sys / OFA

Poor qualitative results with finetuned text-to-image generation model #402