Generation quality of the model

Thanks for your inspiring work!

However, I encountered a problem. When I use the model trained on COCO-Stuff with an image size of 512×512, the generation quality seems poor.

The layout prompt, taken from COCO-Stuff, is:

  layout = {
    "bbox":
      [
        ['metal', 0.04218750074505806, 0.25647059082984924, 0.10000000149011612, 0.5247058868408203],
        ['chair', 0.17940625548362732, 0.4312705993652344, 0.35014063119888306, 0.5062353014945984],
        ['sky-other', 0.606249988079071, 0.0, 0.734375, 0.09882353246212006],
        ['person', 0.0, 0.5493882298469543, 0.07332812249660492, 0.7298117876052856],
        ['pavement', 0.0, 0.5976470708847046, 0.9781249761581421, 1.0],
        ['building-other', 0.0, 0.0, 1.0, 0.7152941226959229],
        ['person', 0.8331093788146973, 0.5236706137657166, 0.913937509059906, 0.8113176226615906],
        ['chair', 0.422062486410141, 0.4221176505088806, 0.6030937433242798, 0.499505877494812],
        ['bus', 0.1626562476158142, 0.29044705629348755, 0.8476094007492065, 0.9376470446586609],
        ['person', 0.32343751192092896, 0.3623529374599457, 0.792187511920929, 0.5176470875740051],
        ['person', 0.9270156025886536, 0.49814116954803467, 0.9953437447547913, 0.8023764491081238],
        ['clothes', 0.15000000596046448, 0.567058801651001, 1.0, 1.0]
      ]
  }
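As a sanity check on a layout like this one, a minimal sketch could verify that every box is normalized and well-formed, and convert it to pixel coordinates for visual inspection. This is not part of the repository: `check_layout` is a hypothetical helper, and it assumes each entry is `[label, x1, y1, x2, y2]` with coordinates normalized to [0, 1], which is what the values above appear to follow.

```python
# Hypothetical helper for illustration; not part of the GeoDiffusion repository.
# Assumption: each entry is [label, x1, y1, x2, y2], normalized to [0, 1].
def check_layout(bboxes, width=512, height=512):
    """Validate normalized boxes and return them in pixel coordinates."""
    pixel_boxes = []
    for label, x1, y1, x2, y2 in bboxes:
        # Reject coordinates that fall outside the normalized range.
        if not all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)):
            raise ValueError(f"{label}: coordinate outside [0, 1]")
        # Reject boxes with zero or negative width or height.
        if x1 >= x2 or y1 >= y2:
            raise ValueError(f"{label}: empty or inverted box")
        pixel_boxes.append((
            label,
            round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height),
        ))
    return pixel_boxes


boxes = [
    ['bus', 0.1626562476158142, 0.29044705629348755, 0.8476094007492065, 0.9376470446586609],
    ['sky-other', 0.606249988079071, 0.0, 0.734375, 0.09882353246212006],
]
for box in check_layout(boxes):
    print(box)
```

If the check passes for all twelve boxes, the layout itself is at least well-formed, which would point toward the configuration or checkpoint rather than the input.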

The generation config is:

{
 "dataset": "coco_stuff",
 "num_bucket_per_side": [256, 256],
 "width": 512,
 "height": 512,
 "prompt_template": "An image with {bbox}",
 "cfg_scale": 4.5,
 "num_inference_steps": 50,
 "max_num_bbox": 18
}
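To make the interaction between `num_bucket_per_side`, `prompt_template`, and `max_num_bbox` concrete, here is a rough sketch of how normalized coordinates might be quantized into location buckets and substituted into the template. It is an assumption about the general idea, not the repository's actual code: the function names, the uniform bucketing, the `[x, y]` ordering of `num_bucket_per_side`, and the `<x…>`/`<y…>` token format are all hypothetical.

```python
# Illustrative sketch only; names and token format are hypothetical and
# may differ from what run_layout_to_image.py actually does.
def to_bucket(value, num_buckets):
    """Map a normalized coordinate in [0, 1] to an index in [0, num_buckets - 1]."""
    # Assumption: buckets split each side uniformly; 1.0 is clamped to the last bucket.
    return min(int(value * num_buckets), num_buckets - 1)


def describe_box(label, x1, y1, x2, y2, num_bucket_per_side=(256, 256)):
    """Render one box as a label followed by four location tokens."""
    nx, ny = num_bucket_per_side  # assumed order: (x buckets, y buckets)
    return (
        f"{label} <x{to_bucket(x1, nx)}> <y{to_bucket(y1, ny)}> "
        f"<x{to_bucket(x2, nx)}> <y{to_bucket(y2, ny)}>"
    )


def build_prompt(bboxes, template="An image with {bbox}", max_num_bbox=18):
    """Fill the template with at most max_num_bbox box descriptions."""
    parts = [describe_box(*box) for box in bboxes[:max_num_bbox]]
    return template.format(bbox=" ".join(parts))


print(build_prompt([['sky-other', 0.606249988079071, 0.0, 0.734375, 0.09882353246212006]]))
```

Under the uniform-bucket assumption, 256 buckets per side at 512 pixels means each bucket covers 2 pixels, so a mismatch between the bucket count used at training time and the one in this config would shift every location token.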

However, the result generated with run_layout_to_image.py looks strange (attached image: coco_stuff_0).

I've tried different prompts, and the results are all very confusing.

What am I doing wrong? Thanks!
