Open rabiulcste opened 5 months ago
You can visualize the first stage of generation (i.e., the individual per-box generation) to see whether the appearance of the objects stays consistent. If it does, using a higher number of frozen steps helps preserve that appearance.
I'm working on image synthesis with a focus on fine-grained vision-language understanding. I'm having trouble generating two images that keep a consistent background but swap the positions of two objects (e.g., a dog on the left and a cat on the right in the first image, and vice versa in the second image).
I've tried fixing the seed and the bounding box locations and swapping only the object names, but it doesn't seem to work. Any guidance would be greatly appreciated.
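For reference, here is a minimal sketch of the setup described above: the same seed and the same box coordinates for both images, with only the object names swapped. The `Box` type, the `swap_names` helper, the coordinates, and the seed value are all hypothetical and only illustrate the idea; they are not the repository's actual layout format or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Hypothetical layout entry: an object prompt plus an (x, y, w, h) box."""
    name: str
    xywh: tuple


def swap_names(layout, a, b):
    """Return a new layout where names `a` and `b` are exchanged; boxes stay put."""
    swapped = []
    for box in layout:
        if box.name == a:
            swapped.append(replace(box, name=b))
        elif box.name == b:
            swapped.append(replace(box, name=a))
        else:
            swapped.append(box)
    return swapped


SEED = 0  # assumed: the same seed is reused for both generations

# Image 1: dog on the left, cat on the right (illustrative coordinates).
layout_1 = [
    Box("a dog", (60, 250, 180, 200)),
    Box("a cat", (300, 250, 160, 200)),
]

# Image 2: identical boxes, names exchanged.
layout_2 = swap_names(layout_1, "a dog", "a cat")
```

Each layout would then be passed to the generation pipeline with `SEED` and the same background prompt; that call is left out here because its exact form depends on the codebase.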