FrankFundel / SGCond


Is there a way to do T2I (text-to-image) generation using your code? #1

Open Saumya-Gupta-26 opened 1 month ago

Saumya-Gupta-26 commented 1 month ago

Hi! I'm interested in your work on using scene graphs to guide the image generation process. I would like to feed semantic maps to ControlNet to generate the images. After going through your code, I can see that data.json contains manually specified coordinates for a few scene-graph examples. My question is: would your codebase be able to generate an image based only on a text prompt (e.g., "two eggs in a pan")? If so, I would be grateful if you could point me towards it. Otherwise, how should one obtain the scene graph from the text prompt?

FrankFundel commented 1 month ago

Hi Saumya, thank you for your interest! The codebase is not intended to be used with a text prompt, but rather with scene graphs directly, i.e., a user prompts the models with a manually created scene graph for better control. If you are interested in T2I, you could, for example, first generate a scene graph from the text prompt using an LLM and then feed it to one of the models.
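To illustrate the suggested T2I pipeline, here is a minimal sketch of the first step: turning a text prompt into a scene-graph structure. The schema below (an `objects` list plus `(subject, predicate, object)` relation triples) is an assumption for illustration and may not match the exact format in data.json; the `text_to_scene_graph` function is a hypothetical rule-based stand-in for a real LLM call that would be asked to emit this JSON directly.

```python
import json

def text_to_scene_graph(prompt: str) -> dict:
    """Toy stand-in for an LLM call (assumption: in practice one would
    prompt an LLM to emit this JSON structure from free text)."""
    # Hand-written mapping for the example prompt from the question.
    graphs = {
        "two eggs in a pan": {
            "objects": ["egg", "egg", "pan"],
            "relations": [
                [0, "in", 2],  # first egg is in the pan
                [1, "in", 2],  # second egg is in the pan
            ],
        }
    }
    return graphs.get(prompt.lower(), {"objects": [], "relations": []})

sg = text_to_scene_graph("two eggs in a pan")
print(json.dumps(sg))
```

The resulting dictionary (with coordinates added, manually or by the LLM) could then be passed to one of the models in place of the hand-crafted entries in data.json.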