FrankFundel / SGCond


Is there a way to do T2I (text-to-image) generation using your code? #1

Open Saumya-Gupta-26 opened 1 month ago

Saumya-Gupta-26 commented 1 month ago

Hi! I'm interested in your work on using scene graphs to guide the image generation process. I would like to feed semantic maps to ControlNet to generate the images. After going through your code, I can see that data.json contains manually specified coordinates for a few scene-graph examples. My question is: would your codebase be able to generate an image based only on a text prompt (e.g., "two eggs in a pan")? If so, I would be grateful if you could point me towards it. Otherwise, how should one obtain the scene graph from the text prompt?

FrankFundel commented 1 month ago

Hi Saumya, thank you for your interest! The codebase is not intended to be used with a text prompt, but rather with scene graphs directly, i.e., a user prompts the models with a manually created scene graph for better control. If you are interested in T2I, you could, for example, first generate a scene graph from the text prompt using an LLM and then feed it to one of the models.
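To illustrate the suggested T2I pipeline, here is a minimal sketch of the first step: turning a text prompt into a scene-graph structure. The schema below (an `objects` list plus `(subject, predicate, object)` relation triples) is an assumption for illustration and may not match the exact format in data.json; the `text_to_scene_graph` function is a hypothetical rule-based stand-in for a real LLM call that would be asked to emit this JSON directly.

```python
import json

def text_to_scene_graph(prompt: str) -> dict:
    """Toy stand-in for an LLM call (assumption: in practice one would
    prompt an LLM to emit this JSON structure from free text)."""
    # Hand-written mapping for the example prompt from the question.
    graphs = {
        "two eggs in a pan": {
            "objects": ["egg", "egg", "pan"],
            "relations": [
                [0, "in", 2],  # first egg is in the pan
                [1, "in", 2],  # second egg is in the pan
            ],
        }
    }
    return graphs.get(prompt.lower(), {"objects": [], "relations": []})

sg = text_to_scene_graph("two eggs in a pan")
print(json.dumps(sg))
```

The resulting dictionary (with coordinates added, manually or by the LLM) could then be passed to one of the models in place of the hand-crafted entries in data.json.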