How to combine 2d box and canny edge to control the image generation together?

Thank you for the quick follow-up releases adding more control modalities, including edge and depth. I have a question: if I want to combine 2D boxes and Canny edges to control image generation together, how should the UNet be redesigned? For example, would it work to stack two gated self-attention layers, one fusing the 2D box embedding and the other fusing the edge embedding? Any recommendations from your experience would be appreciated. Thank you very much!
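To make the idea concrete, here is a rough PyTorch sketch of the stacking I have in mind. It is not taken from the GLIGEN code base: `GatedSelfAttention` and `BoxAndEdgeFusion` are hypothetical names, the layer is simplified (no feed-forward branch), and I am assuming both the boxes and the edge map have already been encoded into token sequences.

```python
import torch
import torch.nn as nn


class GatedSelfAttention(nn.Module):
    """Simplified gated self-attention over [visual tokens; condition tokens]."""

    def __init__(self, dim: int, cond_dim: int, heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(cond_dim, dim)  # map condition tokens to the visual width
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Zero-initialised gate: tanh(0) = 0, so the layer starts as an identity
        # and the pretrained UNet features are untouched at the start of training.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        n = x.shape[1]  # number of visual tokens
        tokens = self.norm(torch.cat([x, self.proj(cond)], dim=1))
        out, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        # Keep only the outputs at the visual-token positions.
        return x + torch.tanh(self.alpha) * out[:, :n]


class BoxAndEdgeFusion(nn.Module):
    """Hypothetical stack: one gated layer for boxes, then one for edges."""

    def __init__(self, dim: int, box_dim: int, edge_dim: int, heads: int = 8):
        super().__init__()
        self.box_layer = GatedSelfAttention(dim, box_dim, heads)
        self.edge_layer = GatedSelfAttention(dim, edge_dim, heads)

    def forward(self, x, box_tokens, edge_tokens):
        x = self.box_layer(x, box_tokens)
        return self.edge_layer(x, edge_tokens)


if __name__ == "__main__":
    fusion = BoxAndEdgeFusion(dim=64, box_dim=32, edge_dim=48)
    x = torch.randn(2, 16, 64)      # (batch, visual tokens, channels)
    boxes = torch.randn(2, 5, 32)   # (batch, box tokens, box embedding dim)
    edges = torch.randn(2, 16, 48)  # (batch, edge tokens, edge embedding dim)
    print(fusion(x, boxes, edges).shape)
```

Each layer has its own gate, so in this sketch the two conditions could be trained or switched off independently. Whether this ordering (boxes first, then edges) is a good choice is exactly what I am unsure about.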

gligen / GLIGEN, issue #63