Open Laidawang opened 2 months ago
They are very different things: my method is meant to improve attention masking with regional prompting. I'm considering doing the same with text conditioning in the cross-attention. It shouldn't be too complicated, but the code is becoming really gargantuan and I don't want to stray too far from what IPAdapter actually is.
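For readers following along, here is a minimal NumPy sketch of what "regional conditioning in the cross-attention" could look like. It is only an illustration under assumptions: it is not the code of ComfyUI_IPAdapter_plus or of any other project, and the names `cross_attention`, `regional_cross_attention`, `region_conds`, and `region_masks` are hypothetical. Each region has its own conditioning (keys and values), attention is computed once per region, and the outputs are blended with per-pixel mask weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, k, v):
    """Plain single-head cross-attention.

    q: (n_pixels, d) image queries; k, v: (n_tokens, d) conditioning.
    Returns an array of shape (n_pixels, d).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def regional_cross_attention(q, region_conds, region_masks):
    """Blend per-region cross-attention outputs (hypothetical helper).

    region_conds: list of (k, v) pairs, one per region.
    region_masks: list of (n_pixels,) weight arrays, one per region.
    Pixels not covered by any mask receive a zero output.
    """
    out = np.zeros_like(q, dtype=float)
    weight = np.zeros((q.shape[0], 1))
    for (k, v), mask in zip(region_conds, region_masks):
        m = np.asarray(mask, dtype=float).reshape(-1, 1)
        out += m * cross_attention(q, k, v)
        weight += m
    # Normalize so overlapping masks are averaged instead of summed.
    return out / np.maximum(weight, 1e-8)
```

With non-overlapping binary masks, each pixel simply receives the attention output of its own region's conditioning; soft or overlapping masks yield a weighted average. A real implementation would do this per attention head inside the UNet's attention layers.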
Maybe consider opening another Comfy version? I think this is very useful for generating large images or for precise composition control. PS: I saw that you were calling some of Comfy's conditioning implementation. The reason I posted the issue here is that I wanted something similar to be possible with images as well. Imagine we just drag in 4 pictures and merge them into one picture.
I'm glad you developed this feature, but I recently noticed a new repository with similar ideas: https://github.com/YangLing0818/RPG-DiffusionMaster. After checking their code again, I think they might get better results. It calculates the conditioning during cross-attention. Would you be interested in implementing their algorithm, or could you provide a brief example for us to build this scenario?
This is my Comfy reproduction, but I don't think it works very well.
their result:
![image](https://github.com/cubiq/ComfyUI_IPAdapter_plus/assets/85244566/7b94f1b2-96eb-40e3-82f7-246fd65e79e6)