Open Akira13641 opened 2 months ago
So it seems like whatever concatting with empty "Clip Text Encode" does should just actually be what "T5 Text Encode" does internally by default, all the time, otherwise "T5 Text Encode" is worse no matter what
ELLA is for comprehending dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. However, there is no guarantee that ELLA will bring better results in aesthetic scores.
> whatever concatting with empty "Clip Text Encode" does should just actually be what "T5 Text Encode" does internally by default
The current design is meant to allow more flexible usage. Thanks for the suggestion, I will consider adding a text encode node that uses both T5 and CLIP by default. But before doing this, weighted prompt support for T5 needs to be solved first (the prompt we give to CLIP will always have a weight).
@Akira13641 I've added an `ELLA Text Encode` node that automatically concats the ELLA and CLIP conditions.
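For anyone wondering what "concat" means here, below is a minimal sketch of the idea, not the node's actual code. It assumes each condition is a sequence of token embeddings with the same embedding width, and that concatenation simply appends one sequence to the other along the token axis. The function name `concat_conditioning` and the tiny sizes (3 ELLA tokens, 2 CLIP tokens, width 4) are made up for illustration.

```python
def concat_conditioning(ella_tokens, clip_tokens):
    """Append CLIP token embeddings after ELLA token embeddings.

    Both arguments are lists of token embeddings (each a list of floats).
    This is a hypothetical helper for illustration, not the ComfyUI-ELLA API.
    """
    widths = {len(tok) for tok in ella_tokens + clip_tokens}
    if len(widths) > 1:
        # Sequences can only be joined if every token has the same width.
        raise ValueError(f"embedding widths differ: {sorted(widths)}")
    # Token counts add up; the embedding width is unchanged.
    return ella_tokens + clip_tokens


# Placeholder values standing in for real encoder outputs.
ella_tokens = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
clip_tokens = [[0.5, 0.6, 0.7, 0.8] for _ in range(2)]

combined = concat_conditioning(ella_tokens, clip_tokens)
print(len(combined), len(combined[0]))
```

The model then cross-attends over all tokens of the combined sequence, which would be why even an empty CLIP prompt can change the result: its tokens are still part of the condition.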
This setup:![image](https://github.com/TencentQQGYLab/ComfyUI-ELLA/assets/11223712/cbb89576-fb48-4ee6-90e0-1a72d3b73ce2)
created this image:![image](https://github.com/TencentQQGYLab/ComfyUI-ELLA/assets/11223712/c64d7b62-6f3a-4df2-b4e8-e4d57cf5e04a)
This setup:![image](https://github.com/TencentQQGYLab/ComfyUI-ELLA/assets/11223712/67b22b5c-bd12-4e94-bee2-34af15e61bc7)
created this image:![image](https://github.com/TencentQQGYLab/ComfyUI-ELLA/assets/11223712/e5dfc9a0-de9d-4b03-9c42-f04634afa6b7)