Closed Scorbinwen closed 1 month ago
I concatenated the CLIPTextEncoder embedding with the ELLA embedding, following the trick mentioned in https://github.com/TencentQQGYLab/ComfyUI-ELLA/issues/29#issuecomment-2082288059, and did this for the negative prompt only. The result quality is much better.
So the real reason may be that ELLA simply doesn't understand negative prompts like "EasyNegative,paintings,sketches,(worst quality:2),(low quality:2),(normal quality:2),lowres,normal quality,(monochrome:1),(grayscale:1),skin spots,acnes,skin blemishes,age spot,glans,extra fingers,fewer fingers,multiple hands,multiple heads,Multiple arms,"
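For anyone wondering what the concat trick amounts to at the tensor level, here is a minimal sketch, not the actual ComfyUI-ELLA code. It assumes both conditionings are shaped `[batch, tokens, channels]` and share the same channel width; the token counts (64 for ELLA, 77 for CLIP), the width of 768, and the helper name `concat_conditioning` are illustrative assumptions. Cross-attention in the UNet accepts a variable number of context tokens, so the two sequences can simply be joined along the token axis:

```python
import numpy as np


def concat_conditioning(ella_cond: np.ndarray, clip_cond: np.ndarray) -> np.ndarray:
    """Join two conditioning arrays along the token axis (axis 1).

    Hypothetical helper for illustration only. Both inputs are assumed
    to be shaped [batch, tokens, channels].
    """
    if ella_cond.ndim != 3 or clip_cond.ndim != 3:
        raise ValueError("expected arrays shaped [batch, tokens, channels]")
    if ella_cond.shape[0] != clip_cond.shape[0]:
        raise ValueError("batch sizes differ")
    if ella_cond.shape[2] != clip_cond.shape[2]:
        raise ValueError("channel widths differ")
    return np.concatenate([ella_cond, clip_cond], axis=1)


# Random placeholder data standing in for real encoder outputs.
# The shapes below are assumptions, not values read from the workflow.
rng = np.random.default_rng(0)
ella_cond = rng.standard_normal((1, 64, 768)).astype(np.float32)
clip_cond = rng.standard_normal((1, 77, 768)).astype(np.float32)

combined = concat_conditioning(ella_cond, clip_cond)
print(combined.shape)
```

In the setup described above, only the negative conditioning would be built this way; the positive conditioning is left as it was.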
Non-ELLA workflow result combined with IPAdapter:
ELLA workflow result combined with IPAdapter:
IPAdapter reference style image:
The ELLA workflow result is far less realistic than the non-ELLA workflow result when the reference style image mostly contains unrealistic textures. I guess this is partly due to IPAdapter overfitting: the image prompt is so strong that it overrides the text prompt. But the non-ELLA workflow result is realistic enough, and both workflows use the same IPAdapter and the same hyperparameters. So I wonder whether it's because ELLA's output condition embedding doesn't match the original SD UNet as well as the CLIPTextEncoder's output condition embedding does. Is there a simple trick to tackle this problem? A training-free solution would be better~
@JettHu