ExponentialML / ComfyUI_ELLA

ComfyUI Implementation of ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Apache License 2.0

is every single model from google/flan-t5-xl really necessary? #4

Open spammeduh opened 1 month ago

spammeduh commented 1 month ago

There are multiple 9 GB+ files in that repo, and they're using a good chunk of space on my hard drive. Would the safetensors alone be enough?

hben35096 commented 1 month ago

Is it possible to use the bf16 version? https://huggingface.co/ybelkada/flan-t5-xl-sharded-bf16/tree/main

budui commented 1 month ago

ELLA only needs the Flan-T5 XL encoder (fp16), ~2.8 GB.
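
(As an illustrative sketch, not code from this repo: loading only the fp16 encoder with the transformers library, assuming the stock google/flan-t5-xl weights, would look roughly like this.)

import torch
from transformers import AutoTokenizer, T5EncoderModel

# Load only the encoder stack in fp16; the decoder weights in the checkpoint are simply ignored.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
encoder = T5EncoderModel.from_pretrained("google/flan-t5-xl", torch_dtype=torch.float16)

# The encoder's last_hidden_state is roughly the text feature ELLA's connector consumes.
tokens = tokenizer("a corgi wearing a red scarf", return_tensors="pt")
with torch.no_grad():
    embeds = encoder(**tokens).last_hidden_state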

ClockworkCreep commented 1 month ago

ELLA only needs the Flan-T5 XL encoder (fp16), ~2.8 GB.

Is this available for download anywhere? Edit: https://huggingface.co/limcheekin/flan-t5-xl-ct2/tree/main Is this it? The size matches, but it doesn't seem to work with this node.

kijai commented 1 month ago

Couldn't find an encoder-only version, so I made one myself: https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16/tree/main

With original: [image: ella_t5_full]

With pruned: [image: ella_t5_encoder_only_bf16]

Saves quite a bit of space indeed: [file size screenshots]

quixot1c commented 1 month ago

@kijai Your node seems to work better than this one, but it seems less native to ComfyUI. Why does your node work better, and am I correct that this node is more Comfy-ish in its implementation?

kijai commented 1 month ago

@kijai Your node seems to work better than this one, but it seems less native to ComfyUI. Why does your node work better, and am I correct that this node is more Comfy-ish in its implementation?

Mine is just a wrapper for the original code, so it's not compatible with anything in Comfy, while this node here only creates the embeds and passes them to Comfy just like CLIP text encode would, making it native.

I too have noticed this isn't working like the original; I think something in how ComfyUI handles the conditioning is causing the difference.
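
(For anyone curious what "only creates the embeds and passes them to Comfy" looks like in practice, here is a rough, hypothetical sketch of such a node; the class, input names, and encode_text call are made up for illustration and are not the actual code of either node.)

class ELLATextEncodeSketch:
    # Returns the standard CONDITIONING structure, the same kind of output
    # CLIPTextEncode produces, so downstream ComfyUI samplers accept it.
    RETURN_TYPES = ("CONDITIONING",)
    FUNCTION = "encode"
    CATEGORY = "conditioning"

    @classmethod
    def INPUT_TYPES(cls):
        # "ELLA" as an input type name is an assumption for this sketch.
        return {"required": {"ella": ("ELLA",), "text": ("STRING", {"multiline": True})}}

    def encode(self, ella, text):
        # Hypothetical call: run the T5 encoder + ELLA connector to get embeddings.
        embeds = ella.encode_text(text)
        # ComfyUI conditioning is a list of [cond_tensor, options_dict] pairs.
        return ([[embeds, {}]],)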

drphero commented 1 month ago

@kijai Thanks for creating the smaller model. Some people's cards (mine included) don't support bf16, so could you also do fp16 and/or briefly explain how you did it? I'm actually quite curious about the process used.

I first tried to reproduce your bf16 version with the code I whipped up below, but the file size doesn't match.

import torch

# Load both checkpoint shards from google/flan-t5-xl (map_location='cpu' also works for a one-off conversion).
part1 = torch.load('pytorch_model-00001-of-00002.bin', map_location='cuda:0')
part2 = torch.load('pytorch_model-00002-of-00002.bin', map_location='cuda:0')

# Merge the two shards into a single state dict.
combined_model = {**part1, **part2}

# Keep only the encoder weights.
encoder_blocks = {key: val for key, val in combined_model.items() if key.startswith('encoder.')}

# Cast to bf16 and save.
encoder_blocks_bf16 = {key: val.bfloat16() for key, val in encoder_blocks.items()}

torch.save(encoder_blocks_bf16, 'model_bf16.bin')
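
(An fp16 version would presumably be the same script with half() in place of bfloat16(); the continuation below is an untested sketch.)

# Continuing from the script above: cast the encoder weights to fp16 instead and save.
encoder_blocks_fp16 = {key: val.half() for key, val in encoder_blocks.items()}
torch.save(encoder_blocks_fp16, 'model_fp16.bin')
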
kijai commented 1 month ago

@kijai Thanks for creating the smaller model. Some people's cards (mine included) don't support bf16, so could you also do fp16 and/or briefly explain how you did it? I'm actually quite curious about the process used. [...]

I was under the impression it would still cast the weights to fp16 for inference; you tried it and it didn't work?

I did it pretty lazily; I just inserted the saving step into the original transformers loading code: [screenshot of the code]

Then I renamed that file to model.safetensors and included the original configs in the repo.
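
(Since the snippet itself was only posted as a screenshot, here is a rough reconstruction of that kind of approach, with the model name and dtype assumed; it goes through save_pretrained, which writes model.safetensors and the config automatically, so it is not kijai's exact steps.)

import torch
from transformers import T5EncoderModel

# Load just the encoder and cast it to bf16 (swap in torch.float16 for an fp16 variant).
encoder = T5EncoderModel.from_pretrained("google/flan-t5-xl", torch_dtype=torch.bfloat16)

# safe_serialization=True writes model.safetensors alongside config.json,
# roughly matching the "rename and include the original configs" step described above.
encoder.save_pretrained("flan-t5-xl-encoder-only-bf16", safe_serialization=True)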

drphero commented 1 month ago

@kijai Thanks for the info. And yeah, the bf16 one did end up working, which surprised me because other bf16 models had failed for me before.