could be - i don't have the resources to be able to check at the moment. but are you trying to do training with compel? you probably shouldn't do that.
Yes. I'm trying to incorporate compel into diffusion model lora training. May I know the reason why it shouldn't be used for training? Thanks.
@GongXinyuu if you train with weighted captions, you'll possibly produce a model that will only respond to prompts that have been weighted the same way, but more likely a model that is just harder to use
Gotcha. The main reason I want to incorporate compel into model training is to mitigate the 77-token limitation imposed by CLIP, since I want to use more detailed prompts to train the diffusion model, just like DALLE3. I guess it should be fine if I don't use the prompt weighting function?
i'd suggest limiting the captions to 77 tokens, for similar reasons but mostly because compel is not at all aware of word boundaries and that means that you will end up training your model on broken text encoder data. the >77 token trick is an even uglier and more unreliable hack than the prompt weighting.
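For what it's worth, a quick way to check whether a caption already fits inside that 77-token window is to run it through the CLIP tokenizer directly. This is just a sketch; the caption string is a placeholder:

```python
from transformers import CLIPTokenizer

# SDXL's first text encoder uses the openai/clip-vit-large-patch14 tokenizer
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

caption = "a long, detailed training caption describing the whole scene ..."  # placeholder
token_ids = tokenizer(caption).input_ids  # includes the BOS and EOS tokens
if len(token_ids) > tokenizer.model_max_length:  # 77 for CLIP
    print(f"caption is {len(token_ids)} tokens; anything past {tokenizer.model_max_length} gets truncated")
```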
Hi @damian0815, thanks for your great work! Recently I've tried to incorporate compel into https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora_sdxl.py
However, I have found that there may be a computational error when running compel with the text_encoder in torch.float16 dtype. Below is a full code snippet to reproduce it.
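A minimal sketch of that kind of setup, assuming the SDXL base checkpoint and the standard two-encoder compel configuration from the compel README (model id and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionXLPipeline
from compel import Compel, ReturnedEmbeddingsType

# load the SDXL pipeline in half precision (this is the dtype that triggers the error)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# set up compel with both SDXL tokenizers and text encoders
compel = Compel(
    tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
    text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)

# building the conditioning tensors with the fp16 text encoders is where the failure shows up
conditioning, pooled = compel("a photo of a cat")
```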
The above code will throw an error:
I found that if the torch.float16 cast on the text encoder is removed, then it works perfectly well.
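One workaround along those lines (just a sketch, not verified against the training script): keep the text encoders in float32 while compel builds the embeddings, and cast only the resulting tensors back to half precision for the fp16 UNet:

```python
# assumes `pipe` and the Compel import from the snippet above
pipe.text_encoder.to(dtype=torch.float32)
pipe.text_encoder_2.to(dtype=torch.float32)

compel = Compel(
    tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
    text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)

conditioning, pooled = compel("a photo of a cat")

# cast the embeddings back to fp16 before feeding them to the fp16 unet
conditioning = conditioning.to(dtype=torch.float16)
pooled = pooled.to(dtype=torch.float16)
```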