bmaltais / kohya_ss

Apache License 2.0

Flux CLIP_L Text Encoder training support. #2738

Open dsienra opened 3 weeks ago

dsienra commented 3 weeks ago

Flux LoRA training is bleeding concepts: if you train many people at the same time, they all get mixed together, and the unique tokens assigned to each character are ignored. I think the problem is that only the UNet is being trained and the captions are ignored completely. To fix this, the text encoder must be trained and saved. T5 is too big to train and can be kept frozen, but CLIP_L can be trained, and that would probably fix the issue. Any news about this?

Thanks.

bmaltais commented 3 weeks ago

Kohya is already aware… but training T5 will require a lot of VRAM and will most certainly push the limits of 24gb GPUs

dsienra commented 3 weeks ago

> Kohya is already aware… but training T5 will require a lot of VRAM and will most certainly push the limits of 24gb GPUs

Thanks for your response. That is why I'm only talking about CLIP_L, not T5. Training only CLIP_L may be possible on 24 GB GPUs.

flamed0g commented 2 weeks ago

@bmaltais CLIP-L training has been released for sd-scripts
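For anyone landing here, a minimal sketch of what a Flux LoRA run with CLIP-L training might look like in sd-scripts. This is an assumption-heavy illustration, not an official recipe: the flag set is based on the sd-scripts FLUX branch as commonly documented, and all file paths are placeholders. Check the sd-scripts README for the exact options supported by your version.

```bash
# Hypothetical sd-scripts invocation (paths are placeholders).
# Omitting --network_train_unet_only is assumed to enable text encoder
# (CLIP-L) LoRA training; T5-XXL remains frozen either way.
accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path flux1-dev.safetensors \
  --clip_l clip_l.safetensors \
  --t5xxl t5xxl_fp16.safetensors \
  --ae ae.safetensors \
  --network_module networks.lora_flux \
  --network_dim 16 \
  --train_data_dir ./dataset \
  --output_dir ./output
```

The point relevant to this issue is the last one: with the text encoder included in the trained LoRA, the unique trigger tokens in the captions can actually influence the learned concepts instead of being ignored.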

dsienra commented 2 weeks ago

great, thanks!!!