bmaltais / kohya_ss

Apache License 2.0

FLUX.1 Finetuning #2735

Status: Open. CognitiveDiffusion opened this issue 2 months ago.

CognitiveDiffusion commented 2 months ago

So I'm not sure whether Finetuning is still more of a WIP than DreamBooth (DB) on the FLUX branch, but at the moment it 1) does not find images in subfolders and 2) is missing most of the bucket options.
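For anyone checking whether their dataset layout is affected by the subfolder problem, here is a minimal sketch in plain Python (not kohya_ss's own loader) that lists the images a top-level-only scan would miss because they sit in subfolders. The extension set is an assumption; adjust it to your data.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def find_images(root, recursive):
    """Return sorted image paths under root, optionally descending into subfolders."""
    # "**/*" matches files in root and in every nested folder; "*" matches root only.
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in Path(root).glob(pattern)
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def missed_by_flat_scan(root):
    """Images a top-level-only scan would skip because they live in subfolders."""
    flat = set(find_images(root, recursive=False))
    return [p for p in find_images(root, recursive=True) if p not in flat]
```

For example, `missed_by_flat_scan("/path/to/dataset")` returns the nested image paths; if the list is non-empty, those are exactly the files a non-recursive loader would not see.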

WarAnakin commented 2 months ago

Are you using the Finetuning tab or the DreamBooth tab?

CognitiveDiffusion commented 2 months ago

As I wrote, I tried both. I didn't manage to get Finetuning running at all.

WarAnakin commented 2 months ago

Neither have I, but on my end it had no problem finding the dataset. The problem comes when training is supposed to start: it throws an error saying there's a mismatch between the weights' precision (fp32, fp16) and the precision selected for training. I tried loading different pre-trained model versions, among other things, and I even changed the code, but to no avail.

bmaltais commented 2 months ago

Are you using fp16 versions of all models, with bf16 precision everywhere?
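One way to check which precision each model file actually stores is to read the .safetensors header, which records a dtype string (such as F16, BF16 or F32) for every tensor. The sketch below does this with the standard library only, assuming the file follows the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); it is a diagnostic helper, not part of kohya_ss.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without loading tensor data.

    Layout: an 8-byte little-endian unsigned length N, then N bytes of JSON
    mapping tensor names to {"dtype", "shape", "data_offsets"}.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def dtype_summary(path):
    """Return a Counter mapping stored dtype strings (e.g. 'BF16') to tensor counts."""
    return Counter(info["dtype"] for info in read_safetensors_header(path).values())
```

Running `dtype_summary(path)` on the base model, VAE and text encoder files (substitute your own paths) shows at a glance whether each one is stored as fp16, bf16 or fp32 before you choose the training precision.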

WarAnakin commented 2 months ago

I'll send you a screenshot the next time I boot it up, in a few hours.

bmaltais commented 2 months ago

It would be best to share the config.json file.
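When comparing shared configs, a small filter like the one below can pull out just the precision-related entries. It is a sketch: it matches top-level keys by substring instead of assuming particular kohya_ss option names, and the keyword list is an assumption you may need to extend.

```python
import json


def precision_settings(config_path, needles=("precision", "fp16", "bf16", "fp8")):
    """Return every top-level config entry whose key mentions a precision keyword.

    Substring matching avoids hard-coding specific option names; nested
    sections, if any, are not searched.
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return {
        key: value
        for key, value in config.items()
        if any(n in key.lower() for n in needles)
    }
```

For example, `precision_settings("config.json")` on an exported GUI config returns only the entries whose names mention precision, fp16, bf16 or fp8, which makes a weights-versus-training-precision mismatch easier to spot.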

WarAnakin commented 2 months ago

otaku-anime_flux_db_vR-test.json (attached)

peteallen commented 2 months ago

Any progress on this? I've seen discussion of people finetuning FLUX.1-dev with 24 GB of VRAM, but I have only been able to get LoRA training to work so far. Attempting to start training under the Finetuning tab throws an error when it tries to load the VAE (the same VAE that works for LoRA training in kohya-ss and ai-toolkit).

I'm wondering if I'm wasting my time trying to troubleshoot this right now, or if others have actually gotten it to work.

carat-keeeehun commented 2 months ago

@WarAnakin May I ask about the model (flux1DFp16_v10.safetensors) in your otaku-anime_flux_db_vR-test.json file? How can I download it?

WarAnakin commented 2 months ago

@peteallen One thing you should be aware of is the level of hypocrisy some people exude. Don't worry: they are not using the DreamBooth tab and they don't know something you don't; they just merge a LoRA into their model, claim it as trained, and call it a day.

@carat-keeeehun You can't really download that, since it's a model I created for a client.

carat-keeeehun commented 2 months ago

Thank you for the reply. Can I ask one more question? When training a FLUX LoRA, should the base model be the UNet or the full checkpoint?

WarAnakin commented 2 months ago

@carat-keeeehun I apologize, I think I misunderstood your question. Did you mean this? [screenshot]

That's an fp16 version of the FLUX.1-dev model; you can download it here: https://www.dropbox.com/scl/fi/szyypeg34mz0ktklw6exx/flux1DFp16_v10.safetensors?rlkey=2ictyj7oxb27v6upblwjqlrgx&st=71ynjfpm&dl=0

diodiogod commented 2 months ago

"Don't worry, they are not using the dreambooth tab, they don't know something you don't"

Well, this guy (NSFW model) claims he finetuned a model on a 3090. He even showed the difference between his finetune and his LoRA, and posted his settings. I still have to try it.

carat-keeeehun commented 2 months ago

I also ran into this issue (NotImplementedError: Cannot copy out of meta tensor; no data!) whenever I use the integrated flux-dev model as the base model instead of the UNet flux-dev model. Did you solve it?

WarAnakin commented 2 months ago

Yes: change the VAE from FluxDevVAE to ae.sft. Download it here: https://www.dropbox.com/scl/fi/px4130up5shq512ff9mjt/ae.sft?rlkey=gmuqz07rojgi0w4kv28bpsamz&st=q61h2cdy&dl=0
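To tell an integrated checkpoint from a UNet-only file (the distinction carat-keeeehun draws above), grouping tensor names by their leading prefix is often enough: a UNet-only file should show one dominant group, while a bundled checkpoint usually carries extra groups for the VAE and text encoders. This is a sketch under assumptions: it reads only the safetensors JSON header, and the actual prefix names vary by exporter, so treat the output as a hint rather than proof.

```python
import json
import struct
from collections import Counter


def key_prefixes(path, depth=1):
    """Count tensor names in a .safetensors file by their first `depth` dotted parts.

    Only the JSON header (8-byte little-endian length, then JSON) is read, so
    this stays fast even for multi-gigabyte checkpoints.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # metadata entry is not a tensor
    return Counter(".".join(name.split(".")[:depth]) for name in header)
```

Calling `key_prefixes(path)` and then `key_prefixes(path, depth=2)` on the base model gives a coarse and then a finer view of which components the file bundles.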

carat-keeeehun commented 2 months ago

Thank you for your kind reply, but I'm still stuck on this issue. I have two questions about your otaku-anime_flux_db_vR-test.json:

  1. Is it possible to set the LoRA type to Standard instead of Flux1?
  2. Is it possible to set the Text Encoder learning rate to something other than zero?

WarAnakin commented 2 months ago

@carat-keeeehun

  1. If you are training a FLUX LoRA, then the LoRA type needs to be Flux1.
  2. Just recently, clip_l was enabled, so you can now set your text encoder LR to something other than 0.

BenDes21 commented 2 months ago

Any update? Did you successfully finetune FLUX?

WarAnakin commented 2 months ago

@BenDes21 Yes, I was able to train a full-fledged BF16 model with medium-quality parameters. Here are some images from the finished product:

[Images: three sample outputs generated in ComfyUI]

Moving on to high-quality FP32 next.