bghira / SimpleTuner

A general fine-tuning kit geared toward diffusion models.
GNU Affero General Public License v3.0

full finetuning FLUX #852

youngwanLEE closed this issue 2 months ago

youngwanLEE commented 2 months ago

Hi, I'm impressed by your excellent work.

I wonder whether this codebase supports full fine-tuning of FLUX.

bghira commented 2 months ago

Yes, set MODEL_TYPE='full'. But you will need DeepSpeed configured; see the documentation.

youngwanLEE commented 2 months ago

@bghira Thank you for your quick reply.

I already set MODEL_TYPE='full', but I got this error:

  File "/Data3/SimpleTuner/train.py", line 2526, in <module>
    main()
  File "/Data3/SimpleTuner/train.py", line 2448, in main
    pipeline = sdxl_pipeline_cls.from_pretrained(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Data3/SimpleTuner/.venv/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Data3/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/pipelines/pipeline_utils.py", line 966, in from_pretrained
    model = pipeline_class(**init_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Data3/SimpleTuner/helpers/sdxl/pipeline.py", line 298, in __init__
    self.default_sample_size = self.unet.config.sample_size
                               ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'config'
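
The failing line reads self.unet.config while the pipeline's unet component is None. A FLUX.1-dev checkpoint ships a transformer rather than a UNet, which would explain why an SDXL-style pipeline built from it ends up with unet set to None. The toy sketch below reproduces the error with a hypothetical ToyPipeline class and shows a guarded variant; it is not SimpleTuner's or diffusers' code, and the fallback value of 128 is an illustrative assumption.

```python
from types import SimpleNamespace


class ToyPipeline:
    """Hypothetical stand-in for a pipeline whose __init__ reads unet.config."""

    def __init__(self, unet=None):
        self.unet = unet
        # Mirrors the failing line: dereferences .config without checking for None.
        self.default_sample_size = self.unet.config.sample_size


class GuardedToyPipeline:
    """Same idea, but falls back to a default when no UNet was loaded."""

    def __init__(self, unet=None, fallback_sample_size=128):
        self.unet = unet
        # getattr(None, "config", None) returns None instead of raising.
        config = getattr(unet, "config", None)
        self.default_sample_size = getattr(config, "sample_size", fallback_sample_size)


if __name__ == "__main__":
    try:
        ToyPipeline(unet=None)
    except AttributeError as err:
        print(err)  # 'NoneType' object has no attribute 'config'

    print(GuardedToyPipeline(unet=None).default_sample_size)  # 128
    unet = SimpleNamespace(config=SimpleNamespace(sample_size=96))
    print(GuardedToyPipeline(unet=unet).default_sample_size)  # 96
```

A guard like this only hides the symptom; if the wrong pipeline class is being built for a FLUX model, the real fix is to use the matching pipeline.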

This is my config.

RESUME_CHECKPOINT='latest'
DATALOADER_CONFIG='config/multidatabackend.json'
ASPECT_BUCKET_ROUNDING='2'
TRAINING_SEED='42'
USE_EMA='false'
USE_XFORMERS='false'
MINIMUM_RESOLUTION='0'
OUTPUT_DIR='ouptut'
USE_DORA='false'
USE_BITFIT='false'
PUSH_TO_HUB='false'
PUSH_CHECKPOINTS='false'
NUM_EPOCHS='100'
MAX_NUM_STEPS='0'
CHECKPOINTING_STEPS='2000'
CHECKPOINTING_LIMIT='10'
HUB_MODEL_NAME='simpletuner-full'
TRACKER_PROJECT_NAME='simpletuner-flux-dev'
TRACKER_RUN_NAME='flux-dev'
DEBUG_EXTRA_ARGS='--report_to=wandb'
MODEL_TYPE='full'
MODEL_NAME='black-forest-labs/FLUX.1-dev'
FLUX='true'
PIXART_SIGMA='false'
KOLORS='false'
STABLE_DIFFUSION_3='false'
STABLE_DIFFUSION_LEGACY='false'
TRAIN_BATCH_SIZE='1'
USE_GRADIENT_CHECKPOINTING='true'
GRADIENT_ACCUMULATION_STEPS='2'
CAPTION_DROPOUT_PROBABILITY='0.1'
RESOLUTION_TYPE='pixel_area'
RESOLUTION='1024'
VALIDATION_SEED='0'
VALIDATION_STEPS='2000'
VALIDATION_RESOLUTION='1024x1024'
VALIDATION_GUIDANCE='3.0'
VALIDATION_GUIDANCE_RESCALE='0.0'
VALIDATION_NUM_INFERENCE_STEPS='25'
VALIDATION_PROMPT='A photo-realistic image of a cat'
ALLOW_TF32='true'
MIXED_PRECISION='bf16'
OPTIMIZER='adamw_bf16'
LEARNING_RATE='1e-6'
LR_SCHEDULE='polynomial'
LR_WARMUP_STEPS='10'
ACCELERATE_EXTRA_ARGS='--multi_gpu'
TRAINING_NUM_PROCESSES='8'
TRAINING_NUM_MACHINES='1'
VALIDATION_TORCH_COMPILE='true'
TRAINER_DYNAMO_BACKEND='inductor'
TRAINER_EXTRA_ARGS='--lr_end=1e-8 --compress_disk_cache'
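
The config above is a list of shell-style KEY='value' assignments. When comparing configs in a thread like this, it can help to load them into a dict. A minimal sketch using only the standard library's shlex; the helper name parse_env_config is hypothetical and not part of SimpleTuner.

```python
import shlex


def parse_env_config(text):
    """Split shell-style KEY='value' assignments into a dict (hypothetical helper)."""
    settings = {}
    # shlex.split honours the single quotes, so values containing spaces
    # (e.g. VALIDATION_PROMPT) stay in one token and the quotes are stripped.
    for token in shlex.split(text):
        key, sep, value = token.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"not a KEY=value assignment: {token!r}")
        settings[key] = value
    return settings


if __name__ == "__main__":
    sample = (
        "MODEL_TYPE='full' MODEL_NAME='black-forest-labs/FLUX.1-dev' "
        "VALIDATION_PROMPT='A photo-realistic image of a cat' "
        "ACCELERATE_EXTRA_ARGS='--multi_gpu' "
        "TRAINER_EXTRA_ARGS='--lr_end=1e-8 --compress_disk_cache'"
    )
    cfg = parse_env_config(sample)
    print(cfg["VALIDATION_PROMPT"])  # A photo-realistic image of a cat
    print(cfg["TRAINER_EXTRA_ARGS"].split())  # ['--lr_end=1e-8', '--compress_disk_cache']
```

Splitting only on the first "=" keeps values such as --lr_end=1e-8 intact.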

bghira commented 2 months ago

Is this seen at the end of training?

youngwanLEE commented 2 months ago

It seems weird: it reached the last epoch without processing any data and then finished.

I prepared the laion_pop dataset and successfully built the VAE and text-embedding caches.

bghira commented 2 months ago

Share your dataloader config.

youngwanLEE commented 2 months ago

@bghira

[
  {
    "id": "laion_ye_pop",
    "type": "local",
    "instance_data_dir": "datasets/laion_ye_pop_pairs",
    "crop": true,
    "crop_style": "center",
    "crop_aspect": "square",
    "minimum_image_size": 1024,
    "maximum_image_size": 1536,
    "target_downsample_size": 1024,
    "resolution": 1024,
    "resolution_type": "pixel_area",
    "caption_strategy": "textfile",
    "cache_dir_vae": "cache/vae/flux/laion_ye_pop",
    "text_embeds": "laion-ye-pop-embed-cache",
    "ignore_epochs": true,
    "disabled": false,
    "skip_file_discovery": "",
    "metadata_backend": "json"
  },
  {
    "id": "laion-ye-pop-embed-cache",
    "dataset_type": "text_embeds",
    "default": true,
    "type": "local",
    "cache_dir": "cache/text/flux/laion_ye_pop",
    "disabled": false,
    "write_batch_size": 128
  }
]

bghira commented 2 months ago

OK, it's because ignore_epochs is enabled.

youngwanLEE commented 2 months ago

@bghira I followed your instructions. So should I set ignore_epochs to false?

bghira commented 2 months ago

Yes. ignore_epochs requires two datasets and is specific to single-subject DreamBooth.
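
The rule stated here can be expressed as a small lint over the dataloader JSON. This is a hypothetical helper, not part of SimpleTuner: it assumes entries without a dataset_type are image datasets (as in the config shared above) and encodes only the constraint stated in this comment, not SimpleTuner's own validation.

```python
import json


def check_ignore_epochs(backends):
    """Return warnings for datasets that enable ignore_epochs (hypothetical lint).

    Only encodes the rule from this thread: ignore_epochs needs at least two
    image datasets (it is meant for single-subject DreamBooth).
    """
    # Assumption: entries without "dataset_type" are image datasets.
    image_sets = [
        b for b in backends
        if b.get("dataset_type", "image") == "image" and not b.get("disabled", False)
    ]
    warnings = []
    for b in image_sets:
        if b.get("ignore_epochs") and len(image_sets) < 2:
            warnings.append(
                f"{b['id']}: ignore_epochs is true but only "
                f"{len(image_sets)} image dataset(s) are enabled"
            )
    return warnings


if __name__ == "__main__":
    config = json.loads("""
    [
      {"id": "laion_ye_pop", "type": "local", "ignore_epochs": true, "disabled": false},
      {"id": "laion-ye-pop-embed-cache", "dataset_type": "text_embeds", "disabled": false}
    ]
    """)
    for w in check_ignore_epochs(config):
        print(w)
```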

youngwanLEE commented 2 months ago

@bghira Thanks to your comment, I got full fine-tuning to start. However, it eventually runs out of memory on an 8x A100 machine, even though I set the per-GPU batch size to 1.

FYI, when I ran full fine-tuning of FLUX with XLabs-AI/x-flux at a batch size of 1 on a single GPU (not 8 GPUs), it ran normally, even without VAE/text caching.

I expected SimpleTuner to use less memory, since it caches the VAE latents and text embeddings, but it didn't.

I can't figure out what I missed, or which part consumes more memory than XLabs-AI/x-flux.

bghira commented 2 months ago

Then DeepSpeed isn't set up.
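
A rough back-of-the-envelope estimate shows why a batch size of 1 does not help here: for full fine-tuning, weights, gradients and optimizer state dominate. The sketch below assumes a ~12B-parameter FLUX.1-dev transformer, bf16 weights and gradients (2 bytes each), and two bf16 AdamW moment buffers per parameter (an assumption about adamw_bf16); it ignores activations, the VAE and text encoders. Plain multi-GPU data parallelism keeps a full copy of everything on each GPU, while ZeRO stage 2 shards gradients and optimizer state across ranks.

```python
def per_gpu_gib(params, num_gpus, zero_stage,
                weight_bytes=2, grad_bytes=2, optim_bytes=4):
    """Rough per-GPU memory (GiB) for weights + grads + optimizer state.

    Ignores activations, buffers, the VAE/text encoders and fragmentation.
    optim_bytes=4 assumes two bf16 AdamW moments (2 x 2 bytes) per parameter.
    """
    weights = params * weight_bytes
    grads = params * grad_bytes
    optim = params * optim_bytes
    if zero_stage >= 1:
        optim /= num_gpus  # ZeRO-1 and up: optimizer state sharded across ranks
    if zero_stage >= 2:
        grads /= num_gpus  # ZeRO-2 and up: gradients sharded too
    if zero_stage >= 3:
        weights /= num_gpus  # ZeRO-3: parameters sharded too
    return (weights + grads + optim) / 2**30


if __name__ == "__main__":
    n = 12e9  # assumed ~12B parameters for the FLUX.1-dev transformer
    print(f"No sharding: {per_gpu_gib(n, 8, 0):.1f} GiB per GPU")
    print(f"ZeRO stage 2, 8 GPUs: {per_gpu_gib(n, 8, 2):.1f} GiB per GPU")
```

Under these assumptions the unsharded case needs about 89.4 GiB per GPU (96 GB, more than an 80 GB card holds before any activations), versus about 30.7 GiB with ZeRO stage 2 on 8 GPUs.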

youngwanLEE commented 2 months ago

@bghira I really appreciate your reply.

I finally got it running with DeepSpeed.

I added --config_file="config/accelerate_config.yaml" to the train.sh file, with this config:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 2
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

bghira commented 2 months ago

No. Run accelerate config and let it populate the default config, then don't use --multi_gpu.
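
Following that advice, the launch step boils down to accelerate launch without --multi_gpu; when no --config_file is passed, accelerate falls back to the default config written by accelerate config. The hypothetical helper below assembles such a command from an ACCELERATE_EXTRA_ARGS string. It is a sketch of the idea, not how SimpleTuner's train.sh actually builds its command.

```python
import shlex


def build_launch_cmd(accelerate_extra_args, script="train.py", config_file=None):
    """Assemble an `accelerate launch` argv without --multi_gpu (hypothetical helper).

    With config_file=None, accelerate uses the default config written by
    `accelerate config`. --multi_gpu is dropped, per the advice in this thread.
    """
    extra = [a for a in shlex.split(accelerate_extra_args) if a != "--multi_gpu"]
    cmd = ["accelerate", "launch"]
    if config_file is not None:
        cmd.append(f"--config_file={config_file}")
    return cmd + extra + [script]


if __name__ == "__main__":
    print(shlex.join(build_launch_cmd("--multi_gpu")))  # accelerate launch train.py
```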

youngwanLEE commented 2 months ago

@bghira I struggled with DeepSpeed because I wasn't familiar with it, but now I've figured it out.

I wrote some notes for those who will try full fine-tuning with DeepSpeed but are not familiar with it.

Simply run accelerate config on the command line and choose the settings described in this guide and in X-Lab's issue.

youngwanLEE commented 2 months ago

@bghira I have another question.

When I checked the VAE and text cache files, the sample counts were different.

For example:
- original images: 491,567
- VAE cache: 212,556
- text embeddings: 491,248

Is this normal? Either way, full fine-tuning has run normally using these cached files.
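
To put the gap in perspective, a quick calculation with the counts reported above (cache_gap is just an illustrative helper):

```python
def cache_gap(total, cached):
    """Return (missing count, percent cached) for one cache (illustrative helper)."""
    missing = total - cached
    return missing, 100.0 * cached / total


if __name__ == "__main__":
    total_images = 491_567  # counts reported in the comment above
    for name, cached in [("VAE latents", 212_556), ("text embeddings", 491_248)]:
        missing, pct = cache_gap(total_images, cached)
        print(f"{name}: {pct:.1f}% cached, {missing:,} missing")
```

This prints 43.2% cached with 279,011 missing for the VAE latents, and 99.9% cached with 319 missing for the text embeddings, so nearly all of the discrepancy is on the VAE side.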

bghira commented 2 months ago

Maybe some aspect buckets don't have enough samples, so a bunch of images are removed. You can try --disable_bucket_pruning.
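
A toy illustration of the general idea behind aspect-bucket pruning: images are grouped by rounded aspect ratio, and buckets that are too small to fill a batch are dropped. This is not SimpleTuner's implementation. The threshold (per-GPU batch size x number of GPUs) is an assumption, and rounding=2 only loosely mirrors ASPECT_BUCKET_ROUNDING='2', assuming that setting means decimal places.

```python
from collections import defaultdict


def bucket_by_aspect(sizes, rounding=2):
    """Group (width, height) pairs by aspect ratio rounded to `rounding` decimals."""
    buckets = defaultdict(list)
    for w, h in sizes:
        buckets[round(w / h, rounding)].append((w, h))
    return dict(buckets)


def prune_buckets(buckets, min_samples, disable_pruning=False):
    """Drop buckets smaller than min_samples unless pruning is disabled.

    The threshold rule is an illustrative assumption, not SimpleTuner's logic.
    """
    if disable_pruning:
        return buckets
    return {ar: items for ar, items in buckets.items() if len(items) >= min_samples}


if __name__ == "__main__":
    sizes = [(1024, 1024)] * 10 + [(1536, 1024)] * 3 + [(1024, 1536)] * 9
    buckets = bucket_by_aspect(sizes)
    # Assumed threshold: per-GPU batch size (1) x number of GPUs (8).
    kept = prune_buckets(buckets, min_samples=1 * 8)
    print(sorted((ar, len(v)) for ar, v in kept.items()))  # [(0.67, 9), (1.0, 10)]
```

In this toy run the three 1.5-aspect images are discarded; passing disable_pruning=True keeps them.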

playerzer0x commented 2 months ago

Could you share what settings/command ultimately worked for you? I'm getting this error using your settings:

2024-09-09 20:18:09,066 [INFO] Moving the diffusion transformer to GPU in torch.bfloat16 precision.
2024-09-09 20:18:13,153 [INFO] Learning rate: 1e-06
2024-09-09 20:18:13,154 [INFO] cls: <class 'accelerate.utils.deepspeed.DummyOptim'>, settings: {'lr': 1e-06, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0.01}
'Trainer' object has no attribute 'optimizer'
2024-09-09 20:18:13,156 [INFO] Using dummy learning rate scheduler
'Trainer' object has no attribute 'optimizer'
Traceback (most recent call last):
  File "/workspace/SimpleTuner/train.py", line 48, in <module>
    trainer.resume_and_prepare()
  File "/workspace/SimpleTuner/helpers/training/trainer.py", line 1354, in resume_and_prepare
    lr_scheduler = self.init_lr_scheduler()
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/SimpleTuner/helpers/training/trainer.py", line 997, in init_lr_scheduler
    self.optimizer,
    ^^^^^^^^^^^^^^
AttributeError: 'Trainer' object has no attribute 'optimizer'
