GaParmar / img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
MIT License

Did anyone reproduce pix2pix-turbo? #93

Open Lucarqi opened 2 weeks ago

Lucarqi commented 2 weeks ago

I want to use pix2pix-turbo for a medical image generation task, where the control condition is given by label images. My hyperparameter settings are as follows:

    accelerate launch src/train_pix2pix_turbo.py \
        --pretrained_model_name_or_path="stabilityai/sd-turbo" \
        --output_dir="output/pix2pix_turbo/test" \
        --dataset_folder="data/data_u3/pix2pix" \
        --resolution=256 \
        --num_training_epochs=500 \
        --train_batch_size=2 \
        --enable_xformers_memory_efficient_attention \
        --viz_freq 25 \
        --track_val_fid \
        --report_to "wandb" \
        --tracker_project_name "test"
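
For context, the command above reads paired data from `--dataset_folder`. Below is a minimal sketch of the layout the script appears to expect, plus a quick pairing check; the folder and file names (`train_A`, `train_B`, `train_prompts.json`, and their `test_*` counterparts) come from my reading of the repo's README and should be verified against the paired dataset loader in the training utilities before relying on them. Mis-paired or mismatched label/target files are a common cause of a model collapsing to a single output.

```bash
# Assumed layout for --dataset_folder (verify against the repo's paired dataset loader):
#
#   data/data_u3/pix2pix/
#   ├── train_A/             # conditioning inputs (label maps)
#   ├── train_B/             # target images, paired with train_A by filename
#   ├── train_prompts.json   # text prompt(s) for the training images
#   ├── test_A/
#   ├── test_B/
#   └── test_prompts.json

# Quick sanity check that every label has a matching target with the same filename;
# any diff output means the A/B pairing is broken.
diff <(ls data/data_u3/pix2pix/train_A | sort) <(ls data/data_u3/pix2pix/train_B | sort)
```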

My accelerate config is:

    compute_environment: LOCAL_MACHINE
    debug: false
    distributed_type: MULTI_GPU
    downcast_bf16: 'no'
    enable_cpu_affinity: false
    gpu_ids: all
    machine_rank: 0
    main_training_function: main
    mixed_precision: 'no'
    num_machines: 1
    num_processes: 3
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
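
Nothing in this config jumps out as wrong, but two quick checks may help rule out the launch setup: with `num_processes: 3` and `--train_batch_size=2`, the effective batch size per optimizer step is 3 × 2 = 6 (times `gradient_accumulation_steps`, if set), and it is worth confirming that all three GPUs are actually visible. A minimal check, assuming a standard CUDA setup:

```bash
# Print the active accelerate environment and default config.
accelerate env

# Confirm the three GPUs that num_processes: 3 expects are visible.
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```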

I found that during training, the output images from the model are always the same and differ significantly from the real images: [image]

During testing, all synthesized images are also almost identical: [images]

Training wandb logs: [image]

Validation wandb logs: [image]

This issue has been troubling me for a while. Does anyone have any insight into it?

GaParmar commented 1 week ago

Looking at your results, I can spot a couple of issues with your setup:

  1. Since you are conditioning on medical label images, you can probably remove the CLIP_SIM loss; the text-image similarity loss is probably not very useful in this setting.
  2. Your model may be underfitting; you can try increasing the rank of the LoRA adapters (a sketch of both changes follows this list).
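
A minimal sketch of both suggestions applied to the original launch command. The flag names `--lambda_clipsim`, `--lora_rank_unet`, and `--lora_rank_vae` (and the implied default ranks) are my assumption from reading the training script; confirm them with `python src/train_pix2pix_turbo.py --help` before running.

```bash
# Sketch only: --lambda_clipsim 0 is assumed to disable the text-image similarity term,
# and the two --lora_rank_* flags are assumed to raise the adapter ranks above their defaults.
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="output/pix2pix_turbo/test_rank16" \
    --dataset_folder="data/data_u3/pix2pix" \
    --resolution=256 \
    --num_training_epochs=500 \
    --train_batch_size=2 \
    --enable_xformers_memory_efficient_attention \
    --lambda_clipsim 0 \
    --lora_rank_unet 16 \
    --lora_rank_vae 8 \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "test"
```

A fresh `--output_dir` is used here only so the sketch would not overwrite the earlier run.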
Lucarqi commented 1 week ago

@GaParmar thanks for your reply. Regarding 1: at the beginning I used the default settings, including the clip_sim loss, but the effect was the same. Regarding 2: I will try your advice.