GaParmar / img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
MIT License

Did anyone reproduce pix2pix-turbo? #93

Open Lucarqi opened 2 weeks ago

Lucarqi commented 2 weeks ago

I want to use pix2pix-turbo for a medical image generation task, where the control condition is given by label images. My hyperparameter settings are as follows:

    accelerate launch src/train_pix2pix_turbo.py \
        --pretrained_model_name_or_path="stabilityai/sd-turbo" \
        --output_dir="output/pix2pix_turbo/test" \
        --dataset_folder="data/data_u3/pix2pix" \
        --resolution=256 \
        --num_training_epochs=500 \
        --train_batch_size=2 \
        --enable_xformers_memory_efficient_attention \
        --viz_freq 25 \
        --track_val_fid \
        --report_to "wandb" \
        --tracker_project_name "test"
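
For context, the command above reads paired data from `--dataset_folder`. Below is a minimal sketch of the layout the script appears to expect, plus a quick pairing check; the folder and file names (`train_A`, `train_B`, `train_prompts.json`, and their `test_*` counterparts) come from my reading of the repo's README and should be verified against the paired dataset loader in the training utilities before relying on them. Mis-paired or mismatched label/target files are a common cause of a model collapsing to a single output.

```bash
# Assumed layout for --dataset_folder (verify against the repo's paired dataset loader):
#
#   data/data_u3/pix2pix/
#   ├── train_A/             # conditioning inputs (label maps)
#   ├── train_B/             # target images, paired with train_A by filename
#   ├── train_prompts.json   # text prompt(s) for the training images
#   ├── test_A/
#   ├── test_B/
#   └── test_prompts.json

# Quick sanity check that every label has a matching target with the same filename;
# any diff output means the A/B pairing is broken.
diff <(ls data/data_u3/pix2pix/train_A | sort) <(ls data/data_u3/pix2pix/train_B | sort)
```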

My accelerate config is:

    compute_environment: LOCAL_MACHINE
    debug: false
    distributed_type: MULTI_GPU
    downcast_bf16: 'no'
    enable_cpu_affinity: false
    gpu_ids: all
    machine_rank: 0
    main_training_function: main
    mixed_precision: 'no'
    num_machines: 1
    num_processes: 3
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
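
Nothing in this config jumps out as wrong, but two quick checks may help rule out the launch setup: with `num_processes: 3` and `--train_batch_size=2`, the effective batch size per optimizer step is 3 × 2 = 6 (times `gradient_accumulation_steps`, if set), and it is worth confirming that all three GPUs are actually visible. A minimal check, assuming a standard CUDA setup:

```bash
# Print the active accelerate environment and default config.
accelerate env

# Confirm the three GPUs that num_processes: 3 expects are visible.
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```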

I found that during training, the output images from the model are always the same and differ significantly from the real images: [image]

During testing, all synthesized images are also almost identical: [images]

Training wandb logs: [image]

Validation wandb logs: [image]

This issue has been troubling me for a while. Does anyone have any insight into it?

GaParmar commented 1 week ago

Looking at your results, I can spot a couple of issues with your setup:

  1. Since you are conditioning on medical label images, you can probably remove the CLIP_SIM loss; the text-image similarity loss is probably not very useful in this setting.
  2. Your model may be underfitting; you can try increasing the rank of the LoRA adapters (a sketch of both changes follows this list).
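
A minimal sketch of both suggestions applied to the original launch command. The flag names `--lambda_clipsim`, `--lora_rank_unet`, and `--lora_rank_vae` (and the implied default ranks) are my assumption from reading the training script; confirm them with `python src/train_pix2pix_turbo.py --help` before running.

```bash
# Sketch only: --lambda_clipsim 0 is assumed to disable the text-image similarity term,
# and the two --lora_rank_* flags are assumed to raise the adapter ranks above their defaults.
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="output/pix2pix_turbo/test_rank16" \
    --dataset_folder="data/data_u3/pix2pix" \
    --resolution=256 \
    --num_training_epochs=500 \
    --train_batch_size=2 \
    --enable_xformers_memory_efficient_attention \
    --lambda_clipsim 0 \
    --lora_rank_unet 16 \
    --lora_rank_vae 8 \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "test"
```

A fresh `--output_dir` is used here only so the sketch would not overwrite the earlier run.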
Lucarqi commented 1 week ago

@GaParmar thanks for your reply. Regarding 1: at the beginning I used the default settings, including the clip_sim loss, but the effect was the same. Regarding 2: I will try your advice.