GaParmar / img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
MIT License

train_pix2pix the result is not good #82

Open 555jion opened 2 months ago

555jion commented 2 months ago

Dear author, I greatly admire your research. I am a graduate student at Beijing Forestry University in China, and I would like to apply this work to terrain generation in images. However, I ran into some problems while training the paired model from the paper, and I hope you can advise me. Below are my configuration and training command. The loss does not seem to converge, and the generated results are unsatisfactory.

accelerate config:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
enable_cpu_affinity: true
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

Training command:

```shell
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="output/pix2pix_turbo/fill50k" \
    --dataset_folder="data/my_fill50k" \
    --resolution=512 \
    --train_batch_size=1 \
    --enable_xformers_memory_efficient_attention \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "pix2pix_turbo_fill50k"
```

[Screenshot from 2024-08-28 10-58-46]

[Screenshot from 2024-08-28 10-58-59]

[Screenshot from 2024-08-28 10-59-11]

GaParmar commented 2 months ago

Hi,

I think your training results look worse because of two things: `mixed_precision="bf16"` and `batch_size=1`. Could you try making the following changes to your setup:

- make sure you are using the latest version of the training code
- increase the batch size to 2
- do not use bf16 mixed-precision training
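For concreteness, the two suggested changes map onto the accelerate config and the launch command from the original post roughly as follows (the paths and project name below are simply the ones used above, not recommendations):

```shell
# In the accelerate config file, change:
#   mixed_precision: bf16   ->   mixed_precision: 'no'
# Then raise the batch size in the launch command:
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="output/pix2pix_turbo/fill50k" \
    --dataset_folder="data/my_fill50k" \
    --resolution=512 \
    --train_batch_size=2 \
    --enable_xformers_memory_efficient_attention \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "pix2pix_turbo_fill50k"
```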

-Gaurav

555jion commented 1 month ago

Thank you for your reply. I will reconfigure and retrain following your suggestion, but I keep running out of GPU memory when batch_size is set to 2. Could I ask about your specific hardware configuration?
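One possible workaround for the out-of-memory error, assuming the training script exposes a `--gradient_accumulation_steps` flag (common in diffusers/accelerate-style trainers; check `python src/train_pix2pix_turbo.py --help` to confirm): keep `train_batch_size=1` in memory but accumulate gradients over two steps, which gives an effective batch size of 2 at the cost of slower training.

```shell
# Effective batch = train_batch_size * gradient_accumulation_steps = 2,
# while only one sample resides in GPU memory at a time.
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="output/pix2pix_turbo/fill50k" \
    --dataset_folder="data/my_fill50k" \
    --resolution=512 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=2 \
    --enable_xformers_memory_efficient_attention \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "pix2pix_turbo_fill50k"
```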

Lucarqi commented 4 weeks ago

@555jion hi, I encountered the same issue. The generated quality is very poor. I set the batch size to 2, but on the validation set every group of images looks very similar, and on the test set all the generated images are identical. This confuses me a lot. Have you solved this problem?

Training: [images]
On validation set: [images]
Testing (generation): [images]
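A quick way to confirm whether the test outputs really are identical (i.e. the model has collapsed to a single image regardless of the input condition) is to hash the generated files. This is a minimal sketch, not part of the repo; the directory path and glob pattern are placeholders for wherever your generated images land:

```python
import hashlib
from pathlib import Path

def count_unique_images(image_dir: str, pattern: str = "*.png") -> int:
    """Count byte-wise distinct files among the generated images.

    A return value of 1 over a whole test directory means every output
    is literally the same file content, i.e. full mode collapse.
    """
    hashes = {
        hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(image_dir).glob(pattern))
    }
    return len(hashes)
```

Note this only catches exact duplicates; visually near-identical but byte-different outputs would need a perceptual metric (e.g. pairwise LPIPS) instead.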

555jion commented 1 week ago

> @555jion hi, I encountered the same issue. The generated quality is very poor. [...] Have you solved this problem?

@Lucarqi Hello, it seems that your loss is converging normally, but your FID curve is indeed oscillating. Could that be related to your text prompt input? I am only guessing, since my situation is slightly different from yours.