GaParmar / img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
MIT License

About dataset preparation #84

Open nldhuyen0047 opened 2 months ago

nldhuyen0047 commented 2 months ago

Hi @GaParmar, I have a question about dataset preparation. For training, do the A folder and the B folder have to contain the same number of images?

Thank you so much.

GaParmar commented 1 month ago

If you are training the unpaired model, the A and B folders do not need to be the same size.

nldhuyen0047 commented 1 month ago

Yes, I trained with the unpaired model.

When I used a dataset where the sizes differed, I encountered this error:

File "/home/dev_ml/img2img-turbo/src/model.py", line 42, in my_vae_decoder_fwd sample = sample + skip_in RuntimeError: The size of tensor a (408) must match the size of tensor b (409) at non-singleton dimension 2

I then made the A and B folders the same size, and made all the images in each folder the same size, and when I ran the code again it worked. Could you please explain this error to me?

Thank you so much.

GaParmar commented 1 month ago

Based on this error, it looks like your image resolution is not a multiple of 8. What data-preprocessing are you using?

-Gaurav
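
As a side note, here is a minimal sketch (plain PyTorch pooling and upsampling, not the repo's actual VAE code) of why a resolution that is not a multiple of 8 can break the sample = sample + skip_in addition in my_vae_decoder_fwd: each encoder stage halves the spatial size with rounding, so after upsampling the decoder feature no longer lines up with the saved skip feature.

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 817, 817)               # height/width not a multiple of 8
skip = x                                      # feature saved for the skip connection
down = F.avg_pool2d(x, kernel_size=2)         # 817 -> 408 (rounded down)
up = F.interpolate(down, scale_factor=2.0)    # 408 -> 816, no longer 817
print(skip.shape, up.shape)                   # spatial sizes are off by one
# up + skip would raise the same "size of tensor a must match size of tensor b" error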

nldhuyen0047 commented 1 month ago

I prepared the dataset myself and trained the model, and that is when I encountered this problem.

I have another question about training the model. Because of the previous error, I trained with the dataset resized to 512x512, but the result after training does not translate from the A domain into the B domain. Is there an error in the model?

Thank you so much.

GaParmar commented 1 month ago

I think this is a problem that occurs because of mixed precision training. Could you try training without mixed precision?
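
For reference, assuming the script is launched with Hugging Face Accelerate (as in the accelerate launch command below), mixed precision can be turned off either through accelerate config / the --mixed_precision no launch flag, or by forcing fp32 on the Accelerator. A minimal sketch, not the repo's exact training loop:

from accelerate import Accelerator

# Sketch only: force full-precision (fp32) training instead of fp16/bf16.
accelerator = Accelerator(mixed_precision="no")
# model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)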

nldhuyen0047 commented 1 month ago

Hi, I tried again without mixed precision and the translation now works. I then tried another dataset without resizing, but I did not have enough memory, so I resized the images in the dataset. After that I ran into another problem: training stopped after a few steps (800/25000) even though I set max_train_steps to 25000, and the result is not good. My training dataset has about 8 images in train and 7 images in test. This is my command line:

export NCCL_P2P_DISABLE=1
accelerate launch --main_process_port 29501 src/train_cyclegan_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="outputs/test" \
    --dataset_folder "data/test" \
    --train_img_prep "resize_286_randomcrop_256x256_hflip" --val_img_prep "no_resize" \
    --learning_rate="1e-5" --max_train_steps=25000 \
    --train_batch_size=1 --gradient_accumulation_steps=1 \
    --report_to "wandb" --tracker_project_name "gparmar_unpaired_h2z_cycle_debug_v2" \
    --enable_xformers_memory_efficient_attention --validation_steps 250 \
    --lambda_gan 0.5 --lambda_idt 1 --lambda_cycle 1

Thank you so much.

GaParmar commented 1 month ago

Glad to hear that training is working now! Your training stopped because, by default, the maximum number of epochs is set to 100, as in this line. Since your dataset has 8 images, that corresponds to 800 steps. To train longer, increase the number of epochs by setting the --max_train_epochs flag to a higher value!

-Gaurav
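
The arithmetic behind the early stop, as a quick sanity check (assuming one optimizer step per image per epoch, i.e. batch size 1 and no gradient accumulation, as in the command above):

num_train_images = 8
batch_size = 1
gradient_accumulation_steps = 1
default_max_epochs = 100          # the default epoch cap mentioned above

steps_per_epoch = num_train_images // (batch_size * gradient_accumulation_steps)  # 8
total_steps = default_max_epochs * steps_per_epoch                                # 800
print(total_steps)   # training stops at 800 steps unless --max_train_epochs is raised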