GaParmar / img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
MIT License

My code stops at the second step #79

Open CharisWg opened 1 month ago

CharisWg commented 1 month ago

My code stops at the second step. Here is the full command and log:

```
(img2img-turbo) D:\Charis\img2img-turbo-main>accelerate launch --main_process_port 29501 src/train_cyclegan_turbo.py --pretrained_model_name_or_path="stabilityai/sd-turbo" --output_dir="output/cyclegan_turbo/my_horse2zebra" --dataset_folder="D:/Charis/img2img-turbo-main/data/my_horse2zebra" --train_img_prep="resize_286_randomcrop_256x256_hflip" --val_img_prep="no_resize" --learning_rate="1e-5" --max_train_steps=50 --train_batch_size=1 --gradient_accumulation_steps=1 --report_to="wandb" --tracker_project_name="gparmar_unpaired_h2z_cycle_debug_v2" --enable_xformers_memory_efficient_attention --validation_steps=250 --lambda_gan=0.5 --lambda_idt=1 --lambda_cycle=1
C:\Users\LocalAdmin\anaconda3\envs\img2img-turbo\lib\site-packages\huggingface_hub\file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
C:\Users\LocalAdmin\anaconda3\envs\img2img-turbo\lib\site-packages\torch\optim\adamw.py:50: UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see github.com/pytorch/pytorch/issues/40967 for more information
  super().__init__(params, defaults)
100%|██████████| 140/140 [00:00<00:00, 1497.22it/s]
Found 280 images in the folder output/cyclegan_turbo/my_horse2zebra\fid_reference_a2b
100%|██████████| 35/35 [00:05<00:00, 5.96it/s]
100%|██████████| 120/120 [00:00<00:00, 1610.58it/s]
Found 240 images in the folder output/cyclegan_turbo/my_horse2zebra\fid_reference_b2a
100%|██████████| 30/30 [00:01<00:00, 20.65it/s]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
C:\Users\LocalAdmin\anaconda3\envs\img2img-turbo\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Users\LocalAdmin\anaconda3\envs\img2img-turbo\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: C:\Users\LocalAdmin\anaconda3\envs\img2img-turbo\lib\site-packages\lpips\weights\v0.1\vgg.pth
wandb: Currently logged in as: wcharis1105 (wcharis1105-the-univesity-of-sydney). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.7
wandb: Run data is saved locally in D:\Charis\img2img-turbo-main\wandb\run-20240816_223400-tom0o8i5
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run hardy-night-1
wandb: View project at https://wandb.ai/wcharis1105-the-univesity-of-sydney/gparmar_unpaired_h2z_cycle_debug_v2
wandb: View run at https://wandb.ai/wcharis1105-the-univesity-of-sydney/gparmar_unpaired_h2z_cycle_debug_v2/runs/tom0o8i5
Steps:   0%|          | 0/50 [00:00<?, ?it/s]
C:\Users\LocalAdmin\anaconda3\envs\img2img-turbo\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Steps:   2%|█▉        | 1/50 [00:05<04:25, 5.41s/it]
Using cache found in C:\Users\LocalAdmin/.cache\torch\hub\facebookresearch_dino_main
100%|██████████| 120/120 [00:20<00:00, 5.95it/s]
Found 240 images in the folder output/cyclegan_turbo/my_horse2zebra\fid-1/samples_a2b
100%|██████████| 30/30 [00:08<00:00, 3.35it/s]
step=1, fid(a2b)=336.95, dino(a2b)=0.069
100%|██████████| 140/140 [00:23<00:00, 5.90it/s]
Found 280 images in the folder output/cyclegan_turbo/my_horse2zebra\fid-1/samples_b2a
100%|██████████| 35/35 [00:01<00:00, 18.60it/s]
step=1, fid(b2a)=275.9149222126299, dino(b2a)=0.084
Steps: 149it [12:57, 5.22s/it, cycle_a=6.43, cycle_b=5.78, disc_a=1.72, disc_b=1.36, gan_a=1.11, gan_b=1.47, idt_a=0.673, idt_b=0.653]
wandb: | 3.673 MB of 3.673 MB uploaded
wandb: Run history:
wandb:             cycle_a ▄█▆▆█▄▇█▆▆▆█▆▅▄▆▄█▅▆▃▁▇▄▅▆▅▄▅▄▄▅▆▄▅▅▄▅▃▁
wandb:             cycle_b ▆▅▅█▇▆▅▂▄▄▆▃▄▅▄▅▃▃▄▂▄▄▅▃▂▄▃▃▄▄▃▃▃▂▃▅▃▁▂▂
wandb:              disc_a █▇█▇▇▇▆▇▆▅▆▅▅▄▅▅▆▅▄▃▃▆▃▄▄▅▄▃▃▄▄▅▁▄▅▄▅▂▄▂
wandb:              disc_b ▆▆▆▆▅▅▄▄▄▄▅▅▃▄▄▄▄▁▃▅▄▃▅▄▁▄▂▂▃▃▄▂▄▄▂▁▄▄▆█
wandb:               gan_a ▁▁▁▂▁▂▂▃▂▃▂▂▄▃▄▂▃▁▂▅▄▅▃▂▆▁▄▃▄▂▄▅▆█▁▆▃▆▃▆
wandb:               gan_b ▃▃▃▄▄▄▄▄▄▄▃▄▄▅▅▅▆▆▄▇▇▅▃▅█▄▄▇▇▃▃▃▃█▄▅▄▂▁▁
wandb:               idt_a ▅▇▅█▆▇▄▃▄▅▇▄▆▆▅▅▃▂▄▂▄▄▅▃▃▄▂▃▃▃▄▃▂▂▁▄▁▁▁▁
wandb:               idt_b ▇█▆▇▇▅██▇▆▆▆▇▇▄▆▅▇▆▅▄▃▆▄▅▅▄▃▄▄▄▃▄▄▄▄▃▃▂▁
wandb: val/dino_struct_a2b ▁
wandb: val/dino_struct_b2a ▁
wandb:         val/fid_a2b ▁
wandb:         val/fid_b2a ▁
wandb:
wandb: Run summary:
wandb:             cycle_a 6.43207
wandb:             cycle_b 5.78296
wandb:              disc_a 1.71799
wandb:              disc_b 1.36104
wandb:               gan_a 1.10708
wandb:               gan_b 1.47028
wandb:               idt_a 0.67274
wandb:               idt_b 0.65303
wandb: val/dino_struct_a2b 0.06939
wandb: val/dino_struct_b2a 0.08398
wandb:         val/fid_a2b 336.9454
wandb:         val/fid_b2a 275.91492
wandb:
wandb: View run hardy-night-1 at: https://wandb.ai/wcharis1105-the-univesity-of-sydney/gparmar_unpaired_h2z_cycle_debug_v2/runs/tom0o8i5
wandb: View project at: https://wandb.ai/wcharis1105-the-univesity-of-sydney/gparmar_unpaired_h2z_cycle_debug_v2
wandb: Synced 6 W&B file(s), 48 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20240816_223400-tom0o8i5\logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
```

*(attached screenshot "Screenshot 2024-08-18 143600.png"; the upload did not complete)*

GaParmar commented 3 weeks ago

Does the training end early because you set `max_train_steps=50` in the training command? This issue might be resolved if you increase it to a larger value.
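For illustration, here is the same command with a larger step budget; the `25000` below is only an example value, not a recommendation from this thread. Line breaks are bash-style for readability, so on Windows cmd keep everything on one line or use `^` continuations:

```
accelerate launch --main_process_port 29501 src/train_cyclegan_turbo.py \
    --pretrained_model_name_or_path="stabilityai/sd-turbo" \
    --output_dir="output/cyclegan_turbo/my_horse2zebra" \
    --dataset_folder="D:/Charis/img2img-turbo-main/data/my_horse2zebra" \
    --train_img_prep="resize_286_randomcrop_256x256_hflip" \
    --val_img_prep="no_resize" \
    --learning_rate="1e-5" \
    --max_train_steps=25000 \
    --train_batch_size=1 --gradient_accumulation_steps=1 \
    --report_to="wandb" --tracker_project_name="gparmar_unpaired_h2z_cycle_debug_v2" \
    --enable_xformers_memory_efficient_attention \
    --validation_steps=250 \
    --lambda_gan=0.5 --lambda_idt=1 --lambda_cycle=1
```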

-Gaurav
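For context, a minimal sketch of why a 50-step budget ends the run this quickly; this is an assumption about the script's control flow inferred from the log above, not the actual `train_cyclegan_turbo.py` code:

```python
# Training halts once the global step reaches --max_train_steps, and
# periodic validation fires only on multiples of --validation_steps
# (the log shows an extra evaluation at step 1).
max_train_steps = 50    # from the command above: --max_train_steps=50
validation_steps = 250  # from the command above: --validation_steps=250

global_step = 0
while global_step < max_train_steps:
    global_step += 1  # one optimizer step on one batch
    if global_step == 1 or global_step % validation_steps == 0:
        print(f"validate at step {global_step}")  # FID / DINO-struct eval
print(f"training stopped at step {global_step}")
# With these values, validation runs only at step 1 and the loop exits
# at step 50, long before step 250 is ever reached.
```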

CharisWg commented 1 week ago

> Does the training end early because you set `max_train_steps=50` in the training command? This issue might be resolved if you increase it to a larger value.

Thanks, I solved this.