Closed: shiluooulihs closed this issue 3 years ago
Did you use distributed training? There's some problem with it and it's currently under review.
I have the same problem as this. Pre-processed the entire FaceForensics dataset into the same format as the examples given, and trained with the following script:
python train.py --name face_256 --dataset_mode fewshot_face --adaptive_spade --warp_ref --spade_combine --batchSize 4 --continue_train
And got the same problem
I'm using PyTorch 1.5.0 and turned distributed training off.
Any advice?
@tcwang0509 I only use one GPU (RTX 2080 Ti) with batch size 4, so I don't use distributed training.
I have tried PyTorch 1.0.0 and PyTorch 1.2.0 (the version suggested in the README), and the problem still exists.
Could you share the loss curve and intermediate result images from different stages of your trained model?
I ran into the same issue using a custom dataset. I reduced the batch size to 2 and --loadSize and --fineSize to 128 instead of 256. It seems to be working for me now, albeit with a lower resolution. I'm not 100% sure which of the above changes solved the problem but it might be worth a try!
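For anyone who wants to try the same workaround, this is roughly the command it amounts to: the training invocation from above with the batch size halved and the resolution flags dropped to 128. This is a command fragment, not a tested recipe; the flag names come from this thread, and the `--name face_128` experiment name is just a placeholder:

```shell
# Same training script as above, but batch size 2 and 128x128 resolution
python train.py --name face_128 --dataset_mode fewshot_face \
  --adaptive_spade --warp_ref --spade_combine \
  --batchSize 2 --loadSize 128 --fineSize 128
```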
Hey, can you share the trained model? I'm running into the same problem: #64.
(epoch: 5, iters: 5200, time: 0.215) G_GAN: 1.148 G_GAN_Feat: 1.575 G_VGG: 3.786 F_Flow: 47.920 F_Warp: 1.578 F_Mask: 3.744 D_real: 1.216 D_fake: 0.456
(epoch: 5, iters: 5400, time: 0.213) G_GAN: 1.206 G_GAN_Feat: 1.475 G_VGG: 3.834 F_Flow: 42.423 F_Warp: 1.231 F_Mask: 3.050 D_real: 1.006 D_fake: 0.775
(epoch: 5, iters: 5600, time: 0.214) G_GAN: 1.607 G_GAN_Feat: 1.443 G_VGG: 3.900 F_Flow: 38.237 F_Warp: 1.260 F_Mask: 3.218 D_real: 1.071 D_fake: 0.600
(epoch: 5, iters: 5800, time: 0.253) G_GAN: 7.229 G_GAN_Feat: 5.726 G_VGG: 9.282 F_Flow: 61.446 F_Warp: 1.748 F_Mask: 3.956 D_real: 0.149 D_fake: 0.329
(epoch: 5, iters: 6000, time: 0.272) G_GAN: 4.416 G_GAN_Feat: 4.335 G_VGG: 10.591 F_Flow: 54.107 F_Warp: 1.712 F_Mask: 3.758 D_real: 0.028 D_fake: 0.001
saving the latest model (epoch 5, total_steps 46000)
(epoch: 5, iters: 6200, time: 0.250) G_GAN: 6.598 G_GAN_Feat: 6.497 G_VGG: 12.958 F_Flow: 54.509 F_Warp: 1.640 F_Mask: 3.585 D_real: 0.045
(epoch: 5, iters: 6400, time: 0.274) G_GAN: 7.168 G_GAN_Feat: 5.117 G_VGG: 9.567 F_Flow: 41.135 F_Warp: 1.234 F_Mask: 3.038 D_real: 0.001 D_fake: 0.000
(epoch: 5, iters: 6600, time: 0.249) G_GAN: 8.530 G_GAN_Feat: 5.387 G_VGG: 12.317 F_Flow: 42.177 F_Warp: 1.501 F_Mask: 3.491 D_real: 0.001 D_fake: 1.224
(epoch: 5, iters: 6800, time: 0.258) G_GAN: 6.098 G_GAN_Feat: 4.768 G_VGG: 10.794 F_Flow: 37.454 F_Warp: 1.577 F_Mask: 3.763 D_real: 0.002 D_fake: 0.000
(epoch: 5, iters: 7000, time: 0.262) G_GAN: 7.120 G_GAN_Feat: 5.566 G_VGG: 13.230 F_Flow: 44.769 F_Warp: 1.634 F_Mask: 3.720 D_real: 0.000 D_fake: 0.000
(epoch: 5, iters: 7200, time: 0.263) G_GAN: 6.472 G_GAN_Feat: 4.973 G_VGG: 11.629 F_Flow: 32.759 F_Warp: 1.369 F_Mask: 3.210 D_real: 0.003 D_fake: 0.001
(epoch: 5, iters: 7400, time: 0.252) G_GAN: 7.402 G_GAN_Feat: 5.436 G_VGG: 11.083 F_Flow: 40.644 F_Warp: 1.357 F_Mask: 3.310 D_real: 0.001 D_fake: 0.001
(epoch: 5, iters: 7600, time: 0.280) G_GAN: 6.293 G_GAN_Feat: 4.803 G_VGG: 10.134 F_Flow: 34.146 F_Warp: 1.168 F_Mask: 3.022
(epoch: 5, iters: 7800, time: 0.271) G_GAN: 6.919 G_GAN_Feat: 4.915 G_VGG: 9.678 F_Flow: 43.328 F_Warp: 1.198 F_Mask: 2.816 D_real: 0.000 D_fake: 0.043
(epoch: 5, iters: 8000, time: 0.269) G_GAN: 6.653 G_GAN_Feat: 4.640 G_VGG: 10.237 F_Flow: 35.914 F_Warp: 1.146 F_Mask: 2.887 D_real: 0.000 D_fake: 0.002
saving the latest model (epoch 5, total_steps 48000)
(epoch: 5, iters: 8200, time: 0.300) G_GAN: 8.048 G_GAN_Feat: 5.240 G_VGG: 9.980 F_Flow: 47.772 F_Warp: 1.427 F_Mask: 3.489 D_real: 0.002
(epoch: 5, iters: 8400, time: 0.309) G_GAN: 6.162 G_GAN_Feat: 5.195 G_VGG: 10.368 F_Flow: 40.305 F_Warp: 1.195 F_Mask: 2.866 D_real: 0.000 D_fake: 0.000
(epoch: 5, iters: 8600, time: 0.274) G_GAN: 7.631 G_GAN_Feat: 5.033 G_VGG: 10.163 F_Flow: 35.009 F_Warp: 0.932 F_Mask: 2.475 D_real: 0.000 D_fake: 0.000
(epoch: 5, iters: 8800, time: 0.262) G_GAN: 7.215 G_GAN_Feat: 5.538 G_VGG: 10.976 F_Flow: 42.931 F_Warp: 1.358 F_Mask: 3.307 D_real: 0.000
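For reference, the pattern in the log above (D_real and D_fake dropping to ~0 while G_GAN jumps) is the classic sign of the discriminator collapsing. A minimal sketch for scanning a `loss_log.txt` in this format and flagging the first entry where both discriminator losses fall below a threshold (the loss names match the log above; the 0.01 threshold is an arbitrary assumption):

```python
import re

# Matches lines like:
# (epoch: 5, iters: 6000, time: 0.272) G_GAN: 4.416 ... D_real: 0.028 D_fake: 0.001
LINE_RE = re.compile(
    r"\(epoch:\s*(\d+),\s*iters:\s*(\d+).*?\)"
    r".*?D_real:\s*([\d.]+)\s+D_fake:\s*([\d.]+)"
)

def first_collapse(log_text, threshold=0.01):
    """Return (epoch, iters) of the first entry where both discriminator
    losses are below `threshold`, or None if no such entry exists."""
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skips "saving the latest model" and malformed lines
        epoch, iters, d_real, d_fake = m.groups()
        if float(d_real) < threshold and float(d_fake) < threshold:
            return int(epoch), int(iters)
    return None

sample = """(epoch: 5, iters: 5400, time: 0.213) G_GAN: 1.206 D_real: 1.006 D_fake: 0.775
(epoch: 5, iters: 6000, time: 0.272) G_GAN: 4.416 D_real: 0.028 D_fake: 0.001
(epoch: 5, iters: 6400, time: 0.274) G_GAN: 7.168 D_real: 0.001 D_fake: 0.000"""
print(first_collapse(sample))  # → (5, 6400)
```

On the full log above this points at roughly iters 6000–6400 of epoch 5, which matches where the outputs reportedly degrade.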
I have the same problem.
Hi, I also encounter the same issue @tcwang0509 during training Face. Have you found the reason?
I have the same problem as this. @tcwang0509
I followed the training tutorial: compiled a snapshot of FlowNet2 and downloaded the example datasets by running python3 scripts/download_datasets.py.
Then I ran the script train_g1_256.sh.
But the losses D_real and D_fake collapsed after around epoch 5.
The distributed option is also set to false in the opt file.
Here are my loss_log and images.
> Hi, I also encounter the same issue @tcwang0509 during training Face. Have you found the reason?

I also encountered the same issue @tcwang0509
@shiluooulihs @gabewilliam @Ian990466 @keetsky @nihaomiao
Have you solved this problem? Could you share the solution? Thanks
> @shiluooulihs @gabewilliam @Ian990466 @keetsky @nihaomiao
> Have you solved this problem? Could you share the solution? Thanks

No, I haven't. If you're interested in face reenactment, you can try https://github.com/AliaksandrSiarohin/first-order-model instead.
I have the same problem with pose.
@tcwang0509 @keetsky @DDDlk @gabewilliam @shiluooulihs I have the same problem with street. Have you solved it?
I want to reproduce the results, but the losses D_real and D_fake always become None after a few epochs. The model seems to have collapsed.
I followed the guide in readme.md step by step, using the default hyperparameters.
Does anyone know why?
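Since this keeps biting people several epochs in, it can help to bail out (or at least log loudly) as soon as a loss becomes NaN/None or the discriminator losses flatline, instead of letting training run on. A minimal, framework-agnostic sketch of such a guard; the loss dict keys mirror the names in the repo's loss_log, while the threshold and patience values are arbitrary assumptions:

```python
import math

class CollapseGuard:
    """Flags training collapse: any NaN loss, or D_real and D_fake
    both staying below `threshold` for `patience` consecutive checks."""

    def __init__(self, threshold=0.01, patience=5):
        self.threshold = threshold
        self.patience = patience
        self.flat_steps = 0

    def update(self, losses):
        """`losses` is a dict like {'D_real': 0.5, 'D_fake': 0.3, ...}.
        Returns True when training should be stopped."""
        if any(math.isnan(v) for v in losses.values()):
            return True  # a NaN loss means the run is already broken
        if losses['D_real'] < self.threshold and losses['D_fake'] < self.threshold:
            self.flat_steps += 1
        else:
            self.flat_steps = 0
        return self.flat_steps >= self.patience

guard = CollapseGuard(patience=2)
print(guard.update({'D_real': 1.0, 'D_fake': 0.5}))    # → False
print(guard.update({'D_real': 0.001, 'D_fake': 0.0}))  # → False
print(guard.update({'D_real': 0.002, 'D_fake': 0.0}))  # → True
```

Calling `guard.update(...)` once per logging interval inside the training loop would have caught the collapse in the logs above around epoch 5 rather than many epochs later.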