Closed: shiluooulihs closed this issue 3 years ago
Did you use distributed training? There's some problem with it and it's currently under review.
I have the same problem as this. Pre-processed the entire FaceForensics dataset into the same format as the examples given, and trained with the following script:
python train.py --name face_256 --dataset_mode fewshot_face --adaptive_spade --warp_ref --spade_combine --batchSize 4 --continue_train
And got the same problem
I'm using PyTorch 1.5.0 and turned distributed training off.
Any advice?
@tcwang0509 I only use one GPU (RTX 2080 Ti) with batch size 4, so I don't use distributed training.
I have tried PyTorch 1.0.0 and PyTorch 1.2.0 (the version suggested in the README), and the problem still exists.
Could you share the loss curve and intermediate result images from different stages of your trained model?
I ran into the same issue using a custom dataset. I reduced the batch size to 2 and --loadSize and --fineSize to 128 instead of 256. It seems to be working for me now, albeit with a lower resolution. I'm not 100% sure which of the above changes solved the problem but it might be worth a try!
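For anyone who wants to try the same workaround, this is roughly the command it amounts to: the training invocation from above with the batch size halved and the resolution flags dropped to 128. This is a command fragment, not a tested recipe; the flag names come from this thread, and the `--name face_128` experiment name is just a placeholder:

```shell
# Same training script as above, but batch size 2 and 128x128 resolution
python train.py --name face_128 --dataset_mode fewshot_face \
  --adaptive_spade --warp_ref --spade_combine \
  --batchSize 2 --loadSize 128 --fineSize 128
```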
Hey, can you share the trained model? I'm running into the same problem: #64.
(epoch: 5, iters: 5200, time: 0.215) G_GAN: 1.148 G_GAN_Feat: 1.575 G_VGG: 3.786 F_Flow: 47.920 F_Warp: 1.578 F_Mask: 3.744 D_real: 1.216 D_fake: 0.456
(epoch: 5, iters: 5400, time: 0.213) G_GAN: 1.206 G_GAN_Feat: 1.475 G_VGG: 3.834 F_Flow: 42.423 F_Warp: 1.231 F_Mask: 3.050 D_real: 1.006 D_fake: 0.775
(epoch: 5, iters: 5600, time: 0.214) G_GAN: 1.607 G_GAN_Feat: 1.443 G_VGG: 3.900 F_Flow: 38.237 F_Warp: 1.260 F_Mask: 3.218 D_real: 1.071 D_fake: 0.600
(epoch: 5, iters: 5800, time: 0.253) G_GAN: 7.229 G_GAN_Feat: 5.726 G_VGG: 9.282 F_Flow: 61.446 F_Warp: 1.748 F_Mask: 3.956 D_real: 0.149 D_fake: 0.329
(epoch: 5, iters: 6000, time: 0.272) G_GAN: 4.416 G_GAN_Feat: 4.335 G_VGG: 10.591 F_Flow: 54.107 F_Warp: 1.712 F_Mask: 3.758 D_real: 0.028 D_fake: 0.001
saving the latest model (epoch 5, total_steps 46000)
(epoch: 5, iters: 6200, time: 0.250) G_GAN: 6.598 G_GAN_Feat: 6.497 G_VGG: 12.958 F_Flow: 54.509 F_Warp: 1.640 F_Mask: 3.585 D_real: 0.045
(epoch: 5, iters: 6400, time: 0.274) G_GAN: 7.168 G_GAN_Feat: 5.117 G_VGG: 9.567 F_Flow: 41.135 F_Warp: 1.234 F_Mask: 3.038 D_real: 0.001 D_fake: 0.000
(epoch: 5, iters: 6600, time: 0.249) G_GAN: 8.530 G_GAN_Feat: 5.387 G_VGG: 12.317 F_Flow: 42.177 F_Warp: 1.501 F_Mask: 3.491 D_real: 0.001 D_fake: 1.224
(epoch: 5, iters: 6800, time: 0.258) G_GAN: 6.098 G_GAN_Feat: 4.768 G_VGG: 10.794 F_Flow: 37.454 F_Warp: 1.577 F_Mask: 3.763 D_real: 0.002 D_fake: 0.000
(epoch: 5, iters: 7000, time: 0.262) G_GAN: 7.120 G_GAN_Feat: 5.566 G_VGG: 13.230 F_Flow: 44.769 F_Warp: 1.634 F_Mask: 3.720 D_real: 0.000 D_fake: 0.000
(epoch: 5, iters: 7200, time: 0.263) G_GAN: 6.472 G_GAN_Feat: 4.973 G_VGG: 11.629 F_Flow: 32.759 F_Warp: 1.369 F_Mask: 3.210 D_real: 0.003 D_fake: 0.001
(epoch: 5, iters: 7400, time: 0.252) G_GAN: 7.402 G_GAN_Feat: 5.436 G_VGG: 11.083 F_Flow: 40.644 F_Warp: 1.357 F_Mask: 3.310 D_real: 0.001 D_fake: 0.001
(epoch: 5, iters: 7600, time: 0.280) G_GAN: 6.293 G_GAN_Feat: 4.803 G_VGG: 10.134 F_Flow: 34.146 F_Warp: 1.168 F_Mask: 3.022
(epoch: 5, iters: 7800, time: 0.271) G_GAN: 6.919 G_GAN_Feat: 4.915 G_VGG: 9.678 F_Flow: 43.328 F_Warp: 1.198 F_Mask: 2.816 D_real: 0.000 D_fake: 0.043
(epoch: 5, iters: 8000, time: 0.269) G_GAN: 6.653 G_GAN_Feat: 4.640 G_VGG: 10.237 F_Flow: 35.914 F_Warp: 1.146 F_Mask: 2.887 D_real: 0.000 D_fake: 0.002
saving the latest model (epoch 5, total_steps 48000)
(epoch: 5, iters: 8200, time: 0.300) G_GAN: 8.048 G_GAN_Feat: 5.240 G_VGG: 9.980 F_Flow: 47.772 F_Warp: 1.427 F_Mask: 3.489 D_real: 0.002
(epoch: 5, iters: 8400, time: 0.309) G_GAN: 6.162 G_GAN_Feat: 5.195 G_VGG: 10.368 F_Flow: 40.305 F_Warp: 1.195 F_Mask: 2.866 D_real: 0.000 D_fake: 0.000
(epoch: 5, iters: 8600, time: 0.274) G_GAN: 7.631 G_GAN_Feat: 5.033 G_VGG: 10.163 F_Flow: 35.009 F_Warp: 0.932 F_Mask: 2.475 D_real: 0.000 D_fake: 0.000
(epoch: 5, iters: 8800, time: 0.262) G_GAN: 7.215 G_GAN_Feat: 5.538 G_VGG: 10.976 F_Flow: 42.931 F_Warp: 1.358 F_Mask: 3.307 D_real: 0.000
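For reference, the pattern in the log above (D_real and D_fake dropping to ~0 while G_GAN jumps) is the classic sign of the discriminator collapsing. A minimal sketch for scanning a `loss_log.txt` in this format and flagging the first entry where both discriminator losses fall below a threshold (the loss names match the log above; the 0.01 threshold is an arbitrary assumption):

```python
import re

# Matches lines like:
# (epoch: 5, iters: 6000, time: 0.272) G_GAN: 4.416 ... D_real: 0.028 D_fake: 0.001
LINE_RE = re.compile(
    r"\(epoch:\s*(\d+),\s*iters:\s*(\d+).*?\)"
    r".*?D_real:\s*([\d.]+)\s+D_fake:\s*([\d.]+)"
)

def first_collapse(log_text, threshold=0.01):
    """Return (epoch, iters) of the first entry where both discriminator
    losses are below `threshold`, or None if no such entry exists."""
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skips "saving the latest model" and malformed lines
        epoch, iters, d_real, d_fake = m.groups()
        if float(d_real) < threshold and float(d_fake) < threshold:
            return int(epoch), int(iters)
    return None

sample = """(epoch: 5, iters: 5400, time: 0.213) G_GAN: 1.206 D_real: 1.006 D_fake: 0.775
(epoch: 5, iters: 6000, time: 0.272) G_GAN: 4.416 D_real: 0.028 D_fake: 0.001
(epoch: 5, iters: 6400, time: 0.274) G_GAN: 7.168 D_real: 0.001 D_fake: 0.000"""
print(first_collapse(sample))  # → (5, 6400)
```

On the full log above this points at roughly iters 6000–6400 of epoch 5, which matches where the outputs reportedly degrade.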
I have the same problem.
Hi, I also encounter the same issue @tcwang0509 during training Face. Have you found the reason?
I have the same problem as this. @tcwang0509
I followed the training tutorial: compiled a snapshot of FlowNet2 and downloaded the example datasets by running python3 scripts/download_datasets.py.
Then I ran the script train_g1_256.sh.
But the losses D_real and D_fake collapsed after around epoch 5.
The distributed option is also set to false in the opt file.
Here are my loss_log and images.
> Hi, I also encounter the same issue @tcwang0509 during training Face. Have you found the reason?

I also encountered the same issue @tcwang0509
@shiluooulihs @gabewilliam @Ian990466 @keetsky @nihaomiao
Have you solved this problem? Could you share the solution? Thanks
> @shiluooulihs @gabewilliam @Ian990466 @keetsky @nihaomiao
> Have you solved this problem? Could you share the solution? Thanks

No, I haven't. If you're interested in face reenactment, you can try https://github.com/AliaksandrSiarohin/first-order-model instead.
I have the same problem with pose.
@tcwang0509 @keetsky @DDDlk @gabewilliam @shiluooulihs I have the same problem with street. Have you solved it?
I want to reproduce the results, but the losses D_real and D_fake always become None after a few epochs. The model seems to have collapsed.
I followed the guide in readme.md step by step, using the default hyperparameters.
Does anyone know why?
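Since this keeps biting people several epochs in, it can help to bail out (or at least log loudly) as soon as a loss becomes NaN/None or the discriminator losses flatline, instead of letting training run on. A minimal, framework-agnostic sketch of such a guard; the loss dict keys mirror the names in the repo's loss_log, while the threshold and patience values are arbitrary assumptions:

```python
import math

class CollapseGuard:
    """Flags training collapse: any NaN loss, or D_real and D_fake
    both staying below `threshold` for `patience` consecutive checks."""

    def __init__(self, threshold=0.01, patience=5):
        self.threshold = threshold
        self.patience = patience
        self.flat_steps = 0

    def update(self, losses):
        """`losses` is a dict like {'D_real': 0.5, 'D_fake': 0.3, ...}.
        Returns True when training should be stopped."""
        if any(math.isnan(v) for v in losses.values()):
            return True  # a NaN loss means the run is already broken
        if losses['D_real'] < self.threshold and losses['D_fake'] < self.threshold:
            self.flat_steps += 1
        else:
            self.flat_steps = 0
        return self.flat_steps >= self.patience

guard = CollapseGuard(patience=2)
print(guard.update({'D_real': 1.0, 'D_fake': 0.5}))    # → False
print(guard.update({'D_real': 0.001, 'D_fake': 0.0}))  # → False
print(guard.update({'D_real': 0.002, 'D_fake': 0.0}))  # → True
```

Calling `guard.update(...)` once per logging interval inside the training loop would have caught the collapse in the logs above around epoch 5 rather than many epochs later.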