[Face] Results getting worse after 50 epochs, model starts diverging

NVlabs / few-shot-vid2vid

Pytorch implementation for few-shot photorealistic video-to-video translation.

[Face] Results getting worse after 50 epochs, model starts diverging #26

Closed hamzatrq closed 3 years ago

hamzatrq commented 4 years ago

Logs: loss_log.txt

Options: opt.txt

Up to epoch 50 the results were satisfactory, but after that they started to get worse.

After 50 Epochs: 00000 00001

After 55 Epochs: 00000 00001

After 60 Epochs: 00000 00001

After 65 Epochs: 00000 00001

After 75 Epochs: 00001 00000

hamzatrq commented 4 years ago

Dataset preprocessing: I used the FaceForensics dataset. I downloaded the videos, renamed them, converted them to frames, and used dlib to generate keypoints.

nfrik commented 4 years ago

How many images are there in your dataset? Did you have better results with the dataset provided by NVIDIA? In my case quality also degrades after epoch 50 and test images don't look that good either.

Epoch 50: iTerm2 EvoRtj epoch050_iter0489800_synthesized_image

Epoch 93: iTerm2 LtrNzk epoch093_iter0924500_synthesized_image

hamzatrq commented 4 years ago

@nfrik NVIDIA only provided example images. I do not know the exact number of frames, but I used 754 videos for training, as mentioned in the paper.

BaldrLector commented 4 years ago

@hamzatrq Hi, from your loss log, G_GAN has been increasing and DT decreasing since epoch 50. It seems the model has collapsed.
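A trend like this can be checked without reading the log by eye, by comparing the per-epoch averages of each loss term before and after epoch 50. The following is a minimal sketch, assuming the log uses a pix2pixHD-style line layout of the form `(epoch: E, iters: I, time: T) name: value name: value ...`. That layout and the function names `parse_loss_log` and `epoch_means` are assumptions for illustration, not taken from the repository.

```python
import re
from collections import defaultdict

# Assumed line layout (not confirmed by this thread):
# (epoch: 51, iters: 100, time: 0.512) G_GAN: 1.234 DT: 0.456
HEADER = re.compile(r"\(epoch:\s*(\d+),\s*iters:\s*(\d+),\s*time:\s*[\d.]+\)")
TERM = re.compile(r"(\w+):\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_loss_log(lines):
    """Yield (epoch, {loss_name: value}) for each line matching the assumed layout."""
    for line in lines:
        header = HEADER.search(line)
        if header is None:
            continue  # skip banners, blank lines, and anything unrecognized
        epoch = int(header.group(1))
        # Only look at the text after the header so "time" is not read as a loss.
        rest = line[header.end():]
        terms = {name: float(value) for name, value in TERM.findall(rest)}
        yield epoch, terms


def epoch_means(records):
    """Average every loss term over all logged iterations of each epoch."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for epoch, terms in records:
        for name, value in terms.items():
            sums[epoch][name] += value
            counts[epoch][name] += 1
    return {
        epoch: {name: sums[epoch][name] / counts[epoch][name] for name in sums[epoch]}
        for epoch in sorted(sums)
    }
```

Passing the open `loss_log.txt` file object to `parse_loss_log` and the result to `epoch_means` gives one dictionary of averaged losses per epoch. A G_GAN mean that keeps rising while the DT mean keeps falling after epoch 50 would match the pattern described above.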

hamzatrq commented 4 years ago

@BaldrLector Do you think retraining the model will solve the problem? If so, should I retrain from epoch 50 or from the start?

BaldrLector commented 4 years ago

@hamzatrq Hi, if epoch 50 works well, then there is no need to retrain the model.

In fact, I am facing the same problem as you. The second training stage is very hard (perhaps because the longer the sequence, the more the rotation, translation, and zoom vary).

Some of my training results can be seen at #18; the second-stage training does indeed harm performance. I guess there may be some trick to avoid model collapse.

tcwang0509 commented 4 years ago

I will look into the issue once I get back. In the meantime, if epoch 50 works well, you can run inference with that model by adding --n_frames_G 1 to force single-frame generation.

lukemelas commented 4 years ago

Authors, thank you for your wonderful paper and your responsiveness in replying to issues.

I just wanted to note that I am also experiencing this issue -- face training diverges after epoch 50. I am using 1000 videos from the FaceForensics++ dataset and training on 8 GPUs with the provided hyperparameters.

MengXinChengXuYuan commented 4 years ago

I am also facing the same problem. In my experiment, black holes appear in the synthesized images after 50 epochs.

ndyashas commented 4 years ago

I have noticed that when I stop during the temporal training stage and then resume, the time taken per iteration drops to what it was in the single-frame training stage. The GPU memory usage also drops to its single-frame level. I see a similar trend in your log file @hamzatrq @tcwang0509
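One possible reading of this observation is that the sequence length reached during temporal training is not restored when training resumes. That is a guess and is not confirmed anywhere in this thread. The sketch below shows a schedule that avoids this class of problem by deriving the sequence length purely from the epoch number, so a resumed run gets the same value as an uninterrupted one. The function name, parameter names, and default values are all hypothetical and do not come from the repository.

```python
def frames_for_epoch(epoch, single_frame_epochs=50, epochs_per_step=10,
                     base_frames=2, max_frames=16):
    """Return the training sequence length for a 1-indexed epoch.

    Every name and default here is hypothetical. The schedule trains on
    single frames first, then doubles the sequence length every
    `epochs_per_step` epochs, capped at `max_frames`.
    """
    if epoch <= single_frame_epochs:
        return 1
    # Number of completed doubling steps since temporal training began.
    doublings = (epoch - single_frame_epochs - 1) // epochs_per_step
    return min(max_frames, base_frames * 2 ** doublings)
```

Because the function depends only on `epoch`, calling it again after loading a checkpoint at, say, epoch 61 returns the same length as it would have in a run that was never stopped.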

akhilsantha7 commented 4 years ago

Hello, has anyone solved the issue of divergence after 50 epochs? Please let me know. I am also facing the same issue.

Thanks, Akhil.

hamzatrq commented 4 years ago

Hi Akhil, no, I am still waiting for the author to reply.

BaldrLector commented 4 years ago

I am also waiting for the author to reply.

hamzatrq commented 4 years ago

@tcwang0509 can you please shed some light on the issue? Thank you!

BaldrLector commented 4 years ago

I found a closely related face reenactment work: https://aliaksandrsiarohin.github.io/first-order-model-website/ It is also from NIPS, and the results look good. Has anyone tried it on the face dataset?

tcwang0509 commented 4 years ago

A potential bug introduced during refactoring has been fixed in the latest commit. Please try it and see if it works now.

tcwang0509 commented 3 years ago

This repo is now deprecated. Please refer to the new Imaginaire repo: https://github.com/NVlabs/imaginaire.