amjltc295 / Free-Form-Video-Inpainting

Official PyTorch implementation of "Learnable Gated Temporal Shift Module for Deep Video Inpainting" (Chang et al., BMVC 2019) and the FVI dataset from "Free-form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN" (Chang et al., ICCV 2019)
https://arxiv.org/abs/1907.01131

Pretraining/Finetuning stage of FFVI and Cannot reproduce quantitative evaluation #30

Closed MaureenZOU closed 4 years ago

MaureenZOU commented 4 years ago

Hi Ya-Liang, after going through the issues, I saw you mentioned in https://github.com/amjltc295/Free-Form-Video-Inpainting/issues/20 that there are pretraining and finetuning stages for your model. Also, when I train your model from scratch, I found that the GAN loss is set to zero. Could you please explain the detailed training schedule of the pretraining and finetuning stages, as well as the loss types and weights? Thanks in advance : )

MaureenZOU commented 4 years ago

Question 2: Cannot reproduce quantitative evaluation:

Paper result: (screenshot attached)

My result: (screenshot attached)

Evaluation setting: I used your default settings to run inference on all the test images, then ran evaluation with the following command:

```
python evaluate.py -rgd ../../../data/FVI/Test/JPEGImages/ -rmd ../../../data/FVI/Test/object_masks/ -rrd ../../../data/results/FVI_Test/epoch_0/test_object_removal/
```

All settings are at their defaults.

I compared the MSE scores on the object-like masks, and they are very different: 0.0024 vs. 0.01044. The FID scores are also very different.
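For reference, a minimal sketch of how an MSE restricted to the masked region might be computed (this is an assumption about the metric, not the repo's actual `evaluate.py`; whether the mean is taken over masked pixels only or over the whole frame changes the number considerably, which is one possible source of a mismatch):

```python
import numpy as np

def masked_mse(gt, pred, mask):
    """MSE over the hole (masked) pixels only.

    gt, pred: float arrays in [0, 1], shape (H, W, C)
    mask: binary array, 1 = hole region, shape (H, W)
    """
    hole = mask.astype(bool)
    diff = (gt - pred) ** 2
    # Average only over pixels inside the mask
    return diff[hole].mean()

gt = np.zeros((4, 4, 3))
pred = np.zeros((4, 4, 3))
pred[:2, :2] = 0.1          # error confined to the hole
mask = np.zeros((4, 4))
mask[:2, :2] = 1
print(round(float(masked_mse(gt, pred, mask)), 6))  # 0.01
```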

MaureenZOU commented 4 years ago

First question solved; I found the answer in the supplementary materials. Thanks!

amjltc295 commented 4 years ago

This table reports an average score over different mask-to-frame ratios (please refer to the supplementary materials). I'm not sure which ratio you're using, but considering the score, it is probably around the ~60% one.
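In other words, the table entry is a mean over per-ratio scores, so evaluating at a single (large) mask ratio will look much worse than the paper number. A tiny sketch of that averaging, with made-up ratio buckets and scores purely for illustration (see the supplementary materials for the real buckets):

```python
# Hypothetical per-ratio MSE scores: mask-to-frame ratio -> score.
# The reported table entry is the mean over all ratio buckets,
# so a single high-ratio bucket alone reads far worse.
per_ratio_mse = {
    0.1: 0.001,
    0.3: 0.004,
    0.6: 0.010,
}
table_score = sum(per_ratio_mse.values()) / len(per_ratio_mse)
print(round(table_score, 4))  # 0.005, well below the 0.6-ratio score
```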

MaureenZOU commented 4 years ago

Thanks a lot for the information! I'm sorry to have an additional question : ( After training your model for around 300 epochs with the default settings, the results fluctuate a lot. Could you please check whether the attached file looks reasonable? Each row represents an epoch, ordered chronologically. https://drive.google.com/file/d/1sOelWjXReCvLyaiKWDScsPa2xdEWQJA7/view?usp=sharing

amjltc295 commented 4 years ago

Did you include the discriminator loss? This is normal if you only use the perceptual loss. The pretraining step is done once the loss has converged; then you need to add the discriminator. If results still look like this after fine-tuning, you will probably need to tune some hyperparameters.
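A minimal sketch of the two-stage schedule described above (the function, the stage names, and the weight value are assumptions for illustration, not the repo's actual config keys):

```python
def total_loss(perceptual, adversarial, stage, gan_weight=0.01):
    """Combine losses for the two training stages.

    stage: "pretrain" -> perceptual/reconstruction loss only
           "finetune" -> also include the discriminator (GAN) loss
    gan_weight is an assumed value; tune it if training is unstable.
    """
    if stage == "pretrain":
        # Corresponds to the GAN loss weight being set to zero
        return perceptual
    return perceptual + gan_weight * adversarial

print(total_loss(1.0, 5.0, "pretrain"))   # 1.0
print(total_loss(1.0, 5.0, "finetune"))   # 1.05
```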

MaureenZOU commented 4 years ago

Thanks a lot! I only used the perceptual loss. I will fine-tune with the discriminator loss and see what happens. Thanks!

MaureenZOU commented 4 years ago

Thanks again for your code! I have reproduced all the results in your paper using the provided model, and I also reproduced the paper numbers by training the model myself. It's a nice repo with clean code : )

amjltc295 commented 4 years ago

No problem, thank you for verifying it as well :)