
[CVPR 2023] MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
https://meta-portrait.github.io/
MIT License

About the setting of sr model #15

Open KangweiiLiu opened 1 year ago

KangweiiLiu commented 1 year ago

Great work! I encountered some errors when trying to reproduce your sr_model. The errors are as follows: 

```
Traceback (most recent call last):
  File "Experimental_root/train.py", line 16, in <module>
    train_pipeline(root_path)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/basicsr/train.py", line 169, in train_pipeline
    model.optimize_parameters(current_iter)
  File "/home/lkw/lkw_docs/source_code1/MetaPortrait-main/sr_model/Experimental_root/models/metaportrait_video_model.py", line 100, in optimize_parameters
    super(MetaportraitVideoRecurrentModel, self).optimize_parameters(current_iter)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/basicsr/models/sr_model.py", line 105, in optimize_parameters
    l_percep, l_style = self.cri_perceptual(self.output, self.gt)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/basicsr/losses/basic_loss.py", line 209, in forward
    x_features = self.vgg(x)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/basicsr/archs/vgg_arch.py", line 157, in forward
    x = layer(x)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/to
```
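
The paste is truncated before the actual exception. To surface the full error outside the training loop, one can call the perceptual loss in isolation; the sketch below assumes the stock basicsr `PerceptualLoss` (the class the traceback goes through) with illustrative layer weights — use the values from the sr_model YAML config instead.

```python
# Standalone repro of the failing call, outside the training loop.
# Layer weights below are illustrative, not the repo's actual config.
import torch
from basicsr.losses.basic_loss import PerceptualLoss

cri_perceptual = PerceptualLoss(
    layer_weights={'conv5_4': 1.0},
    vgg_type='vgg19',
    perceptual_weight=1.0,
    style_weight=0.0,
).cuda()

output = torch.rand(1, 3, 256, 256, device='cuda')  # stand-in SR output
gt = torch.rand(1, 3, 256, 256, device='cuda')      # stand-in ground truth
l_percep, l_style = cri_perceptual(output, gt)      # should reproduce the error
print(l_percep, l_style)
```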

Also, for the dataset part of sr_model, after reading the paper, I still have some questions:

  1. Is the training set composed of 400 videos, each 12 seconds long at 25 FPS (i.e., 300 frames per video)?
  2. Is the LQ (low-quality) data in the training set generated by the model, or obtained by downsampling the original videos?
  3. What hardware was used for training, and how long did it take? At what point did the loss reach a desirable level?
KangweiiLiu commented 1 year ago

Missing dependency: `ImportError: cannot import name 'folder_to_concat_folder' from 'basicsr.utils' (/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/basicsr/utils/__init__.py)`
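
`folder_to_concat_folder` is not part of the stock PyPI basicsr, so one plausible cause (an assumption, not a confirmed diagnosis) is that a pip-installed basicsr is shadowing the patched copy this repo expects. A quick check:

```python
# Diagnostic sketch: see which basicsr the interpreter picks up and
# whether it exports the missing helper. A `False` here suggests the
# stock PyPI basicsr is being imported instead of a patched copy.
import basicsr.utils

print(basicsr.utils.__file__)
print(hasattr(basicsr.utils, 'folder_to_concat_folder'))
```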

ChenyangQiQi commented 1 year ago

@KangweiiLiu Thanks for your interest. Answers to some of your questions are below.

  1. The LQ (low-quality) data are 256×256 video frames generated by the base model, which is composed of optical flow estimation, warping, and identity enhancement (so they are model outputs, not downsampled originals).
  2. We use 8× V100-32G GPUs for training, which takes several hours to converge from a pre-trained GFPGAN checkpoint. The stopping point of finetuning is decided by the convergence of the LPIPS loss and the subjective perceptual quality reported by users; a sketch of such a check follows this list. (In practice, the model converges very quickly, and the temporal deflickering is heavily influenced by the number of temporal frames in each sequence sample.)
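
A minimal sketch of the LPIPS-based stopping check mentioned in point 2, assuming the `lpips` PyPI package (the authors' exact metric may differ):

```python
# Monitor validation LPIPS during finetuning; stop once it plateaus and
# the outputs look clean to human raters. Assumes the `lpips` package.
import lpips
import torch

loss_fn = lpips.LPIPS(net='alex').cuda()

@torch.no_grad()
def val_lpips(sr_frames, gt_frames):
    """Mean LPIPS over a validation clip; NCHW tensors scaled to [-1, 1]."""
    return loss_fn(sr_frames, gt_frames).mean().item()
```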

I will check the error and the training data details later.

Charissil commented 1 year ago

Missing dependency: `ImportError: cannot import name 'folder_to_concat_folder' from 'basicsr.utils' (/home/lkw/anaconda3/envs/dvp/lib/python3.7/site-packages/basicsr/utils/__init__.py)` @KangweiiLiu Did you manage to fix it? I have the same problem!