Hangz-nju-cuhk / Talking-Face_PC-AVS

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
Creative Commons Attribution 4.0 International

RuntimeError: mat1 dim 1 must match mat2 dim 0 #22

Open Serjura opened 3 years ago

Serjura commented 3 years ago

Hi Hangz_Zhou and team,

I've been struggling to get the demo experiment to work. When I run the code, I get the following Runtime error:

Network [ModulateGenerator] was created. Total number of parameters: 90.1 million. To see the architecture, do print(network).
Embedding size is 512, encoder SAP.
Network [ResSESyncEncoder] was created. Total number of parameters: 10.4 million. To see the architecture, do print(network).
Network [FanEncoder] was created. Total number of parameters: 14.3 million. To see the architecture, do print(network).
Network [ResNeXtEncoder] was created. Total number of parameters: 38.0 million. To see the architecture, do print(network).
Pretrained network G has fewer layers; The following are not initialized:
['conv1', 'convs', 'style', 'to_rgb1', 'to_rgbs']
model [AvModel] was created
working
dataset [VOXTestDataset] of size 361 was created
  0%|          | 0/181 [00:00<?, ?it/s]C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\functional.py:3328: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\functional.py:3458: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode)
  0%|          | 0/181 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "C:/Users/Admin/Documents/Github/Talking-Face_PC-AVS/app/inference.py", line 107, in main
    inference_single_audio(opt, path_label, model)
  File "C:/Users/Admin/Documents/Github/Talking-Face_PC-AVS/app/inference.py", line 66, in inference_single_audio
    fake_image_original_pose_a, fake_image_driven_pose_a = model.forward(data_i, mode='inference')
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\av_model.py", line 94, in forward
    driving_pose_frames)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\av_model.py", line 484, in inference
    fake_image_ref_pose_a, _ = self.generate_fake(sel_id_feature, ref_merge_feature_a)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\av_model.py", line 448, in generate_fake
    fake_image, style_rgb = self.netG(style)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 583, in forward
    out = self.conv1(out, latent[:, 0], noise=noise[0])
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 392, in forward
    out, _ = self.conv(input, style)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 295, in forward
    style = self.modulation(style).view(batch, 1, in_channel, 1, 1)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 214, in forward
    input, self.weight * self.scale, bias=self.bias * self.lr_mul
  File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 dim 1 must match mat2 dim 0
misc/Input/517600055 1 misc/Pose_Source/517600078 160 misc/Audio_Source/681600002.mp3 misc/Mouth_Source/681600002 363 dummy

mat1 dim 1 must match mat2 dim 0

Process finished with exit code 0

The error occurs with these variables, although I'm not sure this is telling you much: [screenshot attached]

I'm currently running the code with PyTorch 1.8.1 (and Python 3.6), as I haven't managed to get PyTorch 1.3.0 working because CUDA 10 doesn't support my GPU. What would you recommend as a next step? Your help is much appreciated. Keep up the good work!
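For anyone reading along: "mat1 dim 1 must match mat2 dim 0" is PyTorch's way of saying the input handed to a nn.Linear (here, the style-modulation layer in the generator) has the wrong feature dimension, which is why running with the wrong arguments can produce it. A minimal sketch with made-up sizes, not the repo's actual dimensions:

```python
import torch
import torch.nn as nn

# Hypothetical reproduction of this error class; 512/256 are
# illustrative, not the real widths inside ModulatedConv2d.
modulation = nn.Linear(512, 256)   # expects inputs whose last dim is 512

style_ok = torch.randn(1, 512)
out = modulation(style_ok)         # fine: output shape (1, 256)

style_bad = torch.randn(1, 640)    # e.g. a style vector built with mismatched settings
err = None
try:
    modulation(style_bad)
except RuntimeError as e:          # PyTorch 1.8 words this as "mat1 dim 1 must match mat2 dim 0"
    err = e
print(type(err).__name__)
```

So the fix is usually not in the generator itself but in whatever upstream setting controls the style vector's size.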

Hangz-nju-cuhk commented 3 years ago

Please follow the instructions step by step. Have you run demo_vox.sh or used its settings? It seems you have not used the correct arguments.