OpenTalker / SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
https://sadtalker.github.io/

PyTorch throws a mat shape mismatch when using short audio as input #42

Closed: bobby-chiu closed this issue 1 year ago

bobby-chiu commented 1 year ago

python inference.py --driven_audio ./speech.wav --source_image face.png --batch_size 6 --result_dir ./examples/results

checkpoints\epoch_20.pth
checkpoints\auido2pose_00140-model.pth
checkpoints\auido2exp_00300-model.pth
checkpoints\facevid2vid_00189-model.pth.tar
checkpoints\mapping_00229-model.pth.tar
landmark Det:: 100%|██████████| 1/1 [00:04<00:00, 4.67s/it]
3DMM Extraction In Video:: 100%|██████████| 1/1 [00:04<00:00, 4.86s/it]
Traceback (most recent call last):
  File "inference.py", line 99, in <module>
    main(args)
  File "inference.py", line 71, in main
    coeff_path = audio_to_coeff.generate(batch, save_dir, pose_style)
  File "D:\Workspace\projects\sadtalker\SadTalker\test_audio2coeff.py", line 75, in generate
    results_dict_pose = self.audio2pose_model.test(batch)
  File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\audio2pose.py", line 86, in test
    batch = self.netG.test(batch)
  File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\cvae.py", line 49, in test
    return self.decoder(batch)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\cvae.py", line 139, in forward
    x_out = self.MLP(x_in)  # bs layer_sizes[-1]
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x196 and 262x128)

Below is my audio file: speech.zip

Winfredy commented 1 year ago

The current code has a bug that limits the input audio to clips longer than 0.8 s. This can be resolved by adding a few lines to audio2pose.py; we will update the code in the next commit. code
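(Editor's note: for context, the mismatch of 1x196 vs. 262x128 most likely means a clip shorter than one 0.8 s window yields fewer audio feature frames than the CVAE decoder's first linear layer expects. Below is a minimal sketch of the kind of frame padding such a fix could apply; the helper name, the 'indiv_mels' key, and the 20-frame minimum, i.e. 0.8 s at 25 fps, are illustrative assumptions, not the repository's actual patch.)

import torch

def pad_audio_frames(mel_feats: torch.Tensor, min_frames: int) -> torch.Tensor:
    """Pad a [batch, frames, ...] mel-feature tensor along the frame axis by
    repeating the last frame until it contains at least `min_frames` frames."""
    num_frames = mel_feats.shape[1]
    if num_frames >= min_frames:
        return mel_feats
    # Repeat the final frame to fill the gap (keeps dtype and device of the input).
    repeats = [1, min_frames - num_frames] + [1] * (mel_feats.dim() - 2)
    pad = mel_feats[:, -1:].repeat(*repeats)
    return torch.cat([mel_feats, pad], dim=1)

# Hypothetical use inside audio2pose.py before the CVAE decoder is called,
# assuming the batch stores per-frame mels under 'indiv_mels' (an assumption):
# batch['indiv_mels'] = pad_audio_frames(batch['indiv_mels'], min_frames=20)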

bobby-chiu commented 1 year ago

> The current code has a bug that limits the input audio to clips longer than 0.8 s. This can be resolved by adding a few lines to audio2pose.py; we will update the code in the next commit. code

@Winfredy That fixed it for me, thanks. But a new error occurs when I feed it an even shorter audio clip. Did I miss anything? speech.zip

python inference.py --driven_audio ./speech.wav --source_image face.png --batch_size 8 --result_dir ./examples/results

checkpoints\epoch_20.pth
checkpoints\auido2pose_00140-model.pth
checkpoints\auido2exp_00300-model.pth
checkpoints\facevid2vid_00189-model.pth.tar
checkpoints\mapping_00229-model.pth.tar
landmark Det:: 100%|██████████| 1/1 [00:04<00:00, 4.64s/it]
3DMM Extraction In Video:: 100%|██████████| 1/1 [00:03<00:00, 3.42s/it]
Traceback (most recent call last):
  File "inference.py", line 99, in <module>
    main(args)
  File "inference.py", line 71, in main
    coeff_path = audio_to_coeff.generate(batch, save_dir, pose_style)
  File "D:\Workspace\projects\sadtalker\SadTalker\test_audio2coeff.py", line 78, in generate
    pose_pred = torch.Tensor(savgol_filter(np.array(pose_pred.cpu()), 13, 2, axis=1)).to(self.device)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\scipy\signal\_savitzky_golay.py", line 346, in savgol_filter
    _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\scipy\signal\_savitzky_golay.py", line 218, in _fit_edges_polyfit
    _fit_edge(x, 0, window_length, 0, halflen, axis,
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\scipy\signal\_savitzky_golay.py", line 188, in _fit_edge
    poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start),
  File "<__array_function__ internals>", line 5, in polyfit
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\numpy\lib\polynomial.py", line 634, in polyfit
    raise TypeError("expected x and y to have same length")
TypeError: expected x and y to have same length

Winfredy commented 1 year ago

This bug has been fixed; you can update to the latest code and test with short audio clips.
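(Editor's note: for anyone hitting the same TypeError on an older checkout, scipy's savgol_filter requires the window length to be no larger than the number of samples along the smoothing axis, and a very short clip can leave the pose sequence with fewer than 13 frames. A minimal sketch of the kind of guard that avoids this follows; the helper name and defaults are illustrative, not the repository's actual fix.)

import numpy as np
import torch
from scipy.signal import savgol_filter

def smooth_pose(pose_pred: torch.Tensor, window: int = 13, order: int = 2) -> torch.Tensor:
    """Savitzky-Golay smoothing along the frame axis (axis=1), with the window
    clamped so it never exceeds the number of frames and stays odd."""
    arr = pose_pred.detach().cpu().numpy()
    n_frames = arr.shape[1]
    # Clamp the window to the (odd) number of available frames.
    window = min(window, n_frames if n_frames % 2 == 1 else n_frames - 1)
    if window <= order:
        # Too few frames to fit a polynomial of this order; skip smoothing.
        return pose_pred
    smoothed = savgol_filter(arr, window, order, axis=1)
    return torch.as_tensor(smoothed, dtype=pose_pred.dtype, device=pose_pred.device)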