Closed: bobby-chiu closed this issue 1 year ago
The current code has a bug that requires the input audio to be longer than 0.8 s. This can be resolved by adding a few lines to audio2pose.py. We will include the fix in the next commit.
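The actual patch was not included in this thread, so here is a minimal, hypothetical sketch of the kind of fix described: pad short audio feature sequences along the time axis before they reach the pose network. The function name `pad_audio_frames` and the frame counts are illustrative assumptions, not SadTalker's real API, and numpy stands in for torch to keep the sketch self-contained.

```python
import numpy as np

def pad_audio_frames(frames: np.ndarray, min_frames: int = 32) -> np.ndarray:
    """Pad a (num_frames, feat_dim) feature array to at least `min_frames`.

    Hypothetical sketch: repeats the last frame until the clip reaches the
    minimum length, so downstream fixed-size layers always see a full window.
    """
    num_frames = frames.shape[0]
    if num_frames >= min_frames:
        return frames  # long enough already, nothing to do
    # Repeat the final frame to fill the missing tail.
    pad = np.repeat(frames[-1:], min_frames - num_frames, axis=0)
    return np.concatenate([frames, pad], axis=0)

# Example: a 20-frame clip is padded up to the 32-frame minimum.
short = np.zeros((20, 80))
padded = pad_audio_frames(short)
print(padded.shape)  # (32, 80)
```

Repeating the last frame is only one choice; zero-padding or reflecting the sequence would also work, depending on how the pose model was trained.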
@Winfredy That fixed it for me, thanks. But a new error occurs when I feed in shorter audio. Did I miss anything? speech.zip
python inference.py --driven_audio ./speech.wav --source_image face.png --batch_size 8 --result_dir ./examples/results
checkpoints\epoch_20.pth
checkpoints\auido2pose_00140-model.pth
checkpoints\auido2exp_00300-model.pth
checkpoints\facevid2vid_00189-model.pth.tar
checkpoints\mapping_00229-model.pth.tar
landmark Det:: 100%|█████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.64s/it]
3DMM Extraction In Video:: 100%|████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.42s/it]
Traceback (most recent call last):
File "inference.py", line 99, in <module>
This bug is resolved now, you can update the code to test some short audios.
python inference.py --driven_audio ./speech.wav --source_image face.png --batch_size 6 --result_dir ./examples/results
checkpoints\epoch_20.pth
checkpoints\auido2pose_00140-model.pth
checkpoints\auido2exp_00300-model.pth
checkpoints\facevid2vid_00189-model.pth.tar
checkpoints\mapping_00229-model.pth.tar
landmark Det:: 100%|█████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.67s/it]
3DMM Extraction In Video:: 100%|████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.86s/it]
Traceback (most recent call last):
File "inference.py", line 99, in <module>
main(args)
File "inference.py", line 71, in main
coeff_path = audio_to_coeff.generate(batch, save_dir, pose_style)
File "D:\Workspace\projects\sadtalker\SadTalker\test_audio2coeff.py", line 75, in generate
results_dict_pose = self.audio2pose_model.test(batch)
File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\audio2pose.py", line 86, in test
batch = self.netG.test(batch)
File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\cvae.py", line 49, in test
return self.decoder(batch)
File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\cvae.py", line 139, in forward
x_out = self.MLP(x_in)  # bs * layer_sizes[-1]
File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
input = module(input)
File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x196 and 262x128)
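For anyone hitting the same error: `F.linear(input, weight, bias)` computes `input @ weight.T`, so the decoder's first Linear layer (weight shape 262×128 as transposed in the message) expects 262 input features, but the short clip only produced 196, hence the multiplication fails. A minimal reproduction of the mismatch, using numpy in place of torch so it runs without the SadTalker environment (the shapes are taken from the traceback; everything else is illustrative):

```python
import numpy as np

# Weight of a Linear(in_features=262, out_features=128) layer.
weight = np.zeros((128, 262))

# A short audio clip yields only 196 features instead of the expected 262.
x_short = np.zeros((1, 196))
try:
    x_short @ weight.T  # same contraction F.linear performs
except ValueError as e:
    print("shape mismatch:", e)

# With the expected 262 features the multiplication succeeds.
x_full = np.zeros((1, 262))
print((x_full @ weight.T).shape)  # (1, 128)
```

This is why padding (or otherwise lengthening) short inputs before the pose network resolves the crash: it restores the feature count the fixed-size MLP was built for.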
Below is my audio file: speech.zip