BadToBest / EchoMimic

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
https://badtobest.github.io/echomimic.html
Apache License 2.0

There seems to be a problem with how input images are handled #111

Open coderlihong opened 1 month ago

coderlihong commented 1 month ago

If the photo is a close-up headshot, the face is located without any problem. But if it is a full-body photo, or the head takes up only a small part of the image, inference fails with a dimension-mismatch error:

```
  File "infer_audio2vid.py", line 258, in <module>
    main()
  File "infer_audio2vid.py", line 226, in main
    video = pipe(
  File "/home/miniconda3/envs/echomimic/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/project/EchoMimic-main/src/pipelines/pipeline_echo_mimic.py", line 507, in __call__
    pred = self.denoising_unet(
  File "/home/miniconda3/envs/echomimic/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/miniconda3/envs/echomimic/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/project/EchoMimic-main/src/models/unet_3d_echo.py", line 494, in forward
    sample = sample + face_musk_fea
RuntimeError: The size of tensor a (64) must match the size of tensor b (56) at non-singleton dimension 4
```
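The last line means `sample` and `face_musk_fea` cannot be added elementwise: their sizes differ along dimension 4 (64 vs 56). The pure-Python sketch below mimics the broadcasting rule PyTorch applies to `+`: shapes are aligned from the right, and each pair of sizes must be equal or one of them must be 1. The 5-D shapes in the usage lines are hypothetical, assuming a (batch, channels, frames, height, width) layout, so dimension 4 would be the width. If the usual 8x VAE downsampling applies, latent widths of 64 and 56 would correspond to 512 px and 448 px, which would suggest the face-mask input and the reference image ended up with different widths.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of shapes a and b, or raise an error
    worded like PyTorch's when they are incompatible."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have ndim entries.
    pa = (1,) * (ndim - len(a)) + tuple(a)
    pb = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(pa, pb)):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            # Neither size is 1 and they differ: broadcasting is impossible.
            raise ValueError(
                f"The size of tensor a ({x}) must match the size of "
                f"tensor b ({y}) at non-singleton dimension {dim}"
            )
    return tuple(out)


# Hypothetical shapes: (batch, channels, frames, height, width).
latent = (1, 320, 12, 64, 64)
face_mask_feature = (1, 320, 12, 64, 56)
try:
    broadcast_shape(latent, face_mask_feature)
except ValueError as err:
    print(err)
```

Only the last axis disagrees here, so the error points at dimension 4, matching the traceback.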

lingfengchencn commented 1 month ago

+1, headshots don't work either.

endman100 commented 1 month ago

I tested it: if MTCNN can't find a face, the script can't run.
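Both reports point at face detection. One possible pre-processing workaround, sketched here and untested against EchoMimic (not the project's own code), is to pick the most confident detection and crop the reference image to a square around it, so the face fills more of the frame in full-body shots, and to fail with a clear message when no face is found. The helper name `face_crop_box`, the `(x1, y1, x2, y2)` pixel box format, the `scale` margin, and the assumption that the detector reports "no face" as `None` or an empty list are all hypothetical. The cropped image would still need resizing to whatever resolution the pipeline expects.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def face_crop_box(
    boxes: Optional[Sequence[Box]],
    probs: Optional[Sequence[float]],
    img_w: int,
    img_h: int,
    scale: float = 2.0,
) -> Tuple[int, int, int, int]:
    """Pick the most confident face and return a square crop (x1, y1, x2, y2)
    around it, clamped to the image. Raises ValueError if no face was found."""
    if boxes is None or len(boxes) == 0:
        raise ValueError("no face detected in the reference image")
    # Without scores, every box ties and the first one is chosen.
    scores = probs if probs is not None else [0.0] * len(boxes)
    best = max(range(len(boxes)), key=lambda i: scores[i])
    x1, y1, x2, y2 = boxes[best]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Square side: the larger face dimension times `scale`, capped by the image.
    side = int(min(max(x2 - x1, y2 - y1) * scale, img_w, img_h))
    # Center on the face, then shift the square back inside the image bounds.
    left = int(min(max(cx - side / 2, 0), img_w - side))
    top = int(min(max(cy - side / 2, 0), img_h - side))
    return left, top, left + side, top + side


# A small face in a tall, full-body-style image (all numbers illustrative).
print(face_crop_box([(450, 200, 550, 320)], [0.98], img_w=1000, img_h=2000))
```

Raising early turns the "MTCNN finds no face" case into an explicit error instead of a crash deeper in the pipeline.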

lymhust commented 5 days ago

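Please try testing with this version: https://www.modelscope.cn/studios/BadToBest/BadToBest. If particular images or audio clips cause problems, you can attach them in a comment so we can reproduce and pin down the issue.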