OpenTalker / video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
https://opentalker.github.io/video-retalking/
Apache License 2.0
6.44k stars 953 forks

result bad pixels on mouth #158

Open potatoker opened 11 months ago

potatoker commented 11 months ago
Screenshot 2023-11-03 16 11 02

/data/workspace/tts/video-retalking/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py:284: UserWarning: 

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(WRONG_COMPILER_WARNING.format(
[Info] Using cuda for inference.
[Step 0] Number of frames available for inference: 520
[Step 1] Using saved landmarks.
[Step 2] 3DMM Extraction In Video:: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 520/520 [00:14<00:00, 36.65it/s]
using expression center
Load checkpoint from: checkpoints/DNet.pt
Load checkpoint from: checkpoints/LNet.pth
Load checkpoint from: checkpoints/ENet.pth
[Step 3] Using saved stabilized video.
[Step 4] Load audio; Length of mel chunks: 446
[Step 5] Reference Enhancement: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 446/446 [01:22<00:00,  5.43it/s]
[Step 6] Lip Synthesis::   0%|     

This is the main log; there is no specific error message.
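The compiler warning in the log comes from torch/utils/cpp_extension.py and, going by its own text, asks for g++ instead of c++. It may well be unrelated to the bad mouth pixels. A minimal sketch of one way to address it, assuming PyTorch reads the extension compiler from the CXX environment variable and that g++ is installed; the helper name prefer_gxx is hypothetical and is not part of video-retalking:

```python
import os
import shutil


def prefer_gxx(env=None):
    """Point CXX at g++ when CXX is unset and g++ is on PATH.

    Hypothetical helper: call it before importing anything that builds a
    PyTorch C++ extension. Returns the compiler now configured, or None.
    """
    env = os.environ if env is None else env
    if env.get("CXX"):
        # Respect an explicit choice made by the user.
        return env["CXX"]
    gxx = shutil.which("g++")
    if gxx is not None:
        env["CXX"] = gxx
    return env.get("CXX")
```

Calling prefer_gxx() at the very top of the entry script is one option; exporting CXX in the shell before launching Python has the same effect.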

potatoker commented 11 months ago
No face detected in this image███████▉ | 73/520 [00:17<00:13, 32.01it/s]
No face detected in this image
No face detected in this image
No face detected in this image
No face detected in this image
No face detected in this image

The log keeps saying "No face detected in this image", but there is always a face in my input video.

qianchen94 commented 11 months ago

Similar issue here. QQ screenshot 20231103163001

potatoker commented 11 months ago

I tried the same video on Google Colab and the result is OK, with no "No face detected in this image" lines in the log. I think that on my own machine face detection fails on some frames and that this causes the issue, but I have no idea how to fix it.
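Since the same video behaves differently on Colab and on the local machine, comparing the installed package versions in both environments is one way to narrow the cause down. A minimal sketch using only the standard library; the package names passed in the example call are assumptions about what the project depends on, not a confirmed list:

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed package names; adjust them to the project's requirements file.
    names = ["torch", "torchvision", "numpy", "opencv-python"]
    for name, version in collect_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both environments and diffing the two outputs shows which versions differ.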

yigitersoy87 commented 11 months ago

https://github.com/OpenTalker/video-retalking/assets/11890900/15556d9b-3f1d-4afe-be9b-32bc4b9c1855

Hi, I have a similar issue. I am using Google Colab and the result is not satisfying. Is it because of the footage quality, or did I do something wrong?

yigitersoy87 commented 11 months ago

https://github.com/OpenTalker/video-retalking/assets/11890900/ad7a8473-400d-4894-87bd-9821b5a3a163

For example, what is happening to the face here? Is there any way to avoid the dissolve effect on the face?

potatoker commented 11 months ago

Quoting yigitersoy87 (download.17.mp4): "Hi, I have a similar issue. I am using Google Colab and the result is not satisfying. Is it because of the footage quality, or did I do something wrong?"

In this video, I think the cause is simply that video-retalking inpaints the mouth area, so the mouth pixels must not be occluded by another object such as the mic. You should use a video in which the mouth is unobstructed.

potatoker commented 11 months ago

Quoting yigitersoy87 (download.21.mp4): "For example, what is happening to the face here? Is there any way to avoid the dissolve effect on the face?"

The mouth is unobstructed in this video, but when the mouth moves this fast the inpainted area flickers. I think this is a known issue with all video-inpainting techniques. You had better try a video in which the head position does not change so quickly.

potatoker commented 11 months ago

I got some insight into this issue. I found that the first call into the FaceEnhancement process (in gpen_face_enhancer.py) goes wrong in some way without logging anything. Before the FaceEnhancement.process method is called, the input frame looks fine, like this: enhancer_before

But after the enhancer, the output image has collapsed:

enhancer_after

I tried 3 different input videos and got the same issue on my CentOS machine. Still no clue.

pineking commented 11 months ago

Could you get a better result by skipping the face enhancement process?

potatoker commented 11 months ago

Screenshot 2023-11-07 19 52 02

I dropped the enhancer's result in the preprocessing step before datagen in inference.py, and the final synthesized video looks OK to the naked eye. I would still like to know why this happens, though.
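Instead of dropping the enhancer output unconditionally, the workaround could be made conditional on the output looking broken. A minimal sketch, assuming frames are HxWx3 uint8 NumPy arrays; enhance_fn is a hypothetical callable wrapping the enhancer (not the actual FaceEnhancement.process signature), and the threshold of 40 on the 0-255 scale is an untuned guess:

```python
import numpy as np


def enhance_or_passthrough(frame, enhance_fn, max_mean_abs_diff=40.0):
    """Return (image, used_enhancer).

    Falls back to the original frame when enhance_fn raises, returns a
    different shape, returns non-finite values, or changes the frame by
    more than max_mean_abs_diff on average (hypothetical collapse test).
    """
    try:
        out = np.asarray(enhance_fn(frame))
    except Exception:
        return frame, False
    if out.shape != frame.shape or not np.all(np.isfinite(out)):
        return frame, False
    diff = np.abs(out.astype(np.float32) - frame.astype(np.float32)).mean()
    if diff > max_mean_abs_diff:
        return frame, False
    return out.astype(frame.dtype), True
```

Logging the frames for which the second return value is False would also show whether the collapse affects every frame or only some of them.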