Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs
https://synclabs.so

Inference.py dies after reading video & extracting audio #290

Closed: ilonamm closed this issue 3 years ago

ilonamm commented 3 years ago

I'm having a strange issue where inference.py gets killed after reading the video and extracting the audio. I'm using Python 3.6 and have tested on two devices (Unix on Windows and macOS Big Sur); both show the same issue, although it previously worked fine. It happens with both wav2lip_gan.pth and lipsync_expert.pth.


```
Wav2Lip % python inference.py --checkpoint_path checkpoints/lipsync_expert.pth --face files/intro-katherine2.mp4 --audio files/test_turkish.m4a --outfile results/intro_katherine_turkish.mp4

Using cpu for inference.
Reading video frames...
Number of frames available for inference: 1094
Extracting raw audio...
ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
  built with Apple LLVM version 9.1.0 (clang-902.0.39.2)
  configuration: 
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'files/test_turkish.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A isommp42
    creation_time   : 2021-06-23T07:19:30.000000Z
    iTunSMPB        :  00000000 00000493 0000007B 0000000000177EF2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:34.92, start: 0.026553, bitrate: 144 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2021-06-23T07:19:30.000000Z
      handler_name    : Core Media Audio
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A isommp42
    iTunSMPB        :  00000000 00000493 0000007B 0000000000177EF2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ISFT            : Lavf58.12.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s (default)
    Metadata:
      creation_time   : 2021-06-23T07:19:30.000000Z
      handler_name    : Core Media Audio
      encoder         : Lavc58.18.100 pcm_s16le
size=    3008kB time=00:00:34.91 bitrate= 705.6kbits/s speed=1.04e+03x    
video:0kB audio:3008kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.002533%
(80, 2794)
Length of mel chunks: 1043
  0%|                                                     | 0/9 [00:00<?, ?it/s]
  0%|                                                    | 0/66 [00:00<?, ?it/s]
zsh: killed     python inference.py --checkpoint_path checkpoints/lipsync_expert.pth --face
```

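
The two lines printed just before the progress bars, `(80, 2794)` and `Length of mel chunks: 1043`, come from the audio preprocessing. As we understand the repository's `inference.py`, the audio becomes an 80-band mel spectrogram (the 80 in the shape) at roughly 80 mel frames per second, which is consistent with the 34.92 s duration ffmpeg reports (34.92 × 80 ≈ 2794). It is then cut into fixed-width windows, one per video frame, with the window start advancing by `80 / fps` mel frames. The sketch below re-creates only that index arithmetic. The constants `MEL_STEP_SIZE = 16` and 80 mel frames per second are assumptions from our reading of the repository, not from this thread, and `mel_chunk_starts` is a hypothetical helper name.

```python
from typing import List

MEL_STEP_SIZE = 16          # assumed window width, in mel frames
MEL_FRAMES_PER_SECOND = 80  # assumed: 16 kHz audio with a hop size of 200 samples


def mel_chunk_starts(n_mel_frames: int, fps: float) -> List[int]:
    """Start column of each mel window, one window per video frame (hypothetical helper)."""
    step = MEL_FRAMES_PER_SECOND / fps  # mel frames that elapse per video frame
    starts = []
    i = 0
    while True:
        start = int(i * step)
        if start + MEL_STEP_SIZE > n_mel_frames:
            # The last window is pinned to the end of the spectrogram, then we stop.
            starts.append(n_mel_frames - MEL_STEP_SIZE)
            break
        starts.append(start)
        i += 1
    return starts


if __name__ == "__main__":
    # 2794 mel frames as in the log above; 25 fps is an assumed frame rate.
    print("Length of mel chunks:", len(mel_chunk_starts(2794, 25)))
```

Under these assumptions the example prints `Length of mel chunks: 870`. The 1043 in the log depends on the actual frame rate of the input video, which the log does not show.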
GzuPark commented 3 years ago

I'm not sure, but I think the face detection step is the problem. Try checking the logs to debug it.

ilonamm commented 3 years ago

If no face were detected, the script should raise an error, and I'm not seeing any error message. @GzuPark, which logs do you mean? I checked, and nothing gets logged in the system log, and I don't see the app producing any logs of its own.
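
The absence of an error message is consistent with how the run ended: `zsh: killed` means the process received SIGKILL, which Python cannot catch, so no traceback is ever printed. This commonly happens when the operating system terminates a process that exhausts memory, although the thread does not establish that this was the cause here. One way to tell a signal-terminated run from an ordinary failure is to launch the script from a small wrapper: on POSIX systems, Python's `subprocess` reports a negative `returncode` equal to minus the signal number. The sketch below assumes a POSIX system (Linux or macOS), and `describe_exit` is a hypothetical helper.

```python
import signal
import subprocess
import sys
from typing import List


def describe_exit(cmd: List[str]) -> str:
    """Run cmd and report whether it exited normally or was killed by a signal."""
    code = subprocess.run(cmd).returncode
    if code < 0:
        # On POSIX, a negative return code is the negated number of the signal
        # that terminated the child. SIGKILL cannot be caught, so the child
        # never gets a chance to print a traceback.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = str(-code)
        return "killed by signal " + name
    return "exited with status " + str(code)


if __name__ == "__main__":
    # Stand-in child that sends SIGKILL to itself, mimicking `zsh: killed`.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(describe_exit(child))  # killed by signal SIGKILL
    # Wrapping the run from the report instead would look like:
    # describe_exit([sys.executable, "inference.py",
    #                "--checkpoint_path", "checkpoints/lipsync_expert.pth",
    #                "--face", "files/intro-katherine2.mp4",
    #                "--audio", "files/test_turkish.m4a",
    #                "--outfile", "results/intro_katherine_turkish.mp4"])
```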

GzuPark commented 3 years ago
  1. This repo uses the face-alignment package and needs to detect a face. I think you are running inference on private videos in which the detector cannot find faces.
  2. By "checking logs" I mean creating a log file or simply printing statements yourself; the repo does not provide them.

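
GzuPark's second suggestion can be sketched with Python's standard `logging` module: write a timestamped line to a file at the start and end of each stage, so that after a silent crash the last line in the file shows which stage was running. The `debug.log` path, the `stage` helper, and the stage names are hypothetical, and the placeholder work stands in for the real steps rather than reproducing Wav2Lip's code.

```python
import logging
import time
from contextlib import contextmanager

# Dedicated file logger. FileHandler flushes after each record, so lines
# written before an abrupt kill have already been handed to the OS.
LOG_PATH = "debug.log"  # hypothetical location
log = logging.getLogger("inference-debug")
log.setLevel(logging.INFO)
_handler = logging.FileHandler(LOG_PATH, mode="w")
_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
log.addHandler(_handler)


@contextmanager
def stage(name: str):
    """Log the start and end of a stage; a 'start' with no 'done' marks where it stopped."""
    log.info("start: %s", name)
    t0 = time.perf_counter()
    yield
    log.info("done: %s (%.2fs)", name, time.perf_counter() - t0)


if __name__ == "__main__":
    # Placeholder work standing in for the real stages of a run.
    with stage("read video frames"):
        frames = list(range(1094))
    with stage("face detection"):
        boxes = [None for _ in frames]
    with open(LOG_PATH) as f:
        print(f.read())
```

Wrapping each step of a script this way costs two log lines per stage and needs no changes to the step itself.
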
ilonamm commented 3 years ago

I found the problem: the video file used a different codec from the source videos I had used before. Re-encoding the file with H.264 solved the issue.
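
The fix can be scripted. The sketch below asks `ffprobe` for the codec of the first video stream and, if it is not already H.264, runs `ffmpeg` to re-encode the video with the `libx264` encoder while copying the audio stream unchanged. It assumes `ffmpeg` and `ffprobe` are on the PATH and that the ffmpeg build includes `libx264`. The helper names and the output filename are hypothetical, and the thread does not say which codec the problematic file used.

```python
import os
import shutil
import subprocess
from typing import List


def probe_video_codec(path: str) -> str:
    """Return the codec name of the first video stream, e.g. 'h264' or 'hevc'."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return out.stdout.strip()


def reencode_h264_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that re-encodes video to H.264 and copies the audio."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264",  # H.264 encoder; must be compiled into ffmpeg
            "-c:a", "copy",     # leave the audio stream untouched
            dst]


if __name__ == "__main__":
    src = "files/intro-katherine2.mp4"       # input from the report above
    dst = "files/intro-katherine2_h264.mp4"  # hypothetical output name
    if shutil.which("ffprobe") and shutil.which("ffmpeg") and os.path.exists(src):
        if probe_video_codec(src) != "h264":
            subprocess.run(reencode_h264_cmd(src, dst), check=True)
    else:
        # Tools or file missing: just show the command that would be run.
        print(" ".join(reencode_h264_cmd(src, dst)))
```

The re-encoded file is then passed to `--face` in place of the original.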