Inference.py dies after reading video & extracting audio

ilonamm commented 3 years ago

I'm having a strange issue where inference.py gets killed after reading the video and extracting the audio. I'm using Python 3.6 and tested on two devices (Unix on Windows and macOS Big Sur) with the same result; it previously worked fine. It happens with both wav2lip_gan.pth and lipsync_expert.pth.


Wav2Lip % python inference.py --checkpoint_path checkpoints/lipsync_expert.pth --face files/intro-katherine2.mp4 --audio files/test_turkish.m4a --outfile results/intro_katherine_turkish.mp4 

Using cpu for inference.
Reading video frames...
Number of frames available for inference: 1094
Extracting raw audio...
ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
  built with Apple LLVM version 9.1.0 (clang-902.0.39.2)
  configuration: 
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'files/test_turkish.m4a':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A isommp42
    creation_time   : 2021-06-23T07:19:30.000000Z
    iTunSMPB        :  00000000 00000493 0000007B 0000000000177EF2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:34.92, start: 0.026553, bitrate: 144 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2021-06-23T07:19:30.000000Z
      handler_name    : Core Media Audio
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A isommp42
    iTunSMPB        :  00000000 00000493 0000007B 0000000000177EF2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ISFT            : Lavf58.12.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s (default)
    Metadata:
      creation_time   : 2021-06-23T07:19:30.000000Z
      handler_name    : Core Media Audio
      encoder         : Lavc58.18.100 pcm_s16le
size=    3008kB time=00:00:34.91 bitrate= 705.6kbits/s speed=1.04e+03x    
video:0kB audio:3008kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.002533%
(80, 2794)
Length of mel chunks: 1043
  0%|                                                     | 0/9 [00:00<?, ?it/s]
  0%|                                                    | 0/66 [00:00<?, ?it/s]
zsh: killed     python inference.py --checkpoint_path checkpoints/lipsync_expert.pth --face

GzuPark commented 3 years ago

I'm not sure, but I think the face detection step may be the problem. Try checking the logs to debug it.

ilonamm commented 3 years ago

A missed face detection should raise an error, and I'm not seeing any error message. @GzuPark, which logs do you mean? I checked: nothing gets written to the system log, and the app doesn't appear to produce any logs of its own.
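A note on the missing error message: when zsh prints "killed", the process received SIGKILL, which Python cannot catch, so no traceback or exception message is produced. The sketch below is one possible way to confirm this by launching the script from a small wrapper and inspecting its return code. The helper names `describe_exit` and `run_and_report` are hypothetical, and the negative return code convention assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal {}".format(-returncode)
        return "killed by {}".format(name)
    return "exited with status {}".format(returncode)


def run_and_report(args):
    """Run a command, wait for it to finish, and describe how it ended."""
    result = subprocess.run(args)
    return describe_exit(result.returncode)


# Example call (arguments taken from the command in the report above):
# print(run_and_report([sys.executable, "inference.py",
#                       "--checkpoint_path", "checkpoints/lipsync_expert.pth",
#                       "--face", "files/intro-katherine2.mp4",
#                       "--audio", "files/test_turkish.m4a",
#                       "--outfile", "results/intro_katherine_turkish.mp4"]))
```

If the wrapper reports a kill by SIGKILL, the script was stopped from outside rather than by an exception in its own code. Running out of memory is one common cause, though the wrapper alone cannot tell which.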

GzuPark commented 3 years ago

This repo uses the face-alignment package and needs to find a face in the video. I think you are trying to run inference on your own videos in which the detector cannot find a face.
By "checking logs" I mean creating a log file or adding print statements yourself; the repo does not provide any logging.
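The do-it-yourself logging suggested above could look roughly like the following. This is a minimal sketch, not part of Wav2Lip: the logger name, the default log file name, and the `make_logger` and `stage` helpers are all hypothetical. Each stage writes a "start" line before it runs, so if the process is killed mid-stage, the last "start" line in the file shows where it stopped.

```python
import logging
import time
from contextlib import contextmanager


def make_logger(log_path="inference_debug.log"):
    """Create (or reuse) a logger that appends to a file; names are hypothetical."""
    logger = logging.getLogger("wav2lip_debug")
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


@contextmanager
def stage(logger, name):
    """Log when a named stage starts, finishes, or fails with an exception."""
    start = time.perf_counter()
    logger.info("start: %s", name)
    try:
        yield
    except Exception:
        # Records the traceback, then lets the error propagate unchanged.
        logger.exception("failed: %s", name)
        raise
    else:
        logger.info("done: %s (%.2fs)", name, time.perf_counter() - start)


# Example use inside a script:
# logger = make_logger()
# with stage(logger, "face detection"):
#     ...  # the call you want to trace goes here
```

The file handler flushes after every record, so lines written before an abrupt kill should still reach the log file.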

ilonamm commented 3 years ago

Discovered the problem: the video file had a different codec than the previous source videos I had used. Encoding the file with H.264 solved the issue.
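For anyone hitting the same thing, the check and the re-encode can be scripted. The sketch below only builds the `ffprobe` and `ffmpeg` command lines and runs them on request. It assumes both tools are on the PATH and that the ffmpeg build includes the `libx264` encoder, which not every build does. The function names and the output file name in the example are hypothetical.

```python
import subprocess


def probe_codec_cmd(video_path):
    """Build an ffprobe command that prints the codec name of the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


def reencode_h264_cmd(src, dst):
    """Build an ffmpeg command that re-encodes the video stream as H.264."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", dst]


def run(cmd):
    """Run a command, raise on a non-zero exit, and return its stdout as text."""
    result = subprocess.run(
        cmd, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout.strip()


# Example (the output file name is made up for illustration):
# print(run(probe_codec_cmd("files/intro-katherine2.mp4")))
# run(reencode_h264_cmd("files/intro-katherine2.mp4",
#                       "files/intro-katherine2-h264.mp4"))
```

Probing first shows whether the source is already H.264, in which case re-encoding is unlikely to change anything.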

Rudrabha / Wav2Lip

Inference.py dies after reading video & extracting audio #290