Closed Anwar-Faiz closed 3 years ago
I get a similar error even when I use wav2lip.pth, which is the checkpoint for the highly accurate version. Command: python3 inference.py --checkpoint_path "face_detection/detection/sfd/wav2lip.pth" --face "anwar-test.mp4" --audio "celine-song.mp3"
Error:
Using cpu for inference.
Reading video frames...
Number of frames available for inference: 260
Extracting raw audio...
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 12.0.0 (clang-1200.0.32.29)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
[mp3 @ 0x7fbf5780be00] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'celine-song.mp3':
Metadata:
album : Celine Dion - ALL BY MYSELF+
title : Celine Dion - ALL BY MYSELF+
encoder : Lavf58.26.101
date : 2011
Duration: 00:05:05.90, start: 0.000000, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 320 kb/s
Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 640x360 [SAR 96:96 DAR 16:9], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
comment : Other
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
Metadata:
IPRD : Celine Dion - ALL BY MYSELF+
INAM : Celine Dion - ALL BY MYSELF+
ICRD : 2011
ISFT : Lavf58.76.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Metadata:
encoder : Lavc58.134.100 pcm_s16le
size= 57357kB time=00:05:05.88 bitrate=1536.1kbits/s speed= 674x
video:0kB audio:57357kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000286%
(80, 24473)
Length of mel chunks: 9136
0%| | 0/72 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 280, in <module>
The checkpoint_path must point to the wav2lip.pth file. The s3fd.pth file should simply be stored at face_detection/detection/sfd/s3fd.pth. I am not sure whether the code runs on Python 3.9; we have tested it on Python 3.5 and 3.7. Can you re-download the s3fd.pth file?
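Before re-downloading, it may help to check whether the file on disk looks like a checkpoint at all, since an "unpickling stack underflow" is often a sign of a truncated download or a saved HTML page. The sketch below is only a heuristic and not part of Wav2Lip: the function name `checkpoint_problem` and the `min_bytes` threshold are hypothetical, and it assumes that files written by `torch.save` start either with a zip signature (newer format) or with the pickle protocol opcode (legacy format).

```python
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # assumed start of the zip-based torch.save format
PICKLE_PROTO = b"\x80"     # pickle PROTO opcode, assumed start of the legacy format


def checkpoint_problem(path, min_bytes=1_000_000):
    """Return a short description of what looks wrong, or None if nothing does.

    Heuristic only: passing this check does not guarantee torch.load succeeds.
    min_bytes is an illustrative threshold, not a value taken from the project.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        head = f.read(4)
    if not (head.startswith(ZIP_MAGIC) or head.startswith(PICKLE_PROTO)):
        return "unexpected header (possibly an HTML page or a truncated download)"
    size = p.stat().st_size
    if size < min_bytes:
        return f"only {size} bytes, probably an incomplete download"
    return None


if __name__ == "__main__":
    # Location named in the reply above; run from the repository root.
    path = "face_detection/detection/sfd/s3fd.pth"
    print(path, "->", checkpoint_problem(path) or "looks plausible")
```

If the function reports a problem for s3fd.pth, downloading the file again as suggested above is the likely fix. The same check can be pointed at the file passed to --checkpoint_path.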
Command used: python3 inference.py --checkpoint_path face_detection/detection/sfd/s3fd.pth --face anwar-test.mp4 --audio celine-song.mp3
Output:
Using cpu for inference.
Reading video frames...
Number of frames available for inference: 260
Extracting raw audio...
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 12.0.0 (clang-1200.0.32.29)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
[mp3 @ 0x7f815b00e800] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'celine-song.mp3':
Metadata:
album : Celine Dion - ALL BY MYSELF+
title : Celine Dion - ALL BY MYSELF+
encoder : Lavf58.26.101
date : 2011
Duration: 00:05:05.90, start: 0.000000, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 320 kb/s
Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 640x360 [SAR 96:96 DAR 16:9], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
comment : Other
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
Metadata:
IPRD : Celine Dion - ALL BY MYSELF+
INAM : Celine Dion - ALL BY MYSELF+
ICRD : 2011
ISFT : Lavf58.76.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Metadata:
encoder : Lavc58.134.100 pcm_s16le
size= 57357kB time=00:05:05.88 bitrate=1536.1kbits/s speed= 626x
main()
File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 249, in main
for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 113, in datagen
face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 69, in face_detect
detector = face_detection.FaceAlignment(face_detection.LandmarksType._2D,
File "/Users/mfaiz/Desktop/Wav2Lip-master/face_detection/api.py", line 62, in __init__
self.face_detector = face_detector_module.FaceDetector(device=device, verbose=verbose)
File "/Users/mfaiz/Desktop/Wav2Lip-master/face_detection/detection/sfd/sfd_detector.py", line 24, in __init__
model_weights = torch.load(path_to_detector)
File "/usr/local/lib/python3.9/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.9/site-packages/torch/serialization.py", line 762, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: unpickling stack underflow
video:0kB audio:57357kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000286%
(80, 24473)
Length of mel chunks: 9136
0%| | 0/72 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 280, in <module>