Rudrabha / Wav2Lip

This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs
https://synclabs.so

_pickle.UnpicklingError: unpickling stack underflow occurs on running with s3fd.pth #282

Closed Anwar-Faiz closed 3 years ago

Anwar-Faiz commented 3 years ago

command used: python3 inference.py --checkpoint_path face_detection/detection/sfd/s3fd.pth --face anwar-test.mp4 --audio celine-song.mp3

Output:
Using cpu for inference.
Reading video frames...
Number of frames available for inference: 260
Extracting raw audio...
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil 56. 70.100 / 56. 70.100
  libavcodec 58.134.100 / 58.134.100
  libavformat 58. 76.100 / 58. 76.100
  libavdevice 58. 13.100 / 58. 13.100
  libavfilter 7.110.100 / 7.110.100
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 9.100 / 5. 9.100
  libswresample 3. 9.100 / 3. 9.100
  libpostproc 55. 9.100 / 55. 9.100
[mp3 @ 0x7f815b00e800] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'celine-song.mp3':
  Metadata:
    album : Celine Dion - ALL BY MYSELF+
    title : Celine Dion - ALL BY MYSELF+
    encoder : Lavf58.26.101
    date : 2011
  Duration: 00:05:05.90, start: 0.000000, bitrate: 320 kb/s
  Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 320 kb/s
  Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 640x360 [SAR 96:96 DAR 16:9], 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      comment : Other
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
  Metadata:
    IPRD : Celine Dion - ALL BY MYSELF+
    INAM : Celine Dion - ALL BY MYSELF+
    ICRD : 2011
    ISFT : Lavf58.76.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      encoder : Lavc58.134.100 pcm_s16le
size= 57357kB time=00:05:05.88 bitrate=1536.1kbits/s speed= 626x
video:0kB audio:57357kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000286%
(80, 24473)
Length of mel chunks: 9136
0%| | 0/72 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 280, in <module>
    main()
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 249, in main
    for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 113, in datagen
    face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 69, in face_detect
    detector = face_detection.FaceAlignment(face_detection.LandmarksType._2D,
  File "/Users/mfaiz/Desktop/Wav2Lip-master/face_detection/api.py", line 62, in __init__
    self.face_detector = face_detector_module.FaceDetector(device=device, verbose=verbose)
  File "/Users/mfaiz/Desktop/Wav2Lip-master/face_detection/detection/sfd/sfd_detector.py", line 24, in __init__
    model_weights = torch.load(path_to_detector)
  File "/usr/local/lib/python3.9/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.9/site-packages/torch/serialization.py", line 762, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: unpickling stack underflow

Anwar-Faiz commented 3 years ago

I get a similar error even when I use wav2lip.pth, the checkpoint for the highly accurate version. Command: python3 inference.py --checkpoint_path "face_detection/detection/sfd/wav2lip.pth" --face "anwar-test.mp4" --audio "celine-song.mp3"

Error:
Using cpu for inference.
Reading video frames...
Number of frames available for inference: 260
Extracting raw audio...
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil 56. 70.100 / 56. 70.100
  libavcodec 58.134.100 / 58.134.100
  libavformat 58. 76.100 / 58. 76.100
  libavdevice 58. 13.100 / 58. 13.100
  libavfilter 7.110.100 / 7.110.100
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 9.100 / 5. 9.100
  libswresample 3. 9.100 / 3. 9.100
  libpostproc 55. 9.100 / 55. 9.100
[mp3 @ 0x7fbf5780be00] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'celine-song.mp3':
  Metadata:
    album : Celine Dion - ALL BY MYSELF+
    title : Celine Dion - ALL BY MYSELF+
    encoder : Lavf58.26.101
    date : 2011
  Duration: 00:05:05.90, start: 0.000000, bitrate: 320 kb/s
  Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 320 kb/s
  Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 640x360 [SAR 96:96 DAR 16:9], 90k tbr, 90k tbn, 90k tbc (attached pic)
    Metadata:
      comment : Other
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
  Metadata:
    IPRD : Celine Dion - ALL BY MYSELF+
    INAM : Celine Dion - ALL BY MYSELF+
    ICRD : 2011
    ISFT : Lavf58.76.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      encoder : Lavc58.134.100 pcm_s16le
size= 57357kB time=00:05:05.88 bitrate=1536.1kbits/s speed= 674x
video:0kB audio:57357kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000286%
(80, 24473)
Length of mel chunks: 9136
0%| | 0/72 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 280, in <module>
    main()
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 249, in main
    for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 113, in datagen
    face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
  File "/Users/mfaiz/Desktop/Wav2Lip-master/inference.py", line 69, in face_detect
    detector = face_detection.FaceAlignment(face_detection.LandmarksType._2D,
  File "/Users/mfaiz/Desktop/Wav2Lip-master/face_detection/api.py", line 62, in __init__
    self.face_detector = face_detector_module.FaceDetector(device=device, verbose=verbose)
  File "/Users/mfaiz/Desktop/Wav2Lip-master/face_detection/detection/sfd/sfd_detector.py", line 24, in __init__
    model_weights = torch.load(path_to_detector)
  File "/usr/local/lib/python3.9/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.9/site-packages/torch/serialization.py", line 762, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: unpickling stack underflow

Rudrabha commented 3 years ago

The --checkpoint_path argument must point to the wav2lip.pth file. The s3fd.pth file should simply be stored at face_detection/detection/sfd/s3fd.pth. I am not sure whether the code runs on Python 3.9; we have tested it on Python 3.5 and 3.7. Can you re-download the s3fd.pth file?
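
Both tracebacks end inside torch.load on the detector weights (sfd_detector.py, line 24), so one quick way to judge whether a re-download is needed is to look at the first bytes of the file before handing it to PyTorch. The sketch below is only an illustration: classify_checkpoint is a hypothetical helper, not part of Wav2Lip or PyTorch, and the signatures it checks (a zip header, as written by newer torch.save versions, or a binary pickle header, as written by the legacy format) are rough heuristics rather than a full validation.

```python
import os
import sys


def classify_checkpoint(path):
    """Guess what kind of file `path` is from its size and leading bytes.

    Hypothetical helper for illustration only. Returns one of:
    "missing", "empty", "zip", "pickle", "text", "unknown".
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"

    with open(path, "rb") as f:
        head = f.read(16)

    # Newer torch.save output is a zip archive, which starts with "PK\x03\x04".
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    # Binary pickles (protocol 2 and later) start with 0x80 and the protocol number.
    if len(head) >= 2 and head[0] == 0x80 and 2 <= head[1] <= 5:
        return "pickle"
    # Markup at the start (for example a saved web page) is not a weights file.
    if head.lstrip().startswith(b"<"):
        return "text"
    return "unknown"


if __name__ == "__main__":
    # Usage: python check_ckpt.py face_detection/detection/sfd/s3fd.pth
    for ckpt_path in sys.argv[1:]:
        print(ckpt_path, "->", classify_checkpoint(ckpt_path))
```

A result other than "zip" or "pickle" for face_detection/detection/sfd/s3fd.pth would suggest that the download is incomplete or is not the weights file at all, in which case downloading it again is the first thing to try. A "pickle" or "zip" result does not prove the file is intact, only that it starts the way a saved checkpoint normally does.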