ManimCommunity / manim-voiceover

Manim plugin for all things voiceover
https://voiceover.manim.community/en/stable
MIT License

CouldntEncodeError: Encoding failed #46

Open slopezpereyra opened 1 year ago

slopezpereyra commented 1 year ago

Description of bug / unexpected behavior

I took the following example from the VoiceOver Website:


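from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService


class MyScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())
        circle = Circle()  # the mobject drawn while the voiceover plays
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)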

I then ran manim -pqh myfile.py MyScene --disable_caching. I was prompted to choose an input device to record from, and I chose "default" (13). I recorded my voice as instructed, holding the 'r' key.

Upon finishing my recording, the following message appeared on the console:

Finished recording, saving to media/voiceovers/alaska-venus-montana-robin.mp3
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim/cli/render/commands.py:115 in render                                                  │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim/scene/scene.py:223 in render                                                          │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/santiago/repos/manim/intro.py:29 in construct                                              │
│                                                                                                  │
│    26 │                                                                                          │
│    27 │   def construct(self):                                                                   │
│    28 │   │   self.set_speech_service(RecorderService(format=1, channels=128, chunk=1024, tran   │
│ ❱  29 │   │   with self.voiceover(text="This circle is drawn as I speak.") as tracker:           │
│    30 │   │   │   self.play(Create(circle), run_time=tracker.duration)                           │
│    31 │   │   v = [r"\{a\}", r"\{b\}", r"\{a, b\}", r"\{a, b, c\}", r"\{a, b, c, f, g\}",        │
│    32 │   │   │    r"\{f\}", r"\{f, g\}"]                                                        │
│                                                                                                  │
│ /usr/lib/python3.11/contextlib.py:137 in __enter__                                               │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/voiceover_scene.py:180 in voiceover                                         │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/voiceover_scene.py:63 in add_voiceover_text                                 │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/base.py:85 in _wrap_generate_from_text                             │
│                                                                                                  │
│    82 │   │   # Replace newlines with lines, reduce multiple consecutive spaces to single        │
│    83 │   │   text = " ".join(text.split())                                                      │
│    84 │   │                                                                                      │
│ ❱  85 │   │   dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs)         │
│    86 │   │   original_audio = dict_["original_audio"]                                           │
│    87 │   │                                                                                      │
│    88 │   │   # Check whether word boundaries exist and if not run stt                           │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/__init__.py:101 in generate_from_text                     │
│                                                                                                  │
│    98 │   │                                                                                      │
│    99 │   │   self.recorder._trigger_set_device()                                                │
│   100 │   │   box = msg_box("Voiceover:\n\n" + input_text)                                       │
│ ❱ 101 │   │   self.recorder.record(str(Path(cache_dir) / audio_path), box)                       │
│   102 │   │                                                                                      │
│   103 │   │   json_dict = {                                                                      │
│   104 │   │   │   "input_text": text,                                                            │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:225 in record                                  │
│                                                                                                  │
│   222 │   def record(self, path: str, message: str = None):                                      │
│   223 │   │   if message is not None:                                                            │
│   224 │   │   │   print(message)                                                                 │
│ ❱ 225 │   │   self._record(path)                                                                 │
│   226 │   │                                                                                      │
│   227 │   │   while True:                                                                        │
│   228 │   │   │   print(                                                                         │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:110 in _record                                 │
│                                                                                                  │
│   107 │   │   self.event = self.task.enter(                                                      │
│   108 │   │   │   self.callback_delay, 1, self._record_task, ([path])                            │
│   109 │   │   )                                                                                  │
│ ❱ 110 │   │   self.task.run()                                                                    │
│   111 │   │                                                                                      │
│   112 │   │   return                                                                             │
│   113                                                                                            │
│                                                                                                  │
│ /usr/lib/python3.11/sched.py:151 in run                                                          │
│                                                                                                  │
│   148 │   │   │   │   │   return time - now                                                      │
│   149 │   │   │   │   delayfunc(time - now)                                                      │
│   150 │   │   │   else:                                                                          │
│ ❱ 151 │   │   │   │   action(*argument, **kwargs)                                                │
│   152 │   │   │   │   delayfunc(0)   # Let other threads run                                     │
│   153 │                                                                                          │
│   154 │   @property                                                                              │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:208 in _record_task                            │
│                                                                                                  │
│   205 │   │   │   │   buffer_start=self.trim_buffer_start,                                       │
│   206 │   │   │   │   buffer_end=self.trim_buffer_end,                                           │
│   207 │   │   │   ).export(wav_path, format="wav")                                               │
│ ❱ 208 │   │   │   wav2mp3(wav_path)                                                              │
│   209 │   │   │                                                                                  │
│   210 │   │   │   for e in self.task._queue:                                                     │
│   211 │   │   │   │   self.task.cancel(e)                                                        │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/helper.py:31 in wav2mp3                                                     │
│                                                                                                  │
│    28 │   │   mp3_path = Path(wav_path).with_suffix(".mp3")                                      │
│    29 │                                                                                          │
│    30 │   # Convert to mp3                                                                       │
│ ❱  31 │   AudioSegment.from_wav(wav_path).export(mp3_path, format="mp3", bitrate=bitrate)        │
│    32 │                                                                                          │
│    33 │   if remove_wav:                                                                         │
│    34 │   │   # Remove the .wav file                                                             │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/pydub/audio_segment.py:970 in export                                                        │
│                                                                                                  │
│    967 │   │   log_subprocess_output(p_err)                                                      │
│    968 │   │                                                                                     │
│    969 │   │   if p.returncode != 0:                                                             │
│ ❱  970 │   │   │   raise CouldntEncodeError(                                                     │
│    971 │   │   │   │   "Encoding failed. ffmpeg/avlib returned error code: {0}\n\nCommand:{1}\n  │
│    972 │   │   │   │   │   p.returncode, conversion_command, p_err.decode(errors='ignore') ))    │
│    973                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CouldntEncodeError: Encoding failed. ffmpeg/avlib returned error code: 1

Command:['ffmpeg', '-y', '-f', 'wav', '-i', '/tmp/tmp8ska799i', '-b:a', '312k', '-f', 'mp3', '/tmp/tmpnp4_0fh8']

Output from ffmpeg/avlib:

ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
  configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa
--enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang
--enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband
--enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl
--enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  WARNING: library configuration mismatch
  avfilter    configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa
--enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang
--enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband
--enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl
--enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared --enable-version3
--disable-doc --disable-programs --enable-libaribb24 --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc --enable-libsmbclient
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, wav, from '/tmp/tmp8ska799i':
  Duration: 00:00:01.67, bitrate: 90317 kb/s
  Stream #0:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 44100 Hz, 64 channels, s32, 90316 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
[auto_aresample_0 @ 0x559814942200] [SWR @ 0x559814942380] Rematrix is needed between 64 channels and stereo but there is not enough information to do it
[auto_aresample_0 @ 0x559814942200] Failed to configure output pad on auto_aresample_0
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!

I run Ubuntu 23.04. All dependencies were installed and are up-to-date.
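
The ffmpeg output above stops at "Rematrix is needed between 64 channels and stereo", which suggests the intermediate WAV holds far more channels than the MP3 encoder can downmix on its own. One possible workaround, sketched here under assumptions, is to reduce the WAV to a single channel before encoding. The helper name `keep_first_channel` is hypothetical and not part of manim-voiceover or pydub; the sketch uses only Python's standard `wave` module and assumes a plain PCM WAV whose first channel carries the microphone signal.

```python
import wave


def keep_first_channel(src_path: str, dst_path: str) -> int:
    """Write a mono copy of a PCM WAV file, keeping only its first channel.

    Returns the number of channels found in the source file.
    (Hypothetical helper for illustration, not part of any library.)
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()  # bytes per sample, e.g. 4 for s32
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    frame_size = n_channels * sample_width
    # Each frame stores one sample per channel back to back, so the first
    # `sample_width` bytes of every frame belong to channel 0.
    mono = b"".join(
        frames[i : i + sample_width] for i in range(0, len(frames), frame_size)
    )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(mono)

    return n_channels
```

Running the mono copy through the same ffmpeg command would show whether the channel count alone causes the failure. It would not explain why the "default" device produced that many channels in the first place.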

Expected behavior

After executing manim -pqh myfile.py MyScene --disable_caching and recording my voice with my (functioning) microphone, I expected the recording to be successfully embedded in the video and an .mp4 file containing the recording to be produced.

How to reproduce the issue

Code for reproducing the problem

```py
from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService


class MyScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())
        circle = Circle()  # the mobject drawn while the voiceover plays
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)
```

System specifications

System Details

- OS: Ubuntu 23.04
- RAM: 16 GB
- Python version 3.11.2
- Installed modules (provide output from `pip list`):

```
Package                        Version
------------------------------ ----------
azure-cognitiveservices-speech 1.28.0
build                          0.10.0
certifi                        2022.12.7
charset-normalizer             3.1.0
click                          8.1.3
click-default-group            1.2.2
cloup                          0.13.1
cmake                          3.26.3
colour                         0.1.5
decorator                      5.1.1
docstring-to-markdown          0.12
evdev                          1.6.1
ffmpeg-python                  0.2.0
filelock                       3.12.0
fsspec                         2023.4.0
future                         0.18.3
glcontext                      2.3.7
greenlet                       2.0.2
gTTS                           2.3.2
huggingface-hub                0.14.1
humanhash3                     0.0.6
idna                           3.4
isosurfaces                    0.1.0
jedi                           0.17.2
Jinja2                         3.1.2
lit                            16.0.2
llvmlite                       0.40.0
manim                          0.17.3
manim-voiceover                0.3.0
ManimPango                     0.4.3
mapbox-earcut                  1.0.1
markdown-it-py                 2.2.0
MarkupSafe                     2.1.2
mdurl                          0.1.2
moderngl                       5.8.2
moderngl-window                2.4.3
more-itertools                 9.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multipledispatch               0.6.0
mutagen                        1.46.0
networkx                       2.8.8
numba                          0.57.0
numpy                          1.24.3
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
openai-whisper                 20230314
packaging                      23.1
parso                          0.7.1
Pillow                         9.5.0
pip                            23.1.2
pip-tools                      6.13.0
playsound                      1.3.0
pluggy                         1.0.0
PyAudio                        0.2.13
pycairo                        1.23.0
pydub                          0.25.1
pyglet                         2.0.5
Pygments                       2.15.1
pynput                         1.7.6
pynvim                         0.4.3
pyproject_hooks                1.0.0
pyrr                           0.10.3
python-dotenv                  0.21.1
python-jsonrpc-server          0.4.0
python-language-server         0.36.2
python-lsp-jsonrpc             1.0.0
python-lsp-server              1.7.2
python-xlib                    0.33
PyYAML                         6.0
regex                          2023.5.5
requests                       2.29.0
rich                           13.3.5
scipy                          1.10.1
screeninfo                     0.8.1
setuptools                     66.1.1
six                            1.16.0
skia-pathops                   0.7.4
sox                            1.4.1
srt                            3.5.3
stable-ts                      2.5.3
svgelements                    1.9.3
sympy                          1.11.1
tiktoken                       0.3.1
tokenizers                     0.13.3
torch                          2.0.0
torchaudio                     2.0.1
tqdm                           4.65.0
transformers                   4.28.1
triton                         2.0.0
typing_extensions              4.5.0
ujson                          5.7.0
urllib3                        1.26.15
watchdog                       2.3.1
wheel                          0.40.0
```
FFMPEG

Output of `ffmpeg -version`:

```
ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100
```

Additional comments

Choosing HDA Intel PCH: ALC897 Analog or HDA Intel PCH: ALC897 Alt Analog as input devices, instead of default, did not produce the same issue. However, the recordings were of terrible quality (not a microphone issue, tested the same microphone on an online recorder and had good quality).
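
Since the outcome depends on which input device is selected, it may help to compare the channel counts each device advertises. The sketch below is an assumption-laden illustration, not part of manim-voiceover: the function names are hypothetical, and the cut-off of two channels is a guess at what a typical microphone input reports. It relies on PyAudio's `get_device_count()` and `get_device_info_by_index()`, whose result dictionaries include `maxInputChannels`.

```python
from typing import Iterable


def plausible_input_devices(devices: Iterable[dict], max_channels: int = 2) -> list:
    """Keep devices that can record and report at most `max_channels` inputs.

    `devices` is expected to hold PyAudio-style device info dictionaries.
    (Hypothetical helper; the default threshold of 2 is an assumption.)
    """
    return [d for d in devices if 0 < d.get("maxInputChannels", 0) <= max_channels]


def query_devices() -> list:
    """Collect PyAudio's device info dictionaries (requires PyAudio)."""
    import pyaudio  # imported lazily so the filter above works without it

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
```

Calling `plausible_input_devices(query_devices())` and printing each entry's `index`, `name`, and `maxInputChannels` would show whether "default" stands out from the two ALC897 entries. It says nothing about the poor recording quality seen with those devices.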

osolmaz commented 1 year ago

The plugin saves the recordings under media/voiceovers/. Can you play the ones you recorded with the "default" device? They will probably not play properly in a media player, but I'm just trying to narrow the problem down.
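
Besides trying the files in a media player, the headers of any recordings can be inspected directly. The following is a minimal sketch using Python's standard `wave` module; it assumes the intermediate `.wav` files are still present in the folder (they may have been removed or never written), and the function names are hypothetical.

```python
import wave
from pathlib import Path


def describe_wav(path) -> dict:
    """Summarize the header of one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def describe_recordings(folder="media/voiceovers") -> dict:
    """Map each .wav file name in `folder` to its header summary."""
    return {p.name: describe_wav(p) for p in sorted(Path(folder).glob("*.wav"))}
```

A channel count far above 1 or 2 in the result would match the "64 channels" line in the ffmpeg log and point at the recording step, not the encoding step.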

sgalkina commented 1 month ago

I have the same issue, and would like to contribute by confirming that the recording is saved properly in the voiceovers folder. The problem occurs when it is combined with the mp4 file.