ManimCommunity / manim-voiceover

Manim plugin for all things voiceover
https://voiceover.manim.community/en/stable
MIT License

CouldntEncodeError: Encoding failed #46

Open slopezpereyra opened 1 year ago

slopezpereyra commented 1 year ago

Description of bug / unexpected behavior

I took the following example from the VoiceOver Website:


class MyScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())
        circle = Circle()
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

I then ran manim -pqh myfile.py MyScene --disable_caching. I was prompted to choose an input device to record from and chose "default" (13). I recorded my voice as instructed, holding the 'r' key.

Upon finishing my recording, the following message appeared on the console:

Finished recording, saving to media/voiceovers/alaska-venus-montana-robin.mp3
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim/cli/render/commands.py:115 in render                                                  │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim/scene/scene.py:223 in render                                                          │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/santiago/repos/manim/intro.py:29 in construct                                              │
│                                                                                                  │
│    26 │                                                                                          │
│    27 │   def construct(self):                                                                   │
│    28 │   │   self.set_speech_service(RecorderService(format=1, channels=128, chunk=1024, tran   │
│ ❱  29 │   │   with self.voiceover(text="This circle is drawn as I speak.") as tracker:           │
│    30 │   │   │   self.play(Create(circle), run_time=tracker.duration)                           │
│    31 │   │   v = [r"\{a\}", r"\{b\}", r"\{a, b\}", r"\{a, b, c\}", r"\{a, b, c, f, g\}",        │
│    32 │   │   │    r"\{f\}", r"\{f, g\}"]                                                        │
│                                                                                                  │
│ /usr/lib/python3.11/contextlib.py:137 in __enter__                                               │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/voiceover_scene.py:180 in voiceover                                         │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/voiceover_scene.py:63 in add_voiceover_text                                 │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/base.py:85 in _wrap_generate_from_text                             │
│                                                                                                  │
│    82 │   │   # Replace newlines with lines, reduce multiple consecutive spaces to single        │
│    83 │   │   text = " ".join(text.split())                                                      │
│    84 │   │                                                                                      │
│ ❱  85 │   │   dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs)         │
│    86 │   │   original_audio = dict_["original_audio"]                                           │
│    87 │   │                                                                                      │
│    88 │   │   # Check whether word boundaries exist and if not run stt                           │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/__init__.py:101 in generate_from_text                     │
│                                                                                                  │
│    98 │   │                                                                                      │
│    99 │   │   self.recorder._trigger_set_device()                                                │
│   100 │   │   box = msg_box("Voiceover:\n\n" + input_text)                                       │
│ ❱ 101 │   │   self.recorder.record(str(Path(cache_dir) / audio_path), box)                       │
│   102 │   │                                                                                      │
│   103 │   │   json_dict = {                                                                      │
│   104 │   │   │   "input_text": text,                                                            │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:225 in record                                  │
│                                                                                                  │
│   222 │   def record(self, path: str, message: str = None):                                      │
│   223 │   │   if message is not None:                                                            │
│   224 │   │   │   print(message)                                                                 │
│ ❱ 225 │   │   self._record(path)                                                                 │
│   226 │   │                                                                                      │
│   227 │   │   while True:                                                                        │
│   228 │   │   │   print(                                                                         │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:110 in _record                                 │
│                                                                                                  │
│   107 │   │   self.event = self.task.enter(                                                      │
│   108 │   │   │   self.callback_delay, 1, self._record_task, ([path])                            │
│   109 │   │   )                                                                                  │
│ ❱ 110 │   │   self.task.run()                                                                    │
│   111 │   │                                                                                      │
│   112 │   │   return                                                                             │
│   113                                                                                            │
│                                                                                                  │
│ /usr/lib/python3.11/sched.py:151 in run                                                          │
│                                                                                                  │
│   148 │   │   │   │   │   return time - now                                                      │
│   149 │   │   │   │   delayfunc(time - now)                                                      │
│   150 │   │   │   else:                                                                          │
│ ❱ 151 │   │   │   │   action(*argument, **kwargs)                                                │
│   152 │   │   │   │   delayfunc(0)   # Let other threads run                                     │
│   153 │                                                                                          │
│   154 │   @property                                                                              │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:208 in _record_task                            │
│                                                                                                  │
│   205 │   │   │   │   buffer_start=self.trim_buffer_start,                                       │
│   206 │   │   │   │   buffer_end=self.trim_buffer_end,                                           │
│   207 │   │   │   ).export(wav_path, format="wav")                                               │
│ ❱ 208 │   │   │   wav2mp3(wav_path)                                                              │
│   209 │   │   │                                                                                  │
│   210 │   │   │   for e in self.task._queue:                                                     │
│   211 │   │   │   │   self.task.cancel(e)                                                        │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/helper.py:31 in wav2mp3                                                     │
│                                                                                                  │
│    28 │   │   mp3_path = Path(wav_path).with_suffix(".mp3")                                      │
│    29 │                                                                                          │
│    30 │   # Convert to mp3                                                                       │
│ ❱  31 │   AudioSegment.from_wav(wav_path).export(mp3_path, format="mp3", bitrate=bitrate)        │
│    32 │                                                                                          │
│    33 │   if remove_wav:                                                                         │
│    34 │   │   # Remove the .wav file                                                             │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/pydub/audio_segment.py:970 in export                                                        │
│                                                                                                  │
│    967 │   │   log_subprocess_output(p_err)                                                      │
│    968 │   │                                                                                     │
│    969 │   │   if p.returncode != 0:                                                             │
│ ❱  970 │   │   │   raise CouldntEncodeError(                                                     │
│    971 │   │   │   │   "Encoding failed. ffmpeg/avlib returned error code: {0}\n\nCommand:{1}\n  │
│    972 │   │   │   │   │   p.returncode, conversion_command, p_err.decode(errors='ignore') ))    │
│    973                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CouldntEncodeError: Encoding failed. ffmpeg/avlib returned error code: 1

Command:['ffmpeg', '-y', '-f', 'wav', '-i', '/tmp/tmp8ska799i', '-b:a', '312k', '-f', 'mp3', '/tmp/tmpnp4_0fh8']

Output from ffmpeg/avlib:

ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
  configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa
--enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang
--enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband
--enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl
--enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  WARNING: library configuration mismatch
  avfilter    configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa
--enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang
--enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband
--enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl
--enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared --enable-version3
--disable-doc --disable-programs --enable-libaribb24 --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc --enable-libsmbclient
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, wav, from '/tmp/tmp8ska799i':
  Duration: 00:00:01.67, bitrate: 90317 kb/s
  Stream #0:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 44100 Hz, 64 channels, s32, 90316 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
[auto_aresample_0 @ 0x559814942200] [SWR @ 0x559814942380] Rematrix is needed between 64 channels and stereo but there is not enough information to do it
[auto_aresample_0 @ 0x559814942200] Failed to configure output pad on auto_aresample_0
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!

I run Ubuntu 23.04. All dependencies were installed and are up-to-date.
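
As a quick sanity check on the ffmpeg log above, the reported input bitrate can be recomputed from the stream parameters ffmpeg printed (44100 Hz, 64 channels, 32-bit samples). The sketch below is only arithmetic; `pcm_bitrate_kbps` is a hypothetical helper name, not part of manim-voiceover, pydub, or ffmpeg. If the numbers agree, the WAV handed to ffmpeg very likely did carry 64 channels, which is the layout libmp3lame could not be down-mixed from.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> float:
    """Return the raw (uncompressed) PCM bitrate in kb/s, with 1 kb = 1000 bits."""
    return sample_rate_hz * channels * bits_per_sample / 1000


# Stream parameters taken from the "Stream #0:0" line of the ffmpeg output.
reported = pcm_bitrate_kbps(44_100, 64, 32)
print(reported)  # 90316.8, which ffmpeg shows truncated as 90316 kb/s

# For comparison: the same rate and sample format with only two channels.
stereo = pcm_bitrate_kbps(44_100, 2, 32)
print(stereo)  # 2822.4
```

The match with the logged 90316 kb/s is consistent with the "Rematrix is needed between 64 channels and stereo" message, although it does not by itself explain why the "default" device delivered that many channels.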

Expected behavior

After executing manim -pqh myfile.py MyScene --disable_caching and recording my voice with my (functioning) microphone, I expected the recording to be successfully embedded in the video and an .mp4 file containing the recording to be output.

How to reproduce the issue

Code for reproducing the problem

```py
from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService


class MyScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())
        circle = Circle()
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)
```

System specifications

System Details

- OS: Ubuntu 23.04
- RAM: 16 GB
- Python version: 3.11.2
- Installed modules (provide output from `pip list`):

```
Package                        Version
------------------------------ ----------
azure-cognitiveservices-speech 1.28.0
build                          0.10.0
certifi                        2022.12.7
charset-normalizer             3.1.0
click                          8.1.3
click-default-group            1.2.2
cloup                          0.13.1
cmake                          3.26.3
colour                         0.1.5
decorator                      5.1.1
docstring-to-markdown          0.12
evdev                          1.6.1
ffmpeg-python                  0.2.0
filelock                       3.12.0
fsspec                         2023.4.0
future                         0.18.3
glcontext                      2.3.7
greenlet                       2.0.2
gTTS                           2.3.2
huggingface-hub                0.14.1
humanhash3                     0.0.6
idna                           3.4
isosurfaces                    0.1.0
jedi                           0.17.2
Jinja2                         3.1.2
lit                            16.0.2
llvmlite                       0.40.0
manim                          0.17.3
manim-voiceover                0.3.0
ManimPango                     0.4.3
mapbox-earcut                  1.0.1
markdown-it-py                 2.2.0
MarkupSafe                     2.1.2
mdurl                          0.1.2
moderngl                       5.8.2
moderngl-window                2.4.3
more-itertools                 9.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multipledispatch               0.6.0
mutagen                        1.46.0
networkx                       2.8.8
numba                          0.57.0
numpy                          1.24.3
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
openai-whisper                 20230314
packaging                      23.1
parso                          0.7.1
Pillow                         9.5.0
pip                            23.1.2
pip-tools                      6.13.0
playsound                      1.3.0
pluggy                         1.0.0
PyAudio                        0.2.13
pycairo                        1.23.0
pydub                          0.25.1
pyglet                         2.0.5
Pygments                       2.15.1
pynput                         1.7.6
pynvim                         0.4.3
pyproject_hooks                1.0.0
pyrr                           0.10.3
python-dotenv                  0.21.1
python-jsonrpc-server          0.4.0
python-language-server         0.36.2
python-lsp-jsonrpc             1.0.0
python-lsp-server              1.7.2
python-xlib                    0.33
PyYAML                         6.0
regex                          2023.5.5
requests                       2.29.0
rich                           13.3.5
scipy                          1.10.1
screeninfo                     0.8.1
setuptools                     66.1.1
six                            1.16.0
skia-pathops                   0.7.4
sox                            1.4.1
srt                            3.5.3
stable-ts                      2.5.3
svgelements                    1.9.3
sympy                          1.11.1
tiktoken                       0.3.1
tokenizers                     0.13.3
torch                          2.0.0
torchaudio                     2.0.1
tqdm                           4.65.0
transformers                   4.28.1
triton                         2.0.0
typing_extensions              4.5.0
ujson                          5.7.0
urllib3                        1.26.15
watchdog                       2.3.1
wheel                          0.40.0
```
FFMPEG

Output of `ffmpeg -version`:

```
ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100
```

Additional comments

Choosing HDA Intel PCH: ALC897 Analog or HDA Intel PCH: ALC897 Alt Analog as the input device, instead of default, did not produce this error. However, the recordings were of terrible quality (this is not a microphone issue: I tested the same microphone with an online recorder and the quality was good).
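
To compare the "default" entry with the two ALC897 entries, it may help to list what each input device advertises. The following is a minimal sketch that assumes PyAudio (which appears in the `pip list` output above) is importable. `input_devices`, `query_pyaudio_devices`, and `print_input_devices` are hypothetical helper names, not part of manim-voiceover. PyAudio is imported lazily so that the filtering logic can be exercised without audio hardware.

```python
from typing import Iterable, List, Mapping, Tuple


def input_devices(infos: Iterable[Mapping]) -> List[Tuple[int, str, int]]:
    """Keep only devices that can record; return (index, name, max input channels)."""
    return [
        (int(info["index"]), str(info["name"]), int(info["maxInputChannels"]))
        for info in infos
        if int(info["maxInputChannels"]) > 0
    ]


def query_pyaudio_devices() -> List[Mapping]:
    """Ask PortAudio (through PyAudio) for every device it knows about."""
    import pyaudio  # imported here so the helper above works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


def print_input_devices() -> None:
    """Print one line per input-capable device."""
    for index, name, max_in in input_devices(query_pyaudio_devices()):
        print(f"{index:>3}  {name}  (max input channels: {max_in})")
```

Calling `print_input_devices()` on the affected machine would show whether "default" reports an unusually high `maxInputChannels` value compared with the hardware devices.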

osolmaz commented 1 year ago

The plugin saves the recordings under media/voiceovers/. Can you play the ones you recorded with the "default" device? They will probably not play properly in a media player, but I'm just trying to narrow the problem down.
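
Besides playing the files, their headers can be inspected directly. This is a minimal sketch using only Python's standard `wave` module. It assumes that an intermediate `.wav` file is still present under media/voiceovers/ (the MP3 conversion failed in the traceback above, so the WAV may not have been removed, but that is not confirmed). `describe_wav` and `describe_recordings` are hypothetical helper names.

```python
import wave
from pathlib import Path
from typing import Dict, Union


def describe_wav(path: Union[str, Path]) -> Dict[str, float]:
    """Read the header of a PCM WAV file and report its basic parameters."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def describe_recordings(directory: Union[str, Path] = "media/voiceovers") -> Dict[str, Dict[str, float]]:
    """Describe every .wav file found directly inside the given directory."""
    return {p.name: describe_wav(p) for p in sorted(Path(directory).glob("*.wav"))}
```

Running `describe_recordings()` from the project directory and comparing the `channels` value for a "default" recording against one made with an ALC897 device would show whether the channel count differs between the two.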