Type error when using RecorderService

rehnertz commented 1 year ago

Description of bug / unexpected behavior

After recording my voice successfully, I encounter this error:

/home/rehnertz/manim/lib/python3.10/site-packages/stable_whisper/whisper_word_level.py:190: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detected language: english
100%|██████████████████████████████████████████████████████████████████████████████| 0.7/0.7 [00:03<00:00,  4.87s/sec]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/cli/render/commands.py:115 in render     │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/scene/scene.py:223 in render             │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/rehnertz/manim/scenes/Demo.py:10 in construct                                              │
│                                                                                                  │
│    7 │   def construct(self):                                                                    │
│    8 │   │   self.set_speech_service(RecorderService())                                          │
│    9 │   │                                                                                       │
│ ❱ 10 │   │   with self.voiceover(text="Test") as tracker:                                        │
│   11 │   │   │   self.play(Create(Circle()), run_time=tracker.duration)                          │
│   12 │   │   self.wait(1)                                                                        │
│   13                                                                                             │
│                                                                                                  │
│ /usr/lib/python3.10/contextlib.py:135 in __enter__                                               │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:180 in      │
│ voiceover                                                                                        │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:63 in       │
│ add_voiceover_text                                                                               │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/services/base.py:93 in         │
│ _wrap_generate_from_text                                                                         │
│                                                                                                  │
│    90 │   │   │   transcription_result = self._whisper_model.transcribe(                         │
│    91 │   │   │   │   str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs    │
│    92 │   │   │   )                                                                              │
│ ❱  93 │   │   │   logger.info("Transcription: " + transcription_result["text"])                  │
│    94 │   │   │   word_boundaries = timestamps_to_word_boundaries(                               │
│    95 │   │   │   │   transcription_result["segments"]                                           │
│    96 │   │   │   )                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'WhisperResult' object is not subscriptable

This appears to be a type error caused by a change in Whisper's result structure. I modified manim_voiceover/services/base.py as follows:

def timestamps_to_word_boundaries(segments):
    word_boundaries = []
    current_text_offset = 0
    for segment in segments:
        # ===== MODIFIED: Whisper result structure changed
        # for dict_ in segment["word_timestamps"]:
        for dict_ in segment["words"]:                  # <====== Key modified
        # =====
            word = dict_["word"]
            word_boundaries.append(
                {
                    # ===== MODIFIED: Whisper result structure changed
                    # "audio_offset": int(dict_["timestamp"] * AUDIO_OFFSET_RESOLUTION),
                    "audio_offset": int(dict_["start"] * AUDIO_OFFSET_RESOLUTION),     # <====== Key modified
                    # =====
      ................

    def _wrap_generate_from_text(self, text: str, path: str = None, **kwargs) -> dict:
        # Replace newlines with lines, reduce multiple consecutive spaces to single
        text = " ".join(text.split())

        dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs)
        original_audio = dict_["original_audio"]

        # Check whether word boundaries exist and if not run stt
        if "word_boundaries" not in dict_ and self._whisper_model is not None:
            transcription_result = self._whisper_model.transcribe(
                str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs
            )
            # ==== MODIFIED: Whisper result structure changed
            transcription_result = transcription_result.ori_dict   # <====== Use original data(?)
            # ====
      ...........................

It seems to work.
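
The workaround above could also be written so that it tolerates both result layouts. The following is only a sketch, not the library's code: the helper names `as_result_dict` and `word_entries` are hypothetical, and it assumes that the new result object exposes the `ori_dict` attribute used in the patch above and that the only key changes are the ones shown there ("word_timestamps" to "words", "timestamp" to "start").

```python
from typing import Any, Dict, Iterator, Tuple


def as_result_dict(result: Any) -> Dict[str, Any]:
    """Return a plain dict for either a dict or a result object.

    Hypothetical helper: relies on the `ori_dict` attribute seen in the
    workaround above; anything else is rejected explicitly.
    """
    if isinstance(result, dict):
        return result
    ori = getattr(result, "ori_dict", None)
    if isinstance(ori, dict):
        return ori
    raise TypeError(f"Unsupported transcription result: {type(result).__name__}")


def word_entries(segment: Dict[str, Any]) -> Iterator[Tuple[str, float]]:
    """Yield (word, start_in_seconds) pairs from either segment layout."""
    if "words" in segment:
        # Newer layout, as used in the patch above.
        for entry in segment["words"]:
            yield entry["word"], entry["start"]
    else:
        # Older layout that base.py originally expected.
        for entry in segment["word_timestamps"]:
            yield entry["word"], entry["timestamp"]
```

With helpers like these, `_wrap_generate_from_text` would call `as_result_dict(...)` once on the transcription result, and `timestamps_to_word_boundaries` would iterate over `word_entries(segment)` instead of indexing a fixed key.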

Expected behavior

Successfully output the video with recorded voice.

How to reproduce the issue

Code for reproducing the problem

```py
from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService
from manim_voiceover.services.recorder import RecorderService

class Demo(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())

        with self.voiceover(text="Test") as tracker:
            self.play(Create(Circle()), run_time=tracker.duration)
        self.wait(1)
```

Then call

```
manim -pql Demo.py --disable_caching
```

Additional media files

Images/GIFs

Logs

Terminal output

```
manim -v DEBUG scenes/Demo.py --disable_caching
Manim Community v0.17.2

ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
-------------------------device list-------------------------
Input Device id 0 - HDA Intel PCH: ALC256 Analog (hw:0,0)
Input Device id 18 - Samson Go Mic: USB Audio (hw:2,0)
Input Device id 19 - sysdefault
Input Device id 21 - samplerate
Input Device id 22 - speexrate
Input Device id 23 - pulse
Input Device id 24 - upmix
Input Device id 25 - vdownmix
Input Device id 26 - default
-------------------------------------------------------------
Please select an input device id to record from: 18
Selected device: Samson Go Mic: USB Audio (hw:2,0)
╔════════════╗
║ Voiceover: ║
║            ║
║ Test       ║
╚════════════╝
Press and hold the 'r' key to begin recording
Wait for 1 second, then start speaking.
Wait for at least 1 second after you finish speaking.
This is to eliminate any sounds that may come from your keyboard.
The silence at the beginning and end will be trimmed automatically.
You can adjust this setting using the `trim_silence_threshold` argument.
These instructions are only shown once.
Release the 'r' key to end recording
rStream active: True
start Stream
rrrrrrrrrrrrrrrrrrrrrrrFinished recording, saving to media/voiceovers/charlie-summer-virginia-salami.mp3
[03/29/23 04:58:48] INFO     Saved media/voiceovers/charlie-summer-virginia-salami.mp3     helper.py:36
Press...
 l to [l]isten to the recording
 r to [r]e-record
 a to [a]ccept the recording
a
/home/rehnertz/manim/lib/python3.10/site-packages/stable_whisper/whisper_word_level.py:190: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detected language: english
100%|████████████████████████████████████████████████████████████████████████████| 0.65/0.65 [00:09<00:00, 14.56s/sec]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/cli/render/commands.py:115 in render     │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/scene/scene.py:223 in render             │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/rehnertz/manim/scenes/Demo.py:10 in construct                                              │
│                                                                                                  │
│    7 │   def construct(self):                                                                    │
│    8 │   │   self.set_speech_service(RecorderService())                                          │
│    9 │   │                                                                                       │
│ ❱ 10 │   │   with self.voiceover(text="Test") as tracker:                                        │
│   11 │   │   │   self.play(Create(Circle()), run_time=tracker.duration)                          │
│   12 │   │   self.wait(1)                                                                        │
│   13                                                                                             │
│                                                                                                  │
│ /usr/lib/python3.10/contextlib.py:135 in __enter__                                               │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:180 in      │
│ voiceover                                                                                        │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:63 in       │
│ add_voiceover_text                                                                               │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/services/base.py:93 in         │
│ _wrap_generate_from_text                                                                         │
│                                                                                                  │
│    90 │   │   │   transcription_result = self._whisper_model.transcribe(                         │
│    91 │   │   │   │   str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs    │
│    92 │   │   │   )                                                                              │
│ ❱  93 │   │   │   logger.info("Transcription: " + transcription_result["text"])                  │
│    94 │   │   │   word_boundaries = timestamps_to_word_boundaries(                               │
│    95 │   │   │   │   transcription_result["segments"]                                           │
│    96 │   │   │   )                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'WhisperResult' object is not subscriptable
```

System specifications

System Details

- OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Ubuntu 22.04
- RAM: 16GB
- Python version (`python/py/python3 --version`): 3.10.6
- Installed modules (provide output from `pip list`):

```
Package                  Version
------------------------ ----------
autopep8                 2.0.2
certifi                  2022.12.7
charset-normalizer       3.1.0
click                    8.1.3
click-default-group      1.2.2
cloup                    0.13.1
cmake                    3.26.1
colour                   0.1.5
decorator                5.1.1
evdev                    1.6.1
ffmpeg-python            0.2.0
filelock                 3.10.7
future                   0.18.3
glcontext                2.3.7
gTTS                     2.3.1
huggingface-hub          0.13.3
humanhash3               0.0.6
idna                     3.4
isosurfaces              0.1.0
Jinja2                   3.1.2
lit                      16.0.0
llvmlite                 0.39.1
manim                    0.17.2
manim-voiceover          0.3.0
ManimPango               0.4.3
mapbox-earcut            1.0.1
markdown-it-py           2.2.0
MarkupSafe               2.1.2
mdurl                    0.1.2
moderngl                 5.8.1
moderngl-window          2.4.3
more-itertools           9.1.0
mpmath                   1.3.0
multipledispatch         0.6.0
mutagen                  1.46.0
networkx                 2.8.8
numba                    0.56.4
numpy                    1.23.5
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
openai-whisper           20230314
packaging                23.0
Pillow                   9.4.0
pip                      22.0.2
playsound                1.3.0
PyAudio                  0.2.13
pycairo                  1.23.0
pycodestyle              2.10.0
pydub                    0.25.1
pyglet                   2.0.5
Pygments                 2.14.0
PyGObject                3.44.1
pynput                   1.7.6
pyrr                     0.10.3
python-dotenv            0.21.1
python-xlib              0.33
PyYAML                   6.0
regex                    2023.3.23
requests                 2.28.2
rich                     13.3.2
scipy                    1.10.1
screeninfo               0.8.1
setuptools               59.6.0
six                      1.16.0
skia-pathops             0.7.4
sox                      1.4.1
srt                      3.5.2
stable-ts                2.1.2
svgelements              1.9.1
sympy                    1.11.1
tiktoken                 0.3.1
tokenizers               0.13.2
tomli                    2.0.1
torch                    2.0.0
torchaudio               2.0.1
tqdm                     4.65.0
transformers             4.27.3
triton                   2.0.0
typing_extensions        4.5.0
urllib3                  1.26.15
watchdog                 2.3.1
wheel                    0.40.0
```

LaTeX details

+ LaTeX distribution (e.g. TeX Live 2020):
+ Installed LaTeX packages:

FFMPEG

Output of `ffmpeg -version`:

```
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100
```

Additional comments

azampa commented 1 year ago

Hi rehnertz,

I had the same problem, and your solution also works for me. Thanks!

azampa commented 1 year ago

However, I now face another problem when I try to listen to the recording to check whether it is good. I get the following error:

Press...
 l to [l]isten to the recording
 r to [r]e-record
 a to [a]ccept the recording

rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrl

    Error 305 for command:
        open "media\voiceovers\iowa-oklahoma-two-coffee.mp3"
    Cannot specify extra characters after a string enclosed in quotation marks.

    Error 305 for command:
        close "media\voiceovers\iowa-oklahoma-two-coffee.mp3"
    Cannot specify extra characters after a string enclosed in quotation marks.
Failed to close the file: "media\voiceovers\iowa-oklahoma-two-coffee.mp3"
┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim\cli\render\commands.py:115 in       │
│ render                                                                                           │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ > 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim\scene\scene.py:223 in render        │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ >  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ C:\Users\ale\Documents\Scuola\Marinelli\2022-2023\Video\Lezioni\Geometria\Triangoli\primalezione │
│ .py:89 in construct                                                                              │
│                                                                                                  │
│    86 │   │   #                                     )                                            │
│    87 │   │   #                        )                                                         │
│    88 │   │                                                                                      │
│ >  89 │   │   with self.voiceover(                                                               │
│    90 │   │   │   │   text='''Immagina che in una zona <bookmark mark="A"/> boschiva             │
│    91 │   │   │   │   │   │   si sviluppi un <bookmark mark="B"/> incendio                       │
│    92 │   │   │   │   │   │   e che, per spegnerlo, dall’<bookmark mark="C"/>aeroporto           │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\contextlib.py:135 in __enter__                          │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ > 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim_voiceover\voiceover_scene.py:180 in │
│ voiceover                                                                                        │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ > 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim_voiceover\voiceover_scene.py:63 in  │
│ add_voiceover_text                                                                               │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ >  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim_voiceover\services\base.py:91 in    │
│ _wrap_generate_from_text                                                                         │
│                                                                                                  │
│    88 │   │   # Replace newlines with lines, reduce multiple consecutive spaces to single        │
│    89 │   │   text = " ".join(text.split())                                                      │
│    90 │   │                                                                                      │
│ >  91 │   │   dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs)         │
│    92 │   │   original_audio = dict_["original_audio"]                                           │
│    93 │   │                                                                                      │
│    94 │   │   # Check whether word boundaries exist and if not run stt                           │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim_voiceover\services\recorder\__init_ │
│ _.py:101 in generate_from_text                                                                   │
│                                                                                                  │
│    98 │   │                                                                                      │
│    99 │   │   self.recorder._trigger_set_device()                                                │
│   100 │   │   box = msg_box("Voiceover:\n\n" + input_text)                                       │
│ > 101 │   │   self.recorder.record(str(Path(cache_dir) / audio_path), box)                       │
│   102 │   │                                                                                      │
│   103 │   │   json_dict = {                                                                      │
│   104 │   │   │   "input_text": text,                                                            │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\manim_voiceover\services\recorder\utility │
│ .py:238 in record                                                                                │
│                                                                                                  │
│   235 │   │   │   try:                                                                           │
│   236 │   │   │   │   key = input()[-1].lower()                                                  │
│   237 │   │   │   │   if key == "l":                                                             │
│ > 238 │   │   │   │   │   playsound.playsound(path)                                              │
│   239 │   │   │   │   elif key == "r":                                                           │
│   240 │   │   │   │   │   if message is not None:                                                │
│   241 │   │   │   │   │   │   print(message)                                                     │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\playsound.py:72 in _playsoundWin          │
│                                                                                                  │
│    69 │                                                                                          │
│    70 │   try:                                                                                   │
│    71 │   │   logger.debug('Starting')                                                           │
│ >  72 │   │   winCommand(u'open {}'.format(sound))                                               │
│    73 │   │   winCommand(u'play {}{}'.format(sound, ' wait' if block else ''))                   │
│    74 │   │   logger.debug('Returning')                                                          │
│    75 │   finally:                                                                               │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\manimenv\lib\site-packages\playsound.py:64 in winCommand             │
│                                                                                                  │
│    61 │   │   │   │   │   │   │   │   '\n        ' + command.decode('utf-16') +                  │
│    62 │   │   │   │   │   │   │   │   '\n    ' + errorBuffer.raw.decode('utf-16').rstrip('\0')   │
│    63 │   │   │   logger.error(exceptionMessage)                                                 │
│ >  64 │   │   │   raise PlaysoundException(exceptionMessage)                                     │
│    65 │   │   return buf.value                                                                   │
│    66 │                                                                                          │
│    67 │   if '\\' in sound:                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
PlaysoundException:
    Error 305 for command:
        open "media\voiceovers\iowa-oklahoma-two-coffee.mp3"
    Cannot specify extra characters after a string enclosed in quotation marks.

Any hints on how to solve this?

ManimCommunity / manim-voiceover