ManimCommunity / manim-voiceover

Manim plugin for all things voiceover
https://voiceover.manim.community/en/stable
MIT License

Using RecorderService manim-voiceover hangs (before or after releasing the 'r' key?) #56

Open azampa opened 1 year ago

azampa commented 1 year ago

Description of bug / unexpected behavior

I render a file with CoquiService, and both manim and manim-voiceover work correctly. Then I switch to RecorderService, select the input device (e.g. 13 - default), and start recording by pressing the 'r' key. When I release the 'r' key, manim-voiceover hangs as if stuck in an infinite idle loop. Inspecting the folder ./media/voiceovers, I see that no file has been produced, so I suspect that manim-voiceover is still waiting for the user to press 'r'...

Expected behavior

Manim-voiceover should start recording the voiceover as soon as the user presses 'r'. After the 'r' key is released, manim-voiceover should ask the user to choose from the following options:

How to reproduce the issue

Code for reproducing the problem

```py
from manim import *
from manim_voiceover import VoiceoverScene
# from manim_voiceover.services.gtts import GTTSService
# from manim_voiceover.services.coqui import CoquiService
from manim_voiceover.services.recorder import RecorderService
from math import *


class RecordVoiceover(VoiceoverScene):
    def construct(self):
        circle = Circle()
        # self.set_speech_service(GTTSService(lang='it',tld='it',transcription_model='base'))
        self.set_speech_service(RecorderService())
        # self.set_speech_service(CoquiService(
        #     model_name='tts_models/it/mai_male/glow-tts',
        #     transcription_model='base'
        #     )
        # )
        with self.voiceover(
            text='''Ora creo un cerchio, poi lo muovo a destra e infine lo elimino. ''') as tracker:
            self.wait_until_bookmark('A')
            self.play(Create(circle,run_time=0.5))
            self.wait_until_bookmark('B')
            self.play(circle.animate.shift(RIGHT),run_time=0.5)
            self.wait_until_bookmark('C')
            self.play(FadeOut(circle),run_time=0.5)
```

System specifications

System Details
- OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Linux Ubuntu 23.04
- RAM: 32 GB
- Python version (`python/py/python3 --version`): 3.10.11
- Installed modules (provide output from `pip list`):
```
Package Version
------------------------------ ------------
accelerate 0.19.0
aiohttp 3.8.4
aiosignal 1.3.1
anyascii 0.3.2
appdirs 1.4.4
async-timeout 4.0.2
attrs 23.1.0
audioread 3.0.0
azure-cognitiveservices-speech 1.29.0
Babel 2.12.1
backports.cached-property 1.0.2
bangla 0.0.2
blinker 1.6.2
bnnumerizer 0.0.2
bnunicodenormalizer 0.1.1
boltons 23.0.0
brotlipy 0.7.0
build 0.10.0
CacheControl 0.12.11
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.1.0
clean-fid 0.1.35
cleo 2.0.1
click 8.1.3
click-default-group 1.2.2
clip-anytorch 2.5.2
cloup 0.13.1
cmake 3.26.3
colorama 0.4.6
colour 0.1.5
contourpy 1.0.7
coqpit 0.0.17
crashtest 0.4.1
cryptography 41.0.1
cycler 0.11.0
Cython 0.29.28
dataclasses 0.8
dateparser 1.1.8
decorator 5.1.1
deepl 1.14.0
distlib 0.3.6
docker-pycreds 0.4.0
docopt 0.6.2
dulwich 0.21.5
einops 0.6.1
evdev 1.6.1
ffmpeg-python 0.2.0
filelock 3.12.0
Flask 2.3.2
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
ftfy 6.1.1
future 0.18.3
g2pkk 0.1.2
gitdb 4.0.10
GitPython 3.1.31
glcontext 2.3.7
gruut 2.2.3
gruut-ipa 0.13.0
gruut-lang-de 2.0.0
gruut-lang-en 2.0.0
gruut-lang-es 2.0.0
gruut-lang-fr 2.0.2
gTTS 2.3.2
html5lib 1.1
huggingface-hub 0.15.1
idna 3.4
imageio 2.31.0
importlib-metadata 6.6.0
importlib-resources 5.12.0
inflect 5.6.0
installer 0.7.0
isosurfaces 0.1.0
itsdangerous 2.1.2
jamo 0.4.1
jaraco.classes 3.2.3
jeepney 0.8.0
jieba 0.42.1
Jinja2 3.1.2
joblib 1.2.0
jsonlines 1.2.0
jsonmerge 1.9.0
jsonschema 4.17.3
k-diffusion 0.0.15
keyring 23.13.1
kiwisolver 1.4.4
kornia 0.6.12
lazy_loader 0.2
librosa 0.10.0.post2
lit 16.0.5.post0
llvmlite 0.39.1
lockfile 0.12.2
manim 0.17.3
manim-voiceover 0.3.3.post0
ManimPango 0.4.3
mapbox-earcut 1.0.0
markdown-it-py 2.2.0
MarkupSafe 2.1.3
matplotlib 3.7.1
mdurl 0.1.0
mecab-python3 1.0.5
moderngl 5.8.2
moderngl-window 2.4.1
more-itertools 9.1.0
mpmath 1.3.0
msgpack 1.0.5
multidict 6.0.4
multipledispatch 0.6.0
mutagen 1.46.0
networkx 2.8.8
nltk 3.8.1
num2words 0.5.12
numba 0.56.4
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
openai-whisper 20230314
packaging 23.1
pandas 2.0.2
pathtools 0.1.2
pexpect 4.8.0
Pillow 9.5.0
pip 23.1.2
pkginfo 1.9.6
pkgutil_resolve_name 1.3.10
platformdirs 3.5.1
poetry 1.5.1
poetry-core 1.6.1
poetry-plugin-export 1.4.0
pooch 1.6.0
protobuf 3.19.6
psutil 5.9.5
ptyprocess 0.7.0
PyAudio 0.2.13
pycairo 1.23.0
pycparser 2.21
pydub 0.25.1
pyglet 1.5.27
Pygments 2.15.1
pynndescent 0.5.10
pynput 1.7.6
pyOpenSSL 23.2.0
pyparsing 3.0.9
pypinyin 0.49.0
pyproject_hooks 1.0.0
pyrr 0.10.3
pyrsistent 0.19.3
pysbd 0.3.4
PySocks 1.7.1
python-crfsuite 0.9.9
python-dateutil 2.8.2
python-dotenv 0.21.1
python-slugify 8.0.1
python-xlib 0.33
pyttsx3 2.90
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0
rapidfuzz 2.15.1
regex 2023.6.3
requests 2.31.0
requests-toolbelt 1.0.0
resize-right 0.0.2
rich 13.4.1
scikit-image 0.21.0
scikit-learn 1.2.2
scipy 1.10.1
screeninfo 0.8.1
SecretStorage 3.3.3
sentry-sdk 1.25.0
setproctitle 1.3.2
setuptools 67.7.2
shellingham 1.5.1
six 1.16.0
skia-pathops 0.7.4
smmap 5.0.0
soundfile 0.12.1
sox 1.4.1
soxr 0.3.5
srt 3.5.2
stable-ts 2.6.2
svgelements 1.9.5
sympy 1.12
tensorboardX 2.6
text-unidecode 1.3
threadpoolctl 3.1.0
tifffile 2023.4.12
tiktoken 0.3.1
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.11.8
torch 2.0.1
torchaudio 2.0.2
torchdiffeq 0.2.3
torchsde 0.2.5
torchvision 0.15.2
tqdm 4.65.0
trainer 0.0.20
trampoline 0.1.2
transformers 4.29.2
triton 2.0.0
trove-classifiers 2023.5.24
TTS 0.14.3
typing_extensions 4.6.3
tzdata 2023.3
tzlocal 5.0.1
umap-learn 0.5.1
unidic-lite 1.0.8
urllib3 1.26.15
virtualenv 20.23.0
wandb 0.15.4
watchdog 2.2.1
wcwidth 0.2.6
webencodings 0.5.1
Werkzeug 2.3.4
wheel 0.40.0
yarl 1.9.2
zipp 3.15.0
```
LaTeX details
+ LaTeX distribution (e.g. TeX Live 2020): TeX Live 2022/Debian
+ Installed LaTeX packages:
FFMPEG
Output of `ffmpeg -version`:
```
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/tmp/build/80754af9/ffmpeg_1587154242452/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
```

Additional comments

osolmaz commented 1 year ago

I can't reproduce this locally. Could you insert some breakpoints (`import ipdb; ipdb.set_trace()`) into your local copy of the manim-voiceover source and tell me which line causes the hang?
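When stepping through with breakpoints is awkward, a hang can also be located with the standard library's `faulthandler` module, which dumps the stack of every thread once a timeout expires. The sketch below is not part of manim-voiceover: `watch_for_hang`, `demo`, and the timeout values are illustrative assumptions, and `time.sleep` stands in for the call that hangs.

```python
import faulthandler
import tempfile
import time


def watch_for_hang(timeout_s, log_file):
    """Dump every thread's traceback to log_file if still running after timeout_s."""
    # repeat=False: dump only once; exit=False: keep the process alive afterwards.
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=log_file, exit=False
    )


def demo():
    """Arm the watchdog, simulate a hang, and return the captured dump."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        watch_for_hang(0.2, log_file)
        time.sleep(1.0)  # hypothetical stand-in for the code that hangs
        faulthandler.cancel_dump_traceback_later()
        log_file.seek(0)
        return log_file.read()
```

In a real session one would more likely pass `file=sys.stderr` and arm the watchdog at the top of the scene file, so the traceback shows the line where the render is stuck.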

azampa commented 1 year ago

Well, doing as you asked, I determined that manim-voiceover hangs in services/recorder/utility.py between line 160 (reached) and line 163 (never reached). It reaches line 179, so it seems that MyListener() is not able to recognise the pressing of the 'r' key, i.e. self.listener.key_pressed is always false!

azampa commented 1 year ago

After more debugging I determined that MyListener.on_press() is never called (the listener does get initialised and started, though), so there is no chance for key_pressed to become True, and its value remains None! I don't understand the reason for this behaviour...
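The failure mode described here can be illustrated without pynput: a loop that polls a flag set only by a callback spins forever if the callback never fires. The sketch below is a simplified stand-in, not the manim-voiceover code; `FakeListener`, `wait_for_press`, `demo`, and the timeout values are hypothetical. It shows how a deadline turns the silent hang into an explicit error.

```python
import threading
import time


class FakeListener:
    """Stand-in for a keyboard listener; key_pressed stays None until on_press runs."""

    def __init__(self):
        self.key_pressed = None

    def on_press(self, key):
        # Only the 'r' key sets the flag, mirroring the push-to-record idea.
        if key == "r":
            self.key_pressed = True


def wait_for_press(listener, timeout_s, poll_s=0.01):
    """Poll the flag; raise instead of hanging if no key event ever arrives."""
    deadline = time.monotonic() + timeout_s
    while not listener.key_pressed:
        if time.monotonic() > deadline:
            raise TimeoutError("no key event received; is the listener getting input?")
        time.sleep(poll_s)
    return True


def demo():
    """Simulate a key event arriving from another thread 50 ms later."""
    listener = FakeListener()
    threading.Timer(0.05, listener.on_press, args=("r",)).start()
    return wait_for_press(listener, timeout_s=1.0)
```

If `on_press` is never invoked, as in the report above, `wait_for_press` raises `TimeoutError` after the deadline rather than blocking indefinitely.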

osolmaz commented 1 year ago

Thank you, I'll investigate this soon.

azampa commented 1 year ago

OK, I found the origin of the issue: pynput is not meant to work on Wayland, only on Xorg. When I switched to Ubuntu on Xorg, everything worked as expected.

Knowing this, you should either look for an alternative to pynput that also works on Wayland (such as this which, unfortunately, is currently unmaintained), or warn users to avoid Wayland when using manim-voiceover on Ubuntu...
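The second suggestion, warning users, could be sketched roughly as follows. This is an assumption about how such a check might look, not existing manim-voiceover behaviour: `is_wayland_session` and `warn_if_wayland` are hypothetical helpers, and the check is a best-effort heuristic based on the `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables that Linux desktop sessions commonly set.

```python
import os
import warnings


def is_wayland_session(env=None):
    """Best-effort check: True if the environment looks like a Wayland session."""
    env = os.environ if env is None else env
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    return session_type == "wayland" or bool(env.get("WAYLAND_DISPLAY"))


def warn_if_wayland(env=None):
    """Warn before starting a pynput-based recorder; returns True if a warning was issued."""
    if is_wayland_session(env):
        warnings.warn(
            "Wayland session detected: global key listening via pynput may not "
            "receive events. Consider logging in with an Xorg session.",
            RuntimeWarning,
        )
        return True
    return False
```

Calling `warn_if_wayland()` before the recording prompt would at least tell the user why pressing 'r' appears to do nothing.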

osolmaz commented 1 year ago

Then this might be related to #44; you can follow the discussion there. (Note that I had issues with Gradio, which is why I haven't merged it yet.) The CLI-based approach was a quick hack to get the MVP going; the ideal solution would be a standalone UI that works without cross-platform compatibility issues.

I found the following options:

- https://github.com/PySimpleGUI/PySimpleGUI
- https://github.com/hoffstadt/DearPyGui
- https://github.com/beeware/toga

We could also go for an Electron app or a locally hosted web app like Jupyter Notebook, but then it would be more complicated to ship Python and JS code in the same package, albeit more future-proof. Open to suggestions. cc @o-alexandre-felipe