Robitx / gp.nvim

Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions & Instructable text/code operations & Speech to text [OpenAI, Ollama, Anthropic, ..]
MIT License

Add option to set whisper recording command #91

jayeheffernan closed this 8 months ago

jayeheffernan commented 8 months ago

Hello and thanks for the plugin! I had a little trouble getting whisper to work, so I'm submitting a fix for your consideration.

This PR adds a config option to manually specify which command (sox, ffmpeg, or arecord) is used for recording in commands like GpWhisper. For example, pass `whisper_rec_cmd = 'sox'` to `.setup()`.
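A minimal sketch of the setup call, assuming the plugin is loaded as `require("gp")` and that the option takes the recorder name as a string, as this PR describes (the maintainer's later tweak may accept other forms):

```lua
-- Force sox instead of relying on automatic recorder detection.
-- Per this PR, valid values are "sox", "ffmpeg", or "arecord".
require("gp").setup({
  whisper_rec_cmd = "sox",
})
```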

I had an issue with GpWhisper where the output was always just "you". The recordings (rec.wav) always had the correct length but contained only silence. I think the problem is that the ffmpeg options select audio input device `:0`, which doesn't work in my case. Modifying gp/init.lua to always use `rec_cmd = "sox"` works fine for me. There's probably a way to inspect the audio input devices and improve the autodetection, but I'm not sure how to do that well, and I suspect this may not be a common issue anyway.

Debugging my issue with audio devices... Here's some info from a terminal session of me figuring out what was going on, if it helps.

## Screenshot with notes

[Screenshot with notes: tmux session]

## Raw text output

```txt
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet | grep -i avfoundation | wc -l    11:47:11
1
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet | grep -i avfoundation            11:50:53
 D  avfoundation    AVFoundation input device
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet                                    11:50:59
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
  E audiotoolbox    AudioToolbox output device
 D  avfoundation    AVFoundation input device
 D  lavfi           Libavfilter virtual input device
  E sdl,sdl2        SDL2 output device
 D  x11grab         X11 screen capture, using XCB
/tmp/gp_whisper ❯ ffmpeg -f avfoundation -list_devices true -i ""           11:51:10
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation video devices:
[AVFoundation indev @ 0x7fe4d6f04a00] [0] FaceTime HD Camera (Built-in)
[AVFoundation indev @ 0x7fe4d6f04a00] [1] LG UltraFine Display Camera
[AVFoundation indev @ 0x7fe4d6f04a00] [2] Snap Camera
[AVFoundation indev @ 0x7fe4d6f04a00] [3] Capture screen 0
[AVFoundation indev @ 0x7fe4d6f04a00] [4] Capture screen 1
[AVFoundation indev @ 0x7fe4d6f04a00] [5] Capture screen 2
[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation audio devices:
[AVFoundation indev @ 0x7fe4d6f04a00] [0] ZoomAudioDevice
[AVFoundation indev @ 0x7fe4d6f04a00] [1] MacBook Pro Microphone
[AVFoundation indev @ 0x7fe4d6f04a00] [2] LG UltraFine Display Audio
```
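In the device listing from this session, audio device `[0]` is ZoomAudioDevice, so a hard-coded `:0` would plausibly record from that virtual device rather than the microphone, which would fit the silent recordings. As a rough idea of what better autodetection could look like, here is a sketch in Lua. The helper names `find_avfoundation_audio_device` and `ffmpeg_rec_cmd` are hypothetical (not part of gp.nvim), and the parser assumes the listing format matches the output above. It uses ffmpeg's avfoundation `-i "<video>:<audio>"` syntax, where an empty video part records audio only.

```lua
-- Hypothetical helper (not part of gp.nvim): find an AVFoundation audio
-- device index by name in the text printed by
--   ffmpeg -f avfoundation -list_devices true -i ""
-- Assumes device lines look like:
--   [AVFoundation indev @ 0x...] [1] MacBook Pro Microphone
local function find_avfoundation_audio_device(listing, wanted)
  local in_audio = false
  for line in listing:gmatch("[^\r\n]+") do
    if line:find("AVFoundation audio devices:", 1, true) then
      in_audio = true
    elseif line:find("AVFoundation video devices:", 1, true) then
      in_audio = false
    elseif in_audio then
      -- capture "[<index>] <name>", trimming trailing whitespace from the name
      local idx, name = line:match("%] %[(%d+)%] (.-)%s*$")
      if idx and name == wanted then
        return tonumber(idx)
      end
    end
  end
  return nil -- no audio device with that name
end

-- Hypothetical: build an ffmpeg recording command for rec.wav that uses the
-- chosen audio device instead of a hard-coded ":0".
local function ffmpeg_rec_cmd(audio_index)
  return {
    "ffmpeg", "-y",            -- overwrite rec.wav if it already exists
    "-f", "avfoundation",
    "-i", ":" .. audio_index,  -- "<video>:<audio>", empty video = audio only
    "-t", "3600",              -- stop after at most one hour
    "rec.wav",
  }
end

-- Usage with the audio/video lines from the log above.
local listing = table.concat({
  "[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation video devices:",
  "[AVFoundation indev @ 0x7fe4d6f04a00] [0] FaceTime HD Camera (Built-in)",
  "[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation audio devices:",
  "[AVFoundation indev @ 0x7fe4d6f04a00] [0] ZoomAudioDevice",
  "[AVFoundation indev @ 0x7fe4d6f04a00] [1] MacBook Pro Microphone",
  "[AVFoundation indev @ 0x7fe4d6f04a00] [2] LG UltraFine Display Audio",
}, "\n")

local idx = find_avfoundation_audio_device(listing, "MacBook Pro Microphone")
print(table.concat(ffmpeg_rec_cmd(idx), " "))
-- ffmpeg -y -f avfoundation -i :1 -t 3600 rec.wav
```

Matching by name rather than index is the point of the design: device indices can shift when virtual devices such as ZoomAudioDevice are installed, while the microphone's name stays stable.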

Screenshot of new error message in action

If you pick an invalid value, you'll find out when you try to record:

[Screenshot: new error message in a tmux session]

Robitx commented 8 months ago

@jayeheffernan thanks for the PR and comprehensive debug of the issue.

I've tweaked it slightly so that whisper_rec_cmd can be fully customized. If you hit cropping issues with sox (which can happen if the recording device has high latency), you can go back to ffmpeg with a manually chosen device.
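For example, going back to ffmpeg with the MacBook Pro Microphone (audio device `[1]` in the debug log) might look like the sketch below. It assumes the fully customizable option accepts a table of program plus arguments and that the plugin reads the recording from rec.wav; check the gp.nvim README for the exact shape the option expects.

```lua
-- Assumption: whisper_rec_cmd accepts a full command as a table of
-- program + arguments. Verify against the gp.nvim README before using.
require("gp").setup({
  whisper_rec_cmd = {
    "ffmpeg", "-y",
    "-f", "avfoundation",
    -- ":1" = no video, audio device [1] (MacBook Pro Microphone in the log)
    "-i", ":1",
    "-t", "3600",
    "rec.wav",
  },
})
```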