Robitx / gp.nvim

Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions & Instructable text/code operations & Speech to text [OpenAI]
MIT License
537 stars 49 forks

my adventures with GpWhisper: log to files the different commands ? #122

Open teto opened 3 months ago

teto commented 3 months ago

Now that localai is using my GPU, I wanted to try Whisper locally via :GpWhisper. After installing sox, I got this not very helpful message:

Gp: Whisper query exited: 2, 0

I had installed sox because checkhealth asked for it:

- OK sox is installed
- OK sox is compiled with mp3 support

Note that the mp3 check is unreliable: sox -h | grep -i mp3 did return mp3, but there seems to be a distinction between reading and writing mp3 (https://bugs.launchpad.net/ubuntu/+source/sox/+bug/223783). I am on nix and I had to install (sox.override({enableLame = true;})) for sox to be able to generate mp3.
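A stricter check than grepping the help text would be to actually encode something. The following is only a sketch of that idea, not what checkhealth currently does: the function name `sox_can_write_mp3` is made up, and it assumes that asking sox to write 0.1 s of silence from its null input (`-n`) to an .mp3 file fails when no MP3 encoder is compiled in.

```shell
# Return 0 only if sox can actually encode an MP3 file.
# Hypothetical helper, not part of gp.nvim.
sox_can_write_mp3() {
    # No sox at all means no MP3 support either.
    command -v sox >/dev/null 2>&1 || return 1
    probe_dir=$(mktemp -d) || return 1
    # Generate 0.1 s of mono silence from the null input and write it as MP3.
    # A build without an MP3 encoder is expected to fail at this step.
    if sox -n -r 16000 -c 1 "$probe_dir/probe.mp3" trim 0 0.1 >/dev/null 2>&1 \
        && [ -s "$probe_dir/probe.mp3" ]; then
        result=0
    else
        result=1
    fi
    rm -rf "$probe_dir"
    return "$result"
}
```

Calling `sox_can_write_mp3 || echo "sox cannot write mp3"` before recording would have pointed at the missing lame support directly.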

In order to debug my setup, I added print statements; it would be nice if gp.nvim could log some of its operations to a file instead. I don't like plenary much, but it has some facilities for this. With package managers like https://github.com/nvim-neorocks/rocks.nvim/, it should become more tractable to use dependencies in the future.
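To illustrate the kind of log I have in mind, here is a rough shell sketch (the plugin itself is Lua, so this is not a proposed implementation). The `run_logged` name and the `GP_DEBUG_LOG` path are both invented for the example; gp.nvim does not write such a file today.

```shell
# Hypothetical log location; gp.nvim itself does not write this file.
GP_DEBUG_LOG=${GP_DEBUG_LOG:-/tmp/gp_whisper/debug.log}

# Run one command, appending the command line, its output and its
# exit status to the log, then return that same exit status.
run_logged() {
    mkdir -p "$(dirname "$GP_DEBUG_LOG")"
    printf '%s RUN  %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >> "$GP_DEBUG_LOG"
    status=0
    "$@" >> "$GP_DEBUG_LOG" 2>&1 || status=$?
    printf '%s EXIT %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$status" >> "$GP_DEBUG_LOG"
    return "$status"
}
```

For example, `run_logged sox --norm=-3 rec.wav norm.wav` would leave both the sox error message and the exit code in the log instead of only "exited: 2, 0".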

So anyway GpWhisper was trying to run:

|| cd /tmp/gp_whisper && export LC_NUMERIC='C' && sox --norm=-3 rec.wav norm.wav && t=$(sox 'norm.wav' -n channels 1 stats 2>&1 | grep 'RMS lev dB'  | sed -e 's/.* //' | awk '{print $1*1.75}') && sox -q norm.wav -C 196.5 final.mp3 silence -l 1 0.05 $t'dB' -1 1.0 $t'dB' pad 0.1 0.1 tempo 1.75 && curl  --max-time 20 https://api.openai.com/v1/audio/transcriptions -s -H "Authorization: Bearer <redacted>" -H "Content-Type: multipart/form-data" -F model="whisper-1" -F language="en" -F file="@final.mp3" -F response_format="json"
|| Gp: Whisper query exited: 2, 0

First I found out that rec.wav did not exist or was empty; checking the size of the recording could help diagnose a failed recording. Then I had to split the command into its individual steps to find the next issue. It turns out the conversion to mp3 failed because of what I mentioned earlier: my version of sox listed mp3 in sox -h, but it could not generate mp3 until I enabled the "lame" library.
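The size check could be as small as the sketch below. `check_recording` is a made-up name, and the distinct return codes (1 for missing, 2 for empty) are just my choice so that the two cases can be told apart.

```shell
# Report whether a recording exists and contains data.
# Returns 1 if the file is missing, 2 if it is empty, 0 otherwise.
check_recording() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    if [ ! -s "$file" ]; then
        echo "empty: $file" >&2
        return 2
    fi
    # wc -c prints the size in bytes; tr strips the padding some
    # implementations add around the number.
    size=$(wc -c < "$file" | tr -d ' ')
    echo "ok: $file ($size bytes)"
}
```

Running `check_recording /tmp/gp_whisper/rec.wav` before the sox pipeline would separate "the recording failed" from "the conversion failed".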

So now it works (yeah \o/), but initially I wanted to run it locally, so I changed the hardcoded endpoint to my local localai endpoint (" --max-time 20 http://localhost:11111/v1/audio/transcriptions -s ") and it works so fast it's scary (with an RTX 3060, so not that fancy).
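If the endpoint were configurable rather than hardcoded, the request could look roughly like this. It is a sketch only: `whisper_url` and `transcribe` are hypothetical names, `OPENAI_API_KEY` is assumed to be set in the environment (I am also assuming a local server simply ignores it), and I dropped the explicit Content-Type header because curl sets it for `-F` uploads.

```shell
# Build the transcription URL from a base URL (default: OpenAI).
whisper_url() {
    base=${1:-https://api.openai.com}
    # Strip one trailing slash so both "host" and "host/" work.
    printf '%s/v1/audio/transcriptions\n' "${base%/}"
}

# Send an audio file to the endpoint, mirroring the curl flags above.
# Usage: transcribe BASE_URL FILE [LANGUAGE]
transcribe() {
    curl --max-time 20 "$(whisper_url "$1")" -s \
        -H "Authorization: Bearer ${OPENAI_API_KEY:-}" \
        -F model="whisper-1" \
        -F language="${3:-en}" \
        -F file="@$2" \
        -F response_format="json"
}
```

With that, `transcribe http://localhost:11111 final.mp3` targets localai and `transcribe "" final.mp3` falls back to the OpenAI URL.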

My first attempt was in my native language (not English) and the result was garbage ^^ Maybe the default `whisper_language = "en"` could be chosen from the locale instead? But I'm nitpicking. It took me a few (2?) hours to get there, so I'll pause for now :)
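Deriving the language from the locale could look something like the sketch below. The function name is invented, and it assumes the usual `lang_TERRITORY.codeset` locale naming; it falls back to "en" for the C/POSIX locale. Some locales use three-letter language codes, which Whisper may not accept, so treat this as a starting point.

```shell
# Print a language code derived from a locale name such as fr_FR.UTF-8.
# With no argument, uses LC_ALL, then LC_MESSAGES, then LANG.
whisper_language_from_locale() {
    locale_name=${1:-${LC_ALL:-${LC_MESSAGES:-${LANG:-}}}}
    # Keep only the part before the first "_", "." or "@".
    lang=${locale_name%%[_.@]*}
    case $lang in
        ''|C|POSIX) echo en ;;   # no usable language: keep the current default
        *) echo "$lang" ;;
    esac
}
```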

My USB mic needed some custom config, which I am listing more for my future self than for the maintainers (sorry ^^'):

$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: PCH [HDA Intel PCH], device 0: ALC892 Analog [ALC892 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: PCH [HDA Intel PCH], device 2: ALC892 Alt Analog [ALC892 Alt Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Microphones [Blue Microphones], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

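The help/doc of arecord is not great, so from there it was not clear how to specify the device. I found the answer here: https://unix.stackexchange.com/questions/360192/alsa-error-channel-count-2-not-available-for-playback-invalid-argument . It seems plughw accepts more options than hw (it goes through ALSA's plug layer, which converts sample format, rate and channel count as needed), and in the end this worked, with card 2 and device 0 taken from the listing above: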

arecord -D plughw:2,0 -c 1 -f S16_LE -r 48000 -d 3600 /tmp/gp_whisper/rec.wav
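For my future self, the card and device numbers can also be pulled out of the arecord -l listing instead of being read by eye. This is a sketch with a made-up function name; it assumes the English "card N: ..., device M: ..." line format shown above, so the listing should be produced with LC_ALL=C.

```shell
# Read `arecord -l` output on stdin and print plughw:CARD,DEVICE for the
# first capture device whose line contains the given text.
# Exits with status 1 if no line matches.
find_capture_device() {
    awk -v pat="$1" '
        /^card [0-9]+:/ && index($0, pat) {
            card = $2
            sub(/:$/, "", card)            # "2:" -> "2"
            for (i = 1; i <= NF; i++) {
                if ($i == "device") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)     # "0:" -> "0"
                    break
                }
            }
            printf "plughw:%s,%s\n", card, dev
            found = 1
            exit
        }
        END { exit found ? 0 : 1 }
    '
}
```

Usage would be something like `LC_ALL=C arecord -l | find_capture_device 'Blue Microphones'`, whose result can then be passed to `arecord -D`.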