Now that localai uses my GPU, I wanted to try whisper locally via :GpWhisper. After installing sox, I got a not very helpful:
Gp: Whisper query exited: 2, 0
I had installed sox because checkhealth asked for it:
- OK sox is installed
- OK sox is compiled with mp3 support
Note that the mp3 check is invalid: `sox -h | grep -i mp3` did return mp3, but there seems to be a distinction between reading and writing mp3, see https://bugs.launchpad.net/ubuntu/+source/sox/+bug/223783
I am on nix and I had to install (sox.override({enableLame = true;})) for sox to be able to generate mp3.
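A more reliable check than grepping `sox -h` is probably to ask sox to encode a tiny mp3 and look at the exit status. A minimal sketch, assuming a POSIX-ish shell with `mktemp`; the function name `sox_can_write_mp3` is made up for illustration and is not something gp.nvim provides:

```shell
# Hypothetical helper (not part of gp.nvim): returns 0 only if this sox
# build can actually encode mp3, instead of trusting `sox -h`.
sox_can_write_mp3() {
  probe_dir=$(mktemp -d) || return 2
  # -n is sox's null input; synth generates a 0.1 s sine tone that sox
  # then has to encode into the mp3 output file.
  if sox -n -r 16000 -c 1 "$probe_dir/probe.mp3" synth 0.1 sine 440 >/dev/null 2>&1; then
    probe_status=0
  else
    probe_status=1
  fi
  rm -rf "$probe_dir"
  return "$probe_status"
}
```

Something like `sox_can_write_mp3 || echo "sox cannot write mp3"` would then flag a build that lacks the encoder.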
In order to debug my setup, I print-ed stuff; it would be nice if gp.nvim could log some of its operations to a file instead. I don't like plenary much but it has some facilities for that. With package managers like https://github.com/nvim-neorocks/rocks.nvim/, it should become more tractable to use dependencies in the future.
So anyway, looking at the command GpWhisper was trying to run, I found out that rec.wav did not exist or was empty. Checking the size of the recording could help diagnose a failed recording.
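The kind of check I have in mind could look like this. It is only a sketch: `check_recording` is a hypothetical name, and the return codes are my own convention, nothing gp.nvim defines:

```shell
# Hypothetical helper: tell a missing recording apart from an empty one
# before handing the file to the conversion step.
check_recording() {
  rec_file=$1
  if [ ! -e "$rec_file" ]; then
    echo "recording missing: $rec_file" >&2
    return 1
  fi
  # -s is true only when the file exists and has a size greater than zero.
  if [ ! -s "$rec_file" ]; then
    echo "recording is empty: $rec_file" >&2
    return 2
  fi
  return 0
}
```

Calling `check_recording rec.wav` right after the recording step would have pointed at the microphone instead of at whisper.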
Then I had to split the command to find the issue. Turns out that the conversion to mp3 failed because of what I mentioned earlier: my version of sox listed mp3 in sox -h but it was not able to generate mp3 until I enabled the "lame" library.
So now it works (yeah \o/) but initially I wanted to try it locally, so I changed the hardcoded endpoint to point at my local localai endpoint:
.. " --max-time 20 http://localhost:11111/v1/audio/transcriptions -s "
and it works so fast it's scary (with an RTX 3060, so not that fancy)
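Rather than editing the hardcoded URL, the endpoint could come from the environment. A rough sketch of the equivalent curl call: `WHISPER_ENDPOINT`, `WHISPER_MODEL` and `transcribe_local` are names I made up, the form fields follow the OpenAI-style transcription API, and the model name depends on how the local server is configured:

```shell
# Hypothetical wrapper: send an audio file to an OpenAI-compatible
# transcription endpoint, defaulting to the local one used above.
transcribe_local() {
  tr_audio=$1
  tr_lang=${2:-en}
  tr_endpoint=${WHISPER_ENDPOINT:-http://localhost:11111/v1/audio/transcriptions}
  # Assumption: the local server exposes a model under this name.
  tr_model=${WHISPER_MODEL:-whisper-1}
  curl --max-time 20 -s "$tr_endpoint" \
    -F "file=@$tr_audio" \
    -F "model=$tr_model" \
    -F "language=$tr_lang"
}
```

With that, `WHISPER_ENDPOINT=... transcribe_local final.mp3 fr` would switch servers without touching the code.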
My first attempt was in my native language != English and the result was garbage ^^
Maybe the default `whisper_language = "en"` could be chosen via the locale instead? But I nitpick.
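To make the locale idea concrete, here is a small sketch that derives a two-letter language code from the usual locale variables. `whisper_lang_from_locale` is a hypothetical name, and I am assuming the language part of the locale is a code whisper accepts, which may not hold for every locale:

```shell
# Hypothetical helper: map a locale such as fr_FR.UTF-8 to "fr",
# falling back to "en" when the locale says nothing useful.
whisper_lang_from_locale() {
  # An explicit argument wins; otherwise use LC_ALL, LC_MESSAGES, LANG.
  wl_locale=${1:-${LC_ALL:-${LC_MESSAGES:-${LANG:-}}}}
  # Strip everything from the first "_", "." or "@" onwards.
  wl_lang=${wl_locale%%[_.@]*}
  case $wl_lang in
    ''|C|POSIX) echo en ;;
    *) echo "$wl_lang" ;;
  esac
}
```

For example, `whisper_lang_from_locale fr_FR.UTF-8` prints `fr`, while `C.UTF-8` falls back to `en`.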
Took me a few (2?) hours to get there so I'll pause for now :)
My USB mic needed some custom config that I am listing more for my future self than for the maintainers (sry ^^'):
$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: PCH [HDA Intel PCH], device 0: ALC892 Analog [ALC892 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: PCH [HDA Intel PCH], device 2: ALC892 Alt Analog [ALC892 Alt Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 2: Microphones [Blue Microphones], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
The help/doc of arecord is not great so from there it was not clear how to specify the device. I found the answer here https://unix.stackexchange.com/questions/360192/alsa-error-channel-count-2-not-available-for-playback-invalid-argument :
`plughw` accepts more options than `hw`, it seems, and in the end