QuantiusBenignus / BlahST

Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline.
BSD 3-Clause "New" or "Revised" License

Issue with BlahST Installation and whisper.llama #2

Closed kjoetom closed 1 month ago

kjoetom commented 1 month ago

Hello,

I'm experiencing an issue with the installation of BlahST. The install-wsi script does not currently offer the option to download and set up a Whisperfile.

Additionally, I'm also unable to manually configure BlahST to use the whisper-tiny.llamafile on my MX-Linux system (Debian with XFCE).

What am I missing? Is there a solution or a guide on how to properly configure the Whisperfile?

Thank you for your help.

QuantiusBenignus commented 1 month ago

Hi @kjoetom ,

I was considering setting up a whisperfile in the installation script, but haven't done it yet because it would require an interactive choice of language and model. It is only worth doing if the downloadable (whisper)files and their location remain stable and do not turn into a moving target as the llamafile repo evolves.

In principle, setting up a whisperfile should be simple, as per the instructions on their website. You have two options: either download the standalone executable from https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/whisperfile-0.8.13 and use it with your model of choice, or download a whisperfile with a whisper model already included. (The script expects the second option and would need to be modified to work with a whisperfile without a built-in model.) After you have downloaded the whisperfile and placed it in your PATH, simply make it executable with

chmod +x whisper-XXXXXX.llamafile

Then, do not forget to specify the whisperfile in the config section of the BlahST script. Also, to use the whisperfile, the script must be invoked with the -w flag, so adjust your XFCE hotkey shortcuts accordingly. Let me know if you have further issues. I am testing and will be adding some nice new functionality to BlahST very soon, but it will require the setup of external tools like this whisperfile, so I hope you get it to work.

P.S. You can find some more whisperfiles with embedded models here: https://huggingface.co/Mozilla/whisperfile/tree/main

One more thing: make sure your whisperfile name ends in ".llamafile", because that is what the script is looking for. For example, in the config section of BlahST, it should say something like this:

 WHISPERFILE=whisper-tiny.en.llamafile
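To summarize the manual setup, here is a minimal shell sketch (the file name whisper-tiny.en.llamafile and the $HOME/.local/bin location are the ones assumed in this thread; the download itself is left as a comment):

```shell
# Sketch of the manual whisperfile setup described above.
# 1. Download a whisperfile with an embedded model, e.g. whisper-tiny.en.llamafile
#    from https://huggingface.co/Mozilla/whisperfile/tree/main
BIN="$HOME/.local/bin"
mkdir -p "$BIN"
WHISPERFILE=whisper-tiny.en.llamafile   # name must end in ".llamafile"

# 2. Make it executable once it is in place:
if [ -f "$BIN/$WHISPERFILE" ]; then
    chmod +x "$BIN/$WHISPERFILE"
fi

# 3. Make sure $BIN is on PATH (persist by adding the export line to ~/.bashrc):
case ":$PATH:" in
    *":$BIN:"*) ;;                      # already present
    *) export PATH="$BIN:$PATH" ;;
esac
```

With that in place, the XFCE keyboard shortcut just needs to run the script with the -w flag (e.g. wsiml -w).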
kjoetom commented 1 month ago

Hi @QuantiusBenignus,

Thank you for your quick response.

I wanted to use a Whisper file that already includes a Whisper model. Specifically, I wanted to use the Whisper file "whisper-tiny.llamafile" that I downloaded from https://huggingface.co/Mozilla/whisperfile/blob/main/whisper-tiny.llamafile.

Unfortunately, I was unable to get it working, despite following all your instructions.

I ran "install-wsi" and received the following prompt: "You can use a local whisper.cpp installation, or connect to a whisper.cpp server. Do you want to use a local whisper.cpp preinstalled on this machine? [Y/n]:"

I'm not sure what to select here, since the llamafile starts a local server.

One problem was that under MX-Linux (Debian), adding $HOME/.local/bin to $PATH didn't work automatically. I had to do it manually: I added the line export PATH="$HOME/.local/bin:$PATH" to the end of the ~/.bashrc file and then ran "source ~/.bashrc".

When I ran "wsiml -w", I got the following error message: "bash: /home/user/.local/bin/wsiml: /usr/bin/zsh: bad interpreter: No such file or directory". Since I use Bash, I changed the shebang line in the wsiml file from #!/usr/bin/zsh to #!/usr/bin/bash.
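For anyone hitting the same "bad interpreter" error, a non-interactive sketch of that shebang change (the wsiml path is the one from this thread):

```shell
# Rewrite the shebang from zsh to bash, only if the script is actually there:
WSIML="$HOME/.local/bin/wsiml"
if [ -f "$WSIML" ]; then
    sed -i '1s|^#!/usr/bin/zsh|#!/usr/bin/bash|' "$WSIML"
fi
```

Installing zsh would of course be the other way to satisfy the original shebang.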

When I ran "wsiml -w" again, I got the following error message: "Please, install whisper.cpp (see https://github.com/ggerganov/whisper.cpp) and create 'transcribe' in your PATH as a symbolic link to the main executable, e.g. 'ln -s /full/path/to/whisper.cpp/main $HOME/.local/bin/transcribe'". I removed the "; exit 1" at this point to prevent the script from stopping.

Now, after typing "wsiml -w", a small red microphone icon appeared. After a while, the terminal displayed "Using whisperfile:", but nothing happened.

I also tried starting the llamafile manually, which starts a local server on 127.0.0.1:8080. I modified wsiml accordingly and started it this time with "wsiml -n", but this didn't work either.

I'm sorry that I can't provide more positive feedback.

kjoe

QuantiusBenignus commented 1 month ago

Hi @kjoetom ,

Looks like you are almost there. If the script reaches "Using whisperfile:", that means sox (rec) has succeeded in creating an audio file and terminating. (Did you terminate it with the hotkey, or did it terminate itself on silence? If your sound levels and noise are OK, both should work.) The wav file should now be in /dev/shm/wfile; you can listen to it with play /dev/shm/wfile to verify the quality. Next step: take the command line (assuming an embedded model; otherwise the -m flag and a model are needed):

 whisperfile-XXXX.llamafile -nt -pc --gpu auto -f /dev/shm/wfile

and see whether the produced text is well-recognized speech.

If yes, then the same command should also work in the script (it is slightly modified there to prevent logging to stdout).

After that command in the script, the only other operations are some text formatting and placing the text into the clipboard (primary selection without the -c flag), so it should work. The script checks whether X11 or Wayland is in use and calls either xsel or wl-copy accordingly. So, go to any text field and press the middle mouse button to see if you can paste the transcribed text.

It is imperative to confirm that the whisperfile operates on wav files from the command line, so please try the above command. If it does not work, then something is wrong with the whisperfile itself. (Remember that depending on the sound quality, language, model size, etc., speech recognition remains a somewhat stochastic process with a nonzero word error rate (WER).) There are many, many flags in whisper.cpp (and the whisperfile) that can be used to address specific speech cases. Do you mind sharing which language and which exact whisper model (inside the whisperfile) you use? Oh, I see: the tiny model. It may not be very good at languages other than English, but it should still work.

After a successful run from the command line with the various flags, it should work from anywhere in the desktop with the set hotkeys (and chosen flags). Also, the server code in the script should not care whether it is a whisper.cpp server or a whisperfile server, as long as they expose the same API, which I think they do. So first verify the command line, and then address the server query with the -n flag. Let me know.
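To verify the server path independently of the script, one could query the HTTP API directly. This is a sketch assuming the whisperfile server on 127.0.0.1:8080 (as started manually earlier in this thread) exposes the same /inference endpoint as the whisper.cpp server example:

```shell
# POST the recorded wav to the transcription endpoint and print plain text:
curl -s http://127.0.0.1:8080/inference \
     -F file=@/dev/shm/wfile \
     -F response_format=text
```

If this returns the transcribed text, the -n (server) mode of the script should work with the same endpoint.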

Q.B.

kjoetom commented 1 month ago

I think the initial problem in my setup is that the command rec -q -t wav $ramf rate 16k silence 1 0.1 1% 1 2.0 5% 2>/dev/null is not working correctly. Only the first three characters are being recorded.

I'm not yet sure which parameters I need to adjust to make the recording work optimally. Of course, it would be great if the system could calibrate itself.

I tried it again with the following command from the terminal: rec -q -t wav /dev/shm/wfile rate 16k silence 1 0.1 1% 1 5.0 1% 2>/dev/null. The end was still cut off, but at least a large part of my sentence was recorded.
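For reference, here is how those silence parameters break down per the sox(1) manual (the thresholds and durations are the ones tried in this thread):

```shell
# rec ... silence <above-periods> <duration> <threshold> [<below-periods> <duration> <threshold>]
#   silence 1 0.1 1%  -> start keeping audio once 0.1 s of signal rises above the 1% level
#   1 5.0 1%          -> stop after 5.0 s of signal below the 1% level
# A longer stop duration and a lower stop threshold make the recorder more patient
# with pauses and quiet speech:
rec -q -t wav /dev/shm/wfile rate 16k silence 1 0.1 1% 1 5.0 1% 2>/dev/null
```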

However, once /dev/shm/wfile was long enough, the llamafile worked in the terminal. Unfortunately, I don't have time to test more at the moment. Nevertheless, thank you for pointing me in the right direction.

kjoe

QuantiusBenignus commented 1 month ago

Hi @kjoetom,

It seems your microphone levels are too low. Check your mixer settings to increase (amplify) the microphone sensitivity, or just bring the microphone closer when speaking. Then you will notice that the recording does not get interrupted, and you can use the interrupt hotkey you set up. That makes everything much more stable and predictable. Please feel free to close this issue if it is resolved.
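A sketch of checking and raising the capture level from the command line, assuming a PulseAudio/PipeWire setup (source names vary per system; on plain ALSA, alsamixer's capture view, reached with F4, is the alternative):

```shell
# List capture sources, then raise the default one to 100%:
pactl list short sources
pactl set-source-volume @DEFAULT_SOURCE@ 100%
```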

BTW, check the new demo video with audio on the main page/README.md to see how BlahST operates on my system and to preview some upcoming features.

Cheers, QB

kjoetom commented 1 month ago

Hi @QuantiusBenignus,

I think the script does not yet fully support the option to use a llamafile (instead of whisper.cpp) without further modification. Even if there is an executable whisper.llamafile and the script is adapted accordingly and started with the -w option, it still aborts with an error message if there is no whisper.cpp main executable and no transcribe symlink pointing to it.

Thanks for your help. Regards, kjoe

QuantiusBenignus commented 1 month ago

Hi @kjoetom ,

You make a valid point. While using a native whisper.cpp (especially the server) is the most performant option, a user who has sourced a whisperfile for convenience or another reason likely does not have, or need, a whisper.cpp installation.

Fixed in master. Even if there is a whisper.cpp installation, the check is now skipped as long as a whisperfile is present. Let me know if you have any other ideas or concerns.

QB

kjoetom commented 1 month ago

Hi QB,

Thank you very much for the quick fix. I've tested the update, and it now works with my whisper-xxxxxxx.llamafile. I really appreciate the effort you put into making this work.

I just wanted to mention that I had to make a few modifications to the wsiml script (in my case) to get it working on my system. Maybe this will help others who encounter similar issues:

After downloading the whisper-xxxxxxx.llamafile, I made it executable (chmod +x whisper-xxxxxxx.llamafile) and placed it in $HOME/.local/bin/.

Additionally, I had to manually add $HOME/.local/bin to my $PATH by adding the following line to the end of my ~/.bashrc file: export PATH="$HOME/.local/bin:$PATH", and then running source ~/.bashrc in the terminal.

I also had to install xsel separately.

Thanks again for your work and for responding to my feedback so quickly!

Best regards, kjoetom

QuantiusBenignus commented 1 month ago

My pleasure,

Thanks for the feedback. I am glad that this works in XFCE too. I had not tested it, but correctly assumed it should, since XFCE allows system-wide hotkeys. Most of the steps you took are regular actions most users of this tool would need to perform (installing dependencies, adjusting environments, etc.), but that is the charm and power of Linux. If you like the new AI interaction (shown in the video on the main page), you can also consider downloading a llamafile (and piper TTS) and setting up a "vocal assistant". I am sure that will have its own challenges, but the results are quite impressive. I will now close this issue. Cheers,

QB