LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.34k stars 310 forks

Cannot initialize microphone (non-localhost URL) #920

Open CorentinWicht opened 2 weeks ago

CorentinWicht commented 2 weeks ago

Dear @LostRuins,

I am trying to link my koboldcpp server running on port 8008 with the Alltalk server running on 7851 (default), on the same virtual machine.

I have opened the corresponding firewall ports and can reach each service individually via the public ip address/alias: image image

I can successfully link them both in koboldcpp settings: image image

I have two issues here:

  1. Though I do not get an error, there is no sound coming out when I run the image or when the AI responds to a query.
  2. When I set image, I get the following error: image

For context, we are running an instance of koboldcpp on a Linux RHEL9 virtual machine that is accessible to our University community inside our network only.

Best,

C.

LostRuins commented 2 weeks ago
  1. For the first item, can you check if there's audio being output if TTS is set to the browser built-in TTS instead? I'm not sure about AllTalk, but XTTS api server had a streaming mode based on launch flags, where the audio is output from the server side when streamed instead, so if that exists it should be turned off. You can take a look at the browser network tab to see if the audio is actually generated and received correctly. Perhaps @erew123 can take a look too.

  2. For Whisper speech control - This is a common issue but it has nothing to do with Koboldcpp, but rather, this is a modern browser security setting that prevents the use of microphone for any webpage content not served over https. Koboldcpp does allow custom or self-signed SSL certs which should solve this issue - alternatively you can use a service like Cloudflare to perform SSL offloading via a proxy. If you can only use http, then the browser itself needs to have a flag to allow it: See https://stackoverflow.com/questions/52759992/how-to-access-camera-and-microphone-in-chrome-without-https in order for a microphone to work.

The browser security restriction does not affect localhost which is why it works there even with http.
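
To make the rule above concrete, here is a minimal sketch of how a browser roughly decides whether a page may use the microphone: pages served over https qualify, and so do loopback hosts even over plain http. The function name `microphone_allowed` is hypothetical, and the check is a simplified approximation of the browser's "secure context" logic, not its actual implementation (it ignores browser flags, iframes and other special cases).

```python
import ipaddress
from urllib.parse import urlsplit


def microphone_allowed(page_url: str) -> bool:
    """Approximate check: is this page URL a secure context for mic access?

    Simplified assumption: https is allowed, and so are loopback hosts
    (localhost, 127.0.0.0/8, ::1) even when served over plain http.
    """
    parts = urlsplit(page_url)
    host = parts.hostname or ""

    # Any page served over https counts as secure.
    if parts.scheme == "https":
        return True

    # The localhost name (and its subdomains) is treated as loopback.
    if host == "localhost" or host.endswith(".localhost"):
        return True

    # Literal IP addresses: only loopback addresses qualify.
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so an ordinary hostname over http: not allowed.
        return False


print(microphone_allowed("http://localhost:8008/"))      # True
print(microphone_allowed("http://203.0.113.7:8008/"))    # False
```

Under these assumptions, opening the UI through a public IP address or alias over http fails the check, while the same server opened via localhost passes, which matches the behaviour described in this thread.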

CorentinWicht commented 2 weeks ago
  1. For the first item, can you check if there's audio being output if TTS is set to the browser built-in TTS instead? I'm not sure about AllTalk, but XTTS api server had a streaming mode based on launch flags, where the audio is output from the server side when streamed instead, so if that exists it should be turned off. You can take a look at the browser network tab to see if the audio is actually generated and received correctly. Perhaps @erew123 can take a look too.

The browser security restriction does not affect localhost which is why it works there even with http.

Thanks for your support!

I confirm that if I set TTS to my browser built-in TTS system (Brave Browser), sound comes out properly: image

I unfortunately don't see a setting in Alltalk where I could de-/activate the streaming mode: image

Overall, is there a wiki somewhere that guides users through connecting koboldcpp and alltalk? In fact, now I am not even able to link them both... image

Even though the URL is correct and the API is ready to accept requests: image

  1. For Whisper speech control - This is a common issue but it has nothing to do with Koboldcpp, but rather, this is a modern browser security setting that prevents the use of microphone for any webpage content not served over https. Koboldcpp does allow custom or self-signed SSL certs which should solve this issue - alternatively you can use a service like Cloudflare to perform SSL offloading via a proxy. If you can only use http, then the browser itself needs to have a flag to allow it: See https://stackoverflow.com/questions/52759992/how-to-access-camera-and-microphone-in-chrome-without-https in order for a microphone to work.

Regarding Whisper, I will investigate it further but we are in fact serving our webpage content over https using a load-balancer (Kemp).

LostRuins commented 2 weeks ago

It will probably be super slow, but you can try the colab launch script for alltalk which did work for me previously, see https://github.com/erew123/alltalk_tts/issues/235

Try the colab setup and copy the cloudflare tunnel URL as the alltalk endpoint - see if that works when you generate a short test voice. Unfortunately, I don't have any further documentation that can help, you'd have to check with @erew123 regarding alltalk.
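
When generating a short test voice, one way to confirm that audio was actually produced and received is to save the response body from the browser network tab and inspect its first bytes. The sketch below assumes the TTS server returns WAV data (an assumption, other formats would need a different check); `looks_like_wav` and `make_silent_wav` are hypothetical helper names.

```python
import io
import wave


def looks_like_wav(body: bytes) -> bool:
    """Return True if the bytes begin with a RIFF/WAVE container header."""
    return len(body) >= 12 and body[0:4] == b"RIFF" and body[8:12] == b"WAVE"


def make_silent_wav(n_frames: int = 8) -> bytes:
    """Build a tiny mono 16-bit WAV in memory, used here only as sample input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(22050)    # arbitrary sample rate for the demo
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


# A real WAV body passes; a JSON error message or an empty body does not.
print(looks_like_wav(make_silent_wav()))
print(looks_like_wav(b'{"status": "error"}'))
```

If the saved body fails this check, the response was probably an error page or JSON message rather than audio, which points at the TTS server or the connection rather than at playback in the browser.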

erew123 commented 2 weeks ago

Hi @CorentinWicht & @LostRuins

I can confirm there is a complication on AllTalk v1 with getting the API to work over tunnelling, which is why I'm on with V2 (beta), which resolves this issue (or should do).

I have been working on getting Google Colab working, which is 90% working now with one or two caveats e.g. RVC voices don't work yet on Colab. Colab was my first push before working on Docker or other systems as I knew it would be the hardest of the technical challenges, meaning, if I can get it working there, it should work on any other systems.

Saying all that though, some changes are required to make the BETA work with Kobold, as there are some changes to the API response detailed here and, generally speaking, the API suite has been expanded. I was hoping to have Colab ready by now, but there have also been a few fires to put out with the beta, so that's dug into my time.

I'll reply to you @LostRuins on the other ticket, as it would be handy for us to keep things in one place I can track.

Thanks

erew123 commented 2 weeks ago

Oh, and the Streaming endpoint and Standard generation endpoint are two different endpoints. So it's not something you set within AllTalk; you have to make a call to a different endpoint. And of course, because some of the TTS engines on AllTalk v2 will not support streaming, streaming may or may not be available, depending on the TTS engine you load in. (This information is reported back to the client system via the "currentsettings" api/endpoint on V2.)
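
As a rough illustration of the point above, a client could read the settings reported by the server and only call the streaming endpoint when the loaded engine supports it. Everything named here is an assumption for the sake of the sketch: the two paths, the `streaming_capable` key and the function `pick_tts_path` are hypothetical and should be checked against the actual AllTalk API documentation.

```python
from typing import Mapping

# Hypothetical endpoint paths, for illustration only.
STANDARD_PATH = "/api/tts-generate"
STREAMING_PATH = "/api/tts-generate-streaming"


def pick_tts_path(settings: Mapping[str, object], want_streaming: bool) -> str:
    """Choose which endpoint to call based on the reported engine capabilities.

    `settings` stands in for the parsed response of the "currentsettings"
    endpoint; the "streaming_capable" key is an assumed name.
    """
    can_stream = bool(settings.get("streaming_capable", False))
    if want_streaming and can_stream:
        return STREAMING_PATH
    # Fall back to standard generation when streaming is unsupported or unwanted.
    return STANDARD_PATH


# An engine without streaming support always gets the standard endpoint.
print(pick_tts_path({"streaming_capable": False}, want_streaming=True))
print(pick_tts_path({"streaming_capable": True}, want_streaming=True))
```

Falling back to the standard endpoint keeps the client working when an engine without streaming support is loaded.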

LostRuins commented 2 weeks ago

Did you manage to get this sorted?

CorentinWicht commented 1 week ago

Did you manage to get this sorted?

Thanks for your support @LostRuins and @erew123.

I have uninstalled Alltalk V1 and moved to the V2 beta version which works nicely and can be accessed through http (but not https). I can also link both koboldcpp and alltalk v2 without errors but no sound is coming out (while it works if I use my browser TTS).

I will investigate further and keep you updated.

CorentinWicht commented 1 week ago

Just a tiny improvement: sound now comes out properly (only in http though). My mistake was that, when I installed the Alltalk V2 beta version, I downloaded the "Piper" model, and it seems that koboldcpp instead requires the XTTS V2 model.