collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License
2.06k stars 282 forks

Client() stuck on listening for microphone. #105

Open JohnMama12 opened 10 months ago

JohnMama12 commented 10 months ago

I am using Eleven Labs, Text Gen UI and its API, and WhisperLive for transcription, but it seems to be constantly listening to my microphone and never sends the messages:

while True:
    user_message = client() # get the transcription result; it stays stuck here.

    history.append({"role": "user", "content": user_message})
    data = {
        "mode": "instruct",
        "stream": False,
        "messages": history
    }

    response = requests.post(url, headers=headers, json=data, verify=False)
    assistant_message = response.json()['choices'][0]['message']['content']
    history.append({"role": "assistant", "content": assistant_message})
    print(assistant_message)

Is there a way I can make it stop listening if there is silence for a certain amount of time? Or something like that? Thanks
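One generic way to bound a blocking call like `client()` is to run it in a worker thread and give up after a timeout. This is a minimal sketch, not part of the WhisperLive API: `transcribe` is a stand-in for whatever blocking call returns the transcription, and the timeout value is arbitrary.

```python
# Hedged sketch: wrap a blocking transcription call with a timeout so the
# chat loop can continue after a period of silence. `transcribe` is a
# hypothetical stand-in for the WhisperLive client call above.
import queue
import threading

def get_transcription_with_timeout(transcribe, timeout_s=2.0):
    """Run the blocking `transcribe` callable in a worker thread and
    return its result, or None if nothing arrives within `timeout_s`."""
    result_q = queue.Queue()

    def worker():
        result_q.put(transcribe())

    threading.Thread(target=worker, daemon=True).start()
    try:
        return result_q.get(timeout=timeout_s)
    except queue.Empty:
        return None  # treat as silence: no transcription within the window
```

In the loop above, `user_message = get_transcription_with_timeout(client, 2.0)` would then return `None` instead of blocking forever, and the loop could skip the LLM request for that iteration.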

makaveli10 commented 10 months ago

@JohnMama12 did you test the whisper-live client & server alone, without Eleven Labs and Text Gen UI? If not, please do; if yes, please let us know what issues you saw.

JohnMama12 commented 10 months ago

Yes, I've tried it separately and it works fine; I just need it to eventually stop listening after 2 seconds or so.

JohnMama12 commented 9 months ago

@makaveli10 ?

makaveli10 commented 9 months ago

@JohnMama12 we want to listen always if we want to transcribe everything, no? Not sure what your chatbot setup looks like, but we have used WhisperLive in a chatbot assistant setup here => https://github.com/collabora/WhisperFusion and it works fine for us. If you can detail the issue, maybe we can understand the problem with always listening.

JohnMama12 commented 9 months ago

> @JohnMama12 we want to listen always if we want to transcribe everything, no? Not sure what your chatbot setup looks like, but we have used WhisperLive in a chatbot assistant setup here => https://github.com/collabora/WhisperFusion and it works fine for us. If you can detail the issue, maybe we can understand the problem with always listening.

Thanks, but do you know if it would be possible not to use the default chatbot? I would like to implement the chatbot externally, as I currently do using a server.

JohnMama12 commented 9 months ago

@makaveli10 WhisperFusion looks cool, but I don't think it will work for me. I don't have a dedicated GPU for this and I'm running Windows, and I don't need speech synthesis. If my issue wasn't clear: yes, it transcribes nearly in real time, but I need it to eventually stop listening after a couple of seconds. As it is, I can't have a conversation with the chatbot because it never stops listening. Do you understand my issue now?

zoq commented 9 months ago

@JohnMama12 you could just stop sending data if you get the eos notification:

https://github.com/collabora/WhisperLive/blob/8c36768f7f043e559c32b587ee35df2625c0fc7e/whisper_live/server.py#L184

I would also suggest running the different services in threads. We can provide a clearer example based on the eos message I mentioned, if you need it.

We use end-of-sentence (eos) detection for WhisperFusion as well, to trigger the LLM part.
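The suggestion above can be sketched as a small gate on the client side: keep sending microphone chunks until the server's message indicates end-of-speech. This is a hedged sketch; the assumption that the server message is JSON containing an `"eos"` field comes from the linked `server.py` line, and the class and method names here are invented for illustration.

```python
# Hedged sketch: stop streaming audio once the server signals end-of-speech.
# The "eos" field name in the server message is an assumption based on the
# server.py link above; EosGate and its methods are hypothetical names.
import json
import threading

class EosGate:
    """Tracks the server's end-of-speech signal so the capture loop can stop."""
    def __init__(self):
        self._eos = threading.Event()

    def on_server_message(self, raw):
        """Call this from the websocket receive callback with the raw payload."""
        msg = json.loads(raw)
        if msg.get("eos"):          # assumed message field
            self._eos.set()

    def keep_streaming(self):
        return not self._eos.is_set()
```

A capture loop would then look like `while gate.keep_streaming(): send(next_chunk())`, exiting as soon as the eos notification arrives.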

JohnMama12 commented 9 months ago

> @JohnMama12 you could just stop sending data if you get the eos notification:
>
> https://github.com/collabora/WhisperLive/blob/8c36768f7f043e559c32b587ee35df2625c0fc7e/whisper_live/server.py#L184
>
> I would also suggest running the different services in threads. We can provide a clearer example based on the eos message I mentioned, if you need it.
>
> We use end-of-sentence (eos) detection for WhisperFusion as well, to trigger the LLM part.

Thanks, a step forward, but it still doesn't do what I want. I added a small delay after client() and then put self.clients[websocket].set_eos(True). It seems to listen once and stops listening right after the last thing I say, but it still doesn't move on to the next piece of code and send the message...

makaveli10 commented 9 months ago

@JohnMama12 If you follow WhisperFusion you should be able to set up WhisperLive with an LLM. We use an EOS signal based on VAD (Voice Activity Detection) for turn-taking in a conversation; when eos=true, that's when we trigger the LLM to process and send the output to the client. If you can share your client and server, maybe we can assist you better.
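The turn-taking pattern described above can be sketched as a loop that accumulates transcription segments until an EOS event, then hands the completed utterance to the LLM. This is a hedged illustration of the idea, not WhisperFusion's actual code: the event tuples and `ask_llm` callable are hypothetical stand-ins (in the original snippet, `ask_llm` would be the `requests.post` call to the Text Gen UI API).

```python
# Hedged sketch of EOS-driven turn-taking: collect segments until an eos
# event, then trigger the LLM once per completed turn. The event shapes
# and `ask_llm` are illustrative assumptions, not a real WhisperLive API.
def run_turns(events, ask_llm):
    """`events` yields ("segment", text) or ("eos", None) tuples.
    Returns the list of assistant replies, one per completed turn."""
    replies, parts = [], []
    for kind, payload in events:
        if kind == "segment":
            parts.append(payload)        # utterance still in progress
        elif kind == "eos" and parts:
            replies.append(ask_llm(" ".join(parts)))  # turn complete
            parts = []                   # start collecting the next turn
    return replies
```

This keeps the microphone conceptually "always listening" while still giving the chatbot well-defined turns, which is the behavior the thread is asking for.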