collabora / WhisperFusion

WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.

Indentation Bug in `` #25

Open DamianB-BitFlipper opened 5 months ago

DamianB-BitFlipper commented 5 months ago

In the file, I suspect that the highlighted lines need to be at the same indentation level as the while loop. Otherwise, in its current form, the code makes no sense to me. Just shining some light on this.

makaveli10 commented 5 months ago

Not really, because we want to send only one response to the client. At some point we were sending all the responses we add to the llm_queue for all updates in the current segment from whisper-live, but then we decided to send only the one that corresponds to the transcription with eos=True.

That said, this if could be at the same level as the outer if and everything should be fine.
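
A minimal sketch of the behaviour described in this comment, not the project's actual code. It assumes the queue is a standard-library `queue.Queue` and that each item is a dict with `llm_output` and `eos` keys; the function names `drain_latest` and `response_to_send` are hypothetical.

```python
import queue


def drain_latest(llm_queue):
    """Empty the queue and return only the most recent item (None if it was empty)."""
    latest = None
    while True:
        try:
            # Each newer item overwrites the previous one.
            latest = llm_queue.get_nowait()
        except queue.Empty:
            break
    return latest


def response_to_send(llm_queue):
    """Return the newest response only if it is marked eos=True, else None."""
    latest = drain_latest(llm_queue)
    # The eos check runs once, after the loop, on the last item seen.
    if latest is not None and latest["eos"]:
        return latest
    return None
```

In this sketch the eos check sits outside the draining loop, so intermediate updates for the same segment are discarded and at most one response reaches the client.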

DamianB-BitFlipper commented 5 months ago

Thanks for your reply! I understand the logic of sending only those responses with eos. But could there not be a backlog in the llm_queue containing multiple sentences, where the first one ends with an EOS and the next one then begins and ends with its own EOS? In the current implementation, the first sentence would be lost.

makaveli10 commented 5 months ago

@DamianB-BitFlipper Not sure I understand what you mean when you say sentences. Each llm_response could be multiple sentences or a single word.

> In the current implementation, the first sentence would be lost.

Can you please give an example if you have seen this?

DamianB-BitFlipper commented 5 months ago

I wouldn't expect to see this in most cases in practice because the llm_response queue would empty rather quickly. I am just postulating, from exploring the code and poking at it, that if the transcriber sends [<first sentence, eos=True>, <second sentence here, eos=True>], then the way the code is written, the first sentence is lost.

I am aware that the transcriber does not put eos=True at the end of sentences, but rather at prolonged pauses of non-voice input. I am using "sentence" here purely as an example.
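
The backlog scenario above can be illustrated with a small sketch. It is hypothetical: the dict keys `llm_output` and `eos` and the function names `last_eos_only` and `every_eos` are assumptions for illustration, not names from the repository.

```python
def last_eos_only(backlog):
    """Mimic 'keep only the newest item', then send it if it has eos=True."""
    latest = None
    for item in backlog:
        latest = item  # earlier items are overwritten, even if they had eos=True
    return [latest] if latest is not None and latest["eos"] else []


def every_eos(backlog):
    """Alternative: keep every item that closes a segment (eos=True)."""
    return [item for item in backlog if item["eos"]]


# Two completed segments are already waiting when the consumer runs.
backlog = [
    {"llm_output": "first sentence", "eos": True},
    {"llm_output": "second sentence here", "eos": True},
]

print(last_eos_only(backlog))  # only the second item survives
print(every_eos(backlog))      # both items survive
```

Whether the first item should ever be sent depends on whether such a backlog can occur in practice, which is the open question in this thread.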

makaveli10 commented 5 months ago

> I am just postulating, from exploring the code and poking at it, that if the transcriber sends [<first sentence, eos=True>, <second sentence here, eos=True>], then the way the code is written, the first sentence is lost.

@DamianB-BitFlipper Okay, so we should never reach this state in a short-exchange conversation, i.e. we transcribe until eos=True, and at that time the llm_queue should be:

[{output1, eos=False}, ..., {outputn, eos=True}]

We only care about outputn at this moment, because that is the most recent llm_output, corresponding to the most up-to-date transcription. Not sure why you would want output1.