PeoplePlusAI / sunva-ai

SUNVA AI: Seamless conversation loop for the deaf

STT groq audio buffer needs to be in m4a format #42

Closed maximaminima closed 2 months ago

maximaminima commented 2 months ago

This may be needed for streaming transcription, per the Groq docs. m4a is apparently better suited to streaming than wav.

import os

from groq import Groq

# Groq() reads the API key from the GROQ_API_KEY environment variable.
client = Groq()
filename = os.path.join(os.path.dirname(__file__), "audio.m4a")

with open(filename, "rb") as file:
    transcription = client.audio.transcriptions.create(
        file=(filename, file.read()),
        model="whisper-large-v3",
        response_format="verbose_json",
    )

maximaminima commented 2 months ago

Tracking this in test_stt_transcription_loop, with a few tests to compare m4a and wav audio streams.

maximaminima commented 2 months ago

Reading the audio buffer directly reduces processing latency by skipping file conversion.
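For illustration, here is a minimal sketch of passing an in-memory buffer to the transcription call instead of writing a file first. It is not the code from this repo. It assumes the buffer holds 16-bit mono PCM at 16 kHz and wraps it in a WAV container, since Python's standard library has no m4a encoder. The helper names `pcm_to_wav_bytes` and `transcribe_buffer` are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def transcribe_buffer(client, pcm: bytes) -> str:
    """Send an in-memory buffer to the transcription endpoint; no disk I/O.

    `client` is assumed to be a groq.Groq instance (or anything exposing the
    same audio.transcriptions.create call used earlier in this issue).
    """
    transcription = client.audio.transcriptions.create(
        file=("audio.wav", pcm_to_wav_bytes(pcm)),  # (name, bytes) tuple
        model="whisper-large-v3",
        response_format="verbose_json",
    )
    return transcription.text
```

The file name in the tuple only tells the API which container to expect; nothing is written to disk.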

Next step: think about how to remove silences.

gksoriginals commented 2 months ago

@bsbarkur I added is_silent logic to this branch. I tested it, and adding is_silent does not affect latency much. Please test it and revert the commit in case of any issues.
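For readers without access to the branch, a silence check of this kind could look roughly like the sketch below. This is an assumption about the general approach, not the actual is_silent from the commit. It treats the chunk as 16-bit PCM in the machine's native byte order and compares its RMS amplitude to a threshold; the default of 500 is an arbitrary illustrative value.

```python
import array
import math


def is_silent(chunk: bytes, threshold: float = 500.0) -> bool:
    """Return True when the RMS amplitude of a 16-bit PCM chunk is below threshold."""
    samples = array.array("h")  # signed 16-bit, native byte order
    # Drop a trailing odd byte so the length is a whole number of samples.
    samples.frombytes(chunk[: len(chunk) - len(chunk) % 2])
    if not samples:
        return True  # an empty chunk carries no speech
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold
```

Chunks flagged as silent can be skipped before the transcription call, which cuts down on requests. The check is a single pass over the samples, which is consistent with the small latency impact reported above.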

gksoriginals commented 2 months ago

Closing this issue