Closed davidvonthenen closed 5 months ago
Hi @dvonthenen ,
I've noticed that the recent changes involving the flush feature for speech-to-text are not reflected in the SDK. I'm currently using deepgram-sdk 3.2.7 and have not seen the expected functionality (Finalize). Could you please provide some guidance on this or a timeline for when these changes might be integrated into the SDK?
Thank you!
hi @saleshwaram
It's in the queue.
You don't need to wait for this to be implemented in the SDK. You can use this right now by sending the following message with the send() function:
{ "type": "Finalize" }
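As a rough sketch of that workaround: the control message is plain JSON, so it can be serialized with json.dumps and handed to the same send() that carries audio. FakeConnection below is a hypothetical stand-in for the live connection object, used only so the snippet runs without a network or API key; the only detail taken from this thread is the { "type": "Finalize" } payload.

```python
import json


class FakeConnection:
    """Hypothetical stand-in for the SDK's live connection; it only records what is sent."""

    def __init__(self):
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)
        return True


def send_finalize(connection):
    """Serialize the Finalize control message and pass it to connection.send()."""
    message = json.dumps({"type": "Finalize"})
    return connection.send(message)


if __name__ == "__main__":
    conn = FakeConnection()
    send_finalize(conn)
    print(conn.sent[0])
```

With a real connection, the same send_finalize(connection) call would be made on the object returned by the SDK, assuming its send() accepts a string as well as audio bytes.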
Hi @dvonthenen,
There seems to be some confusion about how the "Finalize" message works with send(): my implementation does not receive the expected final transcription when I use it. Specifically, I am trying to handle an edge case where speech_final never comes back as true after I finish speaking. To handle this, I send a "Finalize" payload whenever no interim transcript has arrived for 2 seconds, expecting it to return a finalized transcript up to that point. Below are the relevant code snippets, the output I'm receiving, and the output I expect. Could you please clarify how the flush feature should work in this context? Are there implementation details that are missing or need adjusting in my code?
Thank you for your help!
import datetime
import threading
from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions
from dotenv import load_dotenv
import json

load_dotenv()


class DeepgramSTT:
    def __init__(self):
        self.full_transcription = ""
        self.final_transcription = ""
        self.other_text = ""
        self.transcript_ready = threading.Event()
        self.connection_status = False
        self.deepgram = DeepgramClient()
        self.connection = self.deepgram.listen.live.v("1")
        self.setup_events()
        self.timer = None

    def setup_events(self):
        self.connection.on(LiveTranscriptionEvents.Open, self.on_open)
        self.connection.on(LiveTranscriptionEvents.Close, self.on_close)
        self.connection.on(LiveTranscriptionEvents.Transcript, self.on_message)
        self.connection.on(LiveTranscriptionEvents.SpeechStarted, self.on_speech_started)
        self.connection.on(LiveTranscriptionEvents.Metadata, self.on_metadata)

    def on_open(self, *args, **kwargs):
        self.connection_status = True
        print("Connection opened")

    def on_speech_started(self, x, speech_started, **kwargs):
        print("Speech started")

    def on_metadata(self, x, metadata, **kwargs):
        print(f"\n\n{metadata}\n\n")

    def on_close(self, *args, **kwargs):
        self.connection_status = False
        print("Connection closed")

    def on_message(self, x, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        # print(f"{datetime.datetime.now()}: {result.is_final}: {result.speech_final} {sentence}")
        # Reset the timer whenever a new sentence is received
        if len(sentence) == 0:
            return
        if result.is_final and result.speech_final:
            self.final_transcription = self.full_transcription + sentence
            if self.final_transcription != "":
                self.transcript_ready.set()
                return
            else:
                print("final")
                return
        elif result.is_final and not result.speech_final:
            self.reset_timer()
            self.full_transcription += sentence + " "
            return
        else:
            self.reset_timer()
            self.other_text = sentence
            print("Interim sentence: ", sentence)

    def reset_timer(self):
        if self.timer and self.other_text != "":
            self.timer.cancel()
        self.timer = threading.Timer(2.0, self.send_finalize)
        self.timer.start()

    def send_finalize(self):
        self.connection.send(json.dumps({"type": "Finalize"}))
        print("Finalize sent due to 2 seconds of silence")

    def start_connection(self):
        options = LiveOptions(
            model="nova-2",
            language="en-US",
            punctuate=True,
            encoding="linear16",
            channels=1,
            sample_rate=16000,
            vad_events=True,
            endpointing=300,
            interim_results=True,
            utterance_end_ms="1000",
        )
        if not self.connection.start(options):
            print("Failed to start connection")
            return False
        return True

    def send_audio_data(self, data):
        self.connection.send(data)

    def finish(self):
        if self.timer:
            self.timer.cancel()
        self.connection.finish()
        print("Finished")
        self.print_final_transcript()

    def print_final_transcript(self):
        print("Complete final transcript:")
        print(self.full_transcription)

    def is_connection_active(self):
        return self.connection_status
from deepgramstt import DeepgramSTT
from datetime import datetime
import threading
import pyaudio


def main():
    # Audio stream configuration
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    SAMPLE_RATE = 16000
    FRAMES_PER_BUFFER = 3200

    # Initialize PyAudio
    p = pyaudio.PyAudio()
    try:
        stream = p.open(format=FORMAT, channels=CHANNELS, rate=SAMPLE_RATE, input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    except IOError as e:
        print(f"Could not open audio stream: {e}")
        p.terminate()
        return

    # Initialize DeepgramSTT
    dg_connection = DeepgramSTT()
    if not dg_connection.start_connection():
        print("Failed to start Deepgram connection")
        stream.stop_stream()
        stream.close()
        p.terminate()
        return
    print("Connection started. Begin speaking now.")

    # Start the audio stream thread immediately
    exit_flag = False

    def audio_stream_thread():
        try:
            while not exit_flag and dg_connection.is_connection_active():
                try:
                    data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                except IOError as e:
                    print(f"Error reading audio data: {e}")
                    break  # Exit the loop if we can't read the data
                dg_connection.send_audio_data(data)
                if dg_connection.transcript_ready.is_set():  # Non-blocking check for the event
                    print(f"final: {dg_connection.final_transcription}\ttime: {datetime.utcnow().isoformat(timespec='milliseconds') + 'Z'}")
                    dg_connection.final_transcription = ""
                    dg_connection.transcript_ready.clear()  # Reset the event
        except Exception as e:
            print(f"Unexpected error: {e}")
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()
            dg_connection.finish()

    audio_thread = threading.Thread(target=audio_stream_thread)
    audio_thread.start()
    input("Press Enter to stop recording...\n")
    exit_flag = True
    audio_thread.join()
    print("Finished recording and processing.")


if __name__ == "__main__":
    main()
$ python -m test
Connection opened
Connection started. Begin speaking now.
Press Enter to stop recording...
Speech started
Interim sentence: Early one morning,
Interim sentence: Early one morning, while the sun was
Interim sentence: Early one morning, while the sun was just
Interim sentence: Early one morning, while the sun was just starting to rise, a
Interim sentence: Early one morning, while the sun was just starting to rise, a young and energetic dog
Speech started
Interim sentence: excitedly ran around
Interim sentence: excitedly ran around the park. Juncker gave a
Interim sentence: excitedly ran around the park, jumping over small bushes, and chasing
Interim sentence: excitedly ran around the park, jumping over small bushes and chasing after brightly colored
Speech started
Interim sentence: a group of children
Interim sentence: a group of children laughed and played nearby
Interim sentence: a group of children laughed and played nearby, enjoying the
Interim sentence: a group of children laughed and played nearby, enjoying the warm weather and the free
Speech started
Interim sentence: before school started.
final: Early one morning, while the sun was just starting to rise, a young and energetic dog excitedly ran around the park, jumping over small bushes and chasing after brightly colored butterflies a group of children laughed and played nearby, enjoying the warm weather and the freedom of being outside before school started. time: 2024-05-22T11:44:15.713Z
Finalize sent due to 2 seconds of silence
Speech started
Connection closed
{
"type": "Metadata",
"transaction_key": "deprecated",
"request_id": "457e4f1b-e9a7-4e99-a704-d2f0f045d00a",
"sha256": "f91f59bcb63d46d4ea6e3a9b647d65e940d83373d9f929f71ff32940342c578e",
"created": "2024-05-22T11:43:53.803Z",
"duration": 23.6,
"channels": 1,
"models": [
"1dbdfb4d-85b2-4659-9831-16b3c76229aa"
],
"model_info": {
"1dbdfb4d-85b2-4659-9831-16b3c76229aa": {
"name": "2-general-nova",
"version": "2024-01-11.36317",
"arch": "nova-2"
}
}
}
Finished
Another output:
$ python -m test
Connection opened
Connection started. Begin speaking now.
Press Enter to stop recording...
Speech started
Interim sentence: Early one more
Interim sentence: Early one morning, while the sun was just
Interim sentence: Early one morning, while the sun was just starting to rise, a
Interim sentence: Early one morning, while the sun was just starting to rise, a young and energetic
Speech started
Interim sentence: a young and energetic dog excited
Interim sentence: a young and energetic dog excitedly ran around the path
Interim sentence: a young and energetic dog excitedly ran around the park jumping over small
Speech started
Interim sentence: and chasing after prey
Interim sentence: and chasing up brightly colored butterflies.
Interim sentence: and chasing up brightly colored butterflies as a group of children
Interim sentence: and chasing after brightly colored butterflies as a group of children laughed and played near
Speech started
Interim sentence: and played nearby, enjoying the
Interim sentence: and played nearby, enjoying the warm weather and the free
Interim sentence: and played nearby, enjoying the warm weather and the freedom of being outside
Interim sentence: and played nearby, enjoying the warm weather and the freedom of being outside before school started.
Speech started
Finalize sent due to 2 seconds of silence
Connection closed
{
"type": "Metadata",
"transaction_key": "deprecated",
"request_id": "d3380ca8-175b-470f-b514-84f4199b5baa",
"sha256": "798f63a6df80a3ae1bee4548708d5ea0190e5508e4d357debe807402cf944e31",
"created": "2024-05-22T11:43:07.732Z",
"duration": 25.4,
"channels": 1,
"models": [
"1dbdfb4d-85b2-4659-9831-16b3c76229aa"
],
"model_info": {
"1dbdfb4d-85b2-4659-9831-16b3c76229aa": {
"name": "2-general-nova",
"version": "2024-01-11.36317",
"arch": "nova-2"
}
}
}
Finished
Expected output:
$ python -m test
Connection opened
Connection started. Begin speaking now.
Press Enter to stop recording...
Speech started
Interim sentence: Early one more
Interim sentence: Early one morning, while the sun was just
Interim sentence: Early one morning, while the sun was just starting to rise, a
Interim sentence: Early one morning, while the sun was just starting to rise, a young and energetic
Speech started
Interim sentence: a young and energetic dog excited
Interim sentence: a young and energetic dog excitedly ran around the path
Interim sentence: a young and energetic dog excitedly ran around the park jumping over small
Speech started
Interim sentence: and chasing after prey
Interim sentence: and chasing up brightly colored butterflies.
Interim sentence: and chasing up brightly colored butterflies as a group of children
Interim sentence: and chasing after brightly colored butterflies as a group of children laughed and played near
Speech started
Interim sentence: and played nearby, enjoying the
Interim sentence: and played nearby, enjoying the warm weather and the free
Interim sentence: and played nearby, enjoying the warm weather and the freedom of being outside
Interim sentence: and played nearby, enjoying the warm weather and the freedom of being outside before school started.
Speech started
Finalize sent due to 2 seconds of silence
final: Early one morning, while the sun was just starting to rise, a young and energetic dog excitedly ran around the park, jumping over small bushes and chasing after brightly colored butterflies a group of children laughed and played nearby, enjoying the warm weather and the freedom of being outside before school started. time: 2024-05-22T11:44:15.713Z
Connection closed
{
"type": "Metadata",
"transaction_key": "deprecated",
"request_id": "d3380ca8-175b-470f-b514-84f4199b5baa",
"sha256": "798f63a6df80a3ae1bee4548708d5ea0190e5508e4d357debe807402cf944e31",
"created": "2024-05-22T11:43:07.732Z",
"duration": 25.4,
"channels": 1,
"models": [
"1dbdfb4d-85b2-4659-9831-16b3c76229aa"
],
"model_info": {
"1dbdfb4d-85b2-4659-9831-16b3c76229aa": {
"name": "2-general-nova",
"version": "2024-01-11.36317",
"arch": "nova-2"
}
}
}
Finished
If I understand the output correctly, in the first example I wouldn't expect anything to happen, since the final: line arrived just before the flush.
The second doesn't seem right, but I haven't experimented with the feature much. There are people using this in production, so it seems like there might be an issue in your code.
In the first example, the 'final' transcript is received and then 'Finalize' is sent. Since the final transcript has already been received, I am not expecting anything further.
However, in the second case, I have tried it a couple of times and it never produced a final response. If you could provide a working sample, I could test it on my side, because I don't see any issue in my code.
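No working sample was posted in this thread, so the following is only a sketch of the silence-timer pattern described earlier (send Finalize once no transcript has arrived for some interval), written without the SDK so it can be run on its own. SilenceTimer, touch, and on_silence are hypothetical names; in the code above, the callback would be something like send_finalize. The one deliberate difference from reset_timer is that the previous timer is always cancelled before a new one starts, so at most one countdown is pending at a time.

```python
import threading
import time


class SilenceTimer:
    """Calls on_silence once if touch() is not called again within `timeout` seconds."""

    def __init__(self, timeout, on_silence):
        self.timeout = timeout
        self.on_silence = on_silence
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        """Restart the countdown, cancelling any countdown that is still pending."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.on_silence)
            self._timer.daemon = True  # do not keep the process alive for a pending timer
            self._timer.start()

    def cancel(self):
        """Stop the countdown without firing the callback."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


if __name__ == "__main__":
    fired = []
    timer = SilenceTimer(0.2, lambda: fired.append("Finalize"))
    for _ in range(3):  # three "transcripts" arriving in quick succession
        timer.touch()
        time.sleep(0.01)
    time.sleep(1.0)  # then silence, long enough for the countdown to expire
    print(fired)
```

Calling touch() from every transcript callback and cancel() from finish() would mirror how reset_timer and finish are used above. Whether the result returned after Finalize carries speech_final is not established in this thread, so the handler for is_final results may also need to surface the accumulated text.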
Apparently, I partially implemented this: https://github.com/deepgram/deepgram-python-sdk/pull/396
Going to reproduce what I did in the Go SDK now: https://github.com/deepgram/deepgram-go-sdk/pull/237
This is available in the latest release: https://github.com/deepgram/deepgram-python-sdk/releases/tag/v3.3.0