The if name... block in run.py shows the arguments, but that's no substitute for a user-facing explanation or guide.
model transcription doesn't seem to keep up with my speech.
transcription output doesn't come as quickly as i'd like.
Is it sliding_window's fault?
Is it chunk_length_s's fault?
Is it sample_rate's fault?
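One way to narrow this down: time how long the model actually takes per chunk. If transcribing a chunk of chunk_length_s seconds of audio takes longer than chunk_length_s wall-clock seconds, the model itself can never keep up, no matter how the sliding window is tuned. A minimal sketch (time_chunks is a hypothetical helper, not part of transcribe_microphone):

```python
# Sketch: time each chunk's transcription to find the latency bottleneck.
# time_chunks() is a hypothetical helper, not an existing function in the app.
import time

def time_chunks(transcribe, chunks):
    """Call transcribe() on each chunk and record wall-clock seconds taken."""
    timings = []
    for chunk in chunks:
        start = time.perf_counter()
        transcribe(chunk)
        timings.append(time.perf_counter() - start)
    return timings
```

If the average timing exceeds chunk_length_s, the model is the bottleneck; if per-chunk timings are fine but output still lags, look at the sliding window and cache handling instead.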
need to add logging:
application start up
chunk processed, outputting
audio input source detected: {source}
Codeium's explanation of transcribe_microphone:
"""
This class, RealTimeASR, handles real-time Automatic Speech Recognition (ASR) using the Transformers library. Here's a summary of each class method:
__init__(): Initializes the class attributes, including the ASR pipeline, transcription cache, sliding window, and sample rate.
initialize_audio(): Opens an audio stream for capturing real-time audio data using PyAudio.
capture_and_transcribe(): Captures audio from the microphone, transcribes it, and manages the sliding window and transcription cache.
transcribe_audio(): Transcribes a chunk of audio data using the ASR pipeline.
handle_transcription(): Handles the transcription result by appending it to the transcription cache and optionally logging it to a file.
is_log_file_writable(): Checks if the specified log file is writable.
write_to_log(): Writes text to a log file, creating a new log file if the current one is larger than 1MB.
write_transcription_cache_to_log(): Writes transcriptions from the transcription cache to the log file.
close_stream(): Stops the audio stream and closes it, as well as writes any remaining transcriptions to the log file if specified.
"""
Copy of notes.txt from branch1.1.31:
"""
audio input source is unclear
use Whisper v3
add usage documentation
"""