@mlhdeep-ai, hello. You can add the option word_timestamps=True to the transcribe() function:
from faster_whisper import WhisperModel

model = WhisperModel('large-v3', device='cuda')
segments, info = model.transcribe(jfk_path, language="en", word_timestamps=True)
for segment in segments:
    print("Sentence: [%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
Thanks for your reply. I have reviewed most of the documentation related to your point, but it didn't resolve the issue in my code.
When I omit the word_timestamps argument in the transcribe() function, the transcription of the Persian audio file into Persian text works correctly. However, when I include word_timestamps=True, segments comes back empty, and as a result the transcription is not completed.
Here is my code:
model = WhisperModel(model_path, device, compute_type="float16", local_files_only=True)
segments, _ = model.transcribe(str(audio_path), language=language,
                               word_timestamps=True)
segments = list(segments)  # The transcription will actually run here.
print("\nsegments:")
print(segments)
print("\n")
for segment in segments:
    text_start = segment.start
    text_end = segment.end
    transcription = segment.text
@mlhdeep-ai, you need to remove this line:
segments = list(segments)  # The transcription will actually run here.
transcribe() returns segments as a generator object. A generator can only be iterated once because it yields items one at a time and doesn't store them in memory. When you convert a generator to a list, you exhaust it, because the conversion itself iterates over all of the items.
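For illustration, a minimal plain-Python sketch of that behavior (gen() below is just a stand-in generator, not part of faster-whisper):

def gen():
    yield 1
    yield 2

g = gen()
print(list(g))  # [1, 2] -- list() iterates the generator to build the list
print(list(g))  # []     -- the generator is now exhausted
for item in g:  # this loop body never runs for the same reason
    print(item)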
Hi everyone...
In our ASR project for converting Persian audio to Persian text, we need to divide the audio into fixed-length chunks (e.g., 10 seconds). The problem is that the split sometimes falls in the middle of the last word, causing errors such as the last word being dropped or repeated.
To address this, we decided to capture the timestamp of each word in a chunk, cut the last word from the current chunk, and prepend it to the next chunk. However, in the segments returned by the transcribe function from Faster Whisper, the words field is None, so we cannot access the timestamps of the words in each chunk.
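For reference, here is a minimal sketch of that approach, assuming word_timestamps=True is passed so that segment.words is populated (per the reply above). CHUNK_SECONDS, words_in_chunk, and next_chunk_start are hypothetical names for this sketch, not faster-whisper API:

from faster_whisper import WhisperModel

CHUNK_SECONDS = 10.0  # hypothetical fixed chunk length

model = WhisperModel("large-v3", device="cuda")

def words_in_chunk(chunk_path, language="fa"):
    # word_timestamps=True is what fills segment.words; without it the field is None
    segments, _ = model.transcribe(chunk_path, language=language,
                                   word_timestamps=True)
    words = []
    for segment in segments:  # iterating the generator runs the transcription
        words.extend(segment.words)
    return words

def next_chunk_start(words, chunk_start):
    # If the last word touches the chunk boundary, it may have been cut mid-word:
    # drop it from this chunk and begin the next chunk at that word's onset.
    if words and words[-1].end >= CHUNK_SECONDS - 0.05:
        return chunk_start + words[-1].start
    return chunk_start + CHUNK_SECONDS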
We have two questions:
Thank you for your attention.