Please help! Thanks.
See: https://github.com/EtienneAb3d/WhisperTimeSync and https://github.com/jianfch/stable-ts?tab=readme-ov-file#alignment
First of all, thank you very much. After trying these, I found that I cannot get accurate timestamps when recognizing short sentences. Is there a more accurate method?
The accuracy problem depends mainly on Whisper itself. You may try different versions of Whisper and different model sizes; each may give you different results. In my own experiments, tuning parameters never really improved the precision.
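Since the best model version and size are found by trial, it can help to score each candidate against a few hand-labeled sentences instead of judging by ear. The helper below is a hypothetical sketch, not part of Whisper or faster-whisper; it assumes you have already paired each predicted sentence span with its reference span, in seconds.

```python
from statistics import mean
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def boundary_error(predicted: List[Span], reference: List[Span]) -> Tuple[float, float]:
    """Mean absolute start error and end error (seconds) over paired spans.

    Hypothetical helper: predicted[i] must correspond to reference[i].
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("need the same, non-zero number of predicted and reference spans")
    start_err = mean(abs(p[0] - r[0]) for p, r in zip(predicted, reference))
    end_err = mean(abs(p[1] - r[1]) for p, r in zip(predicted, reference))
    return start_err, end_err
```

Running each model on the same clips and comparing the two numbers gives a rough but repeatable ranking.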
You may also improve precision by pre-processing the audio in several ways, such as noise filtering or voice compression. See: https://github.com/EtienneAb3d/WhisperHallu
@RichardQin1, hello. You could enable the option word_timestamps=True to receive timestamps for each word of the transcription. Of course, the accuracy still depends on the Whisper model you are using.
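from faster_whisper import WhisperModel

model = WhisperModel(model_path)  # model_path: a model size name (e.g. "small") or a local model directory
# word_timestamps=True attaches per-word start/end times to each segment
segments, info = model.transcribe(audio_path, word_timestamps=True)
for segment in segments:
    print("Sentence: [%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))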
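If you need sentence-level timestamps rather than word-level ones, the word timestamps can be merged into sentences. A minimal sketch, assuming a sentence ends at a word ending in ".", "?" or "!" (or the full-width Chinese forms); `Sentence` and `group_into_sentences` are hypothetical helpers, not part of faster-whisper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Word = Tuple[float, float, str]  # (start, end, text) for one recognized word

# Assumed sentence-final punctuation (ASCII and full-width Chinese)
SENTENCE_END = (".", "?", "!", "。", "？", "！")


@dataclass
class Sentence:
    start: float
    end: float
    text: str


def _flush(chunk: List[Word]) -> Sentence:
    # Sentence spans from the first word's start to the last word's end
    text = "".join(w for _, _, w in chunk).strip()
    return Sentence(start=chunk[0][0], end=chunk[-1][1], text=text)


def group_into_sentences(words: Iterable[Word]) -> List[Sentence]:
    """Split a word stream after each word ending with sentence-final punctuation."""
    sentences: List[Sentence] = []
    current: List[Word] = []
    for start, end, word in words:
        current.append((start, end, word))
        if word.strip().endswith(SENTENCE_END):
            sentences.append(_flush(current))
            current = []
    if current:  # trailing words without final punctuation
        sentences.append(_flush(current))
    return sentences
```

To feed it from the loop above, first collect `(word.start, word.end, word.word)` for every word of every segment into a list; `segments` is a generator, so it can only be iterated once. faster-whisper word texts usually carry a leading space, which is why they are joined with an empty string.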
It is known that the text corresponds to a segment of the audio. For example, given the input (text, test.mp3), how can I obtain the start and end timestamps of each sentence of the text as output?
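One possible approach, sketched here under assumptions: transcribe with word_timestamps=True, align the known text to the recognized words, and read each sentence's boundaries off the words that matched. The alignment uses Python's `difflib.SequenceMatcher` as a swapped-in technique; it is not how WhisperTimeSync or stable-ts align text, and all function names are hypothetical. It assumes space-separated words (e.g. English) and a naive punctuation-based sentence splitter; Chinese text would need character-level tokens instead.

```python
import difflib
import re
from typing import List, Optional, Tuple

Word = Tuple[float, float, str]  # (start, end, text) of one recognized word


def _norm(token: str) -> str:
    # Lowercase and strip punctuation so "World." matches "world"
    return re.sub(r"[^\w']", "", token.lower())


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "?" or "!" followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def align_sentences(text: str, words: List[Word]) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Return (sentence, start, end) per known sentence; None where nothing matched."""
    sentences = split_sentences(text)
    ref_tokens: List[str] = []
    ref_sentence: List[int] = []  # sentence index of each reference token
    for idx, sent in enumerate(sentences):
        for tok in sent.split():
            t = _norm(tok)
            if t:
                ref_tokens.append(t)
                ref_sentence.append(idx)
    hyp_tokens = [_norm(w) for _, _, w in words]

    starts: List[Optional[float]] = [None] * len(sentences)
    ends: List[Optional[float]] = [None] * len(sentences)
    matcher = difflib.SequenceMatcher(None, ref_tokens, hyp_tokens, autojunk=False)
    # Each matching block pairs a run of reference tokens with recognized words
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            idx = ref_sentence[block.a + k]
            start, end, _ = words[block.b + k]
            starts[idx] = start if starts[idx] is None else min(starts[idx], start)
            ends[idx] = end if ends[idx] is None else max(ends[idx], end)
    return [(s, starts[i], ends[i]) for i, s in enumerate(sentences)]
```

Recognition errors at a sentence edge (e.g. "world" heard as "word") pull that boundary inward to the nearest matched word, and the result can never be more precise than Whisper's own word timestamps.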