ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
35.38k stars, 3.61k forks

When transcribing Chinese audio, whisper_full_get_segment_text returns the correct text, but whisper_full_get_token_text may return NULL. #2114

Open ppcfan opened 6 months ago

ppcfan commented 6 months ago

I ran into an issue while transcribing Chinese audio. After transcribing a segment with whisper_full(...), whisper_full_get_segment_text returns the correct Chinese text. However, when I iterate over the tokens of that segment and call whisper_full_get_token_text on each one, some tokens return NULL. I suspect this happens because a single Chinese character can correspond to multiple tokens. If that is the case, how does whisper_full_get_segment_text map multiple tokens to a single Chinese character? Is there a way to merge tokens so that I get the correct text for each merged token? Thank you.
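One possible way to merge tokens is sketched below. It rests on two assumptions that are not confirmed in this thread: that whisper_full_get_token_text returns the raw bytes of each token, and that a token can end in the middle of a multi-byte UTF-8 character (a Chinese character usually takes three bytes), so that a single token's bytes may not form valid text on their own. The helpers `utf8_missing_bytes` and `merge_utf8_pieces` are hypothetical names and are not part of whisper.cpp. The sketch buffers token bytes until they end on a character boundary.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): returns how many bytes are
// still missing to complete the last UTF-8 character in `s`, or 0 if `s`
// ends on a character boundary (or is malformed in a way we cannot repair).
static std::size_t utf8_missing_bytes(const std::string & s) {
    std::size_t i    = s.size();
    std::size_t cont = 0;
    // Walk back over at most 3 continuation bytes (bit pattern 10xxxxxx).
    while (i > 0 && cont < 3 &&
           (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++cont;
    }
    if (i == 0) {
        return 0; // empty, or no lead byte found
    }
    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    std::size_t need = 0; // continuation bytes the lead byte announces
    if      ((lead & 0xE0) == 0xC0) need = 1;
    else if ((lead & 0xF0) == 0xE0) need = 2;
    else if ((lead & 0xF8) == 0xF0) need = 3;
    return need > cont ? need - cont : 0;
}

// Hypothetical helper: concatenates consecutive token byte strings until the
// buffer ends on a UTF-8 character boundary, then emits it as one piece.
std::vector<std::string> merge_utf8_pieces(const std::vector<std::string> & pieces) {
    std::vector<std::string> merged;
    std::string pending;
    for (const std::string & piece : pieces) {
        if (piece.empty()) {
            continue;
        }
        pending += piece;
        if (utf8_missing_bytes(pending) == 0) {
            merged.push_back(pending);
            pending.clear();
        }
    }
    if (!pending.empty()) {
        merged.push_back(pending); // trailing incomplete bytes, kept as-is
    }
    return merged;
}

// Assumed usage with whisper.cpp (requires whisper.h and a finished whisper_full run):
//   std::vector<std::string> pieces;
//   for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
//       if (whisper_full_get_token_id(ctx, i, j) >= whisper_token_eot(ctx)) continue; // skip special tokens
//       pieces.push_back(whisper_full_get_token_text(ctx, i, j));
//   }
//   std::vector<std::string> merged = merge_utf8_pieces(pieces);
```

With this approach each merged piece covers one or more whole characters, so per-token data such as timestamps would have to be combined across the tokens that were merged. If the NULL comes from a language binding that rejects invalid UTF-8, the bytes would need to be read in a form the binding does not validate before merging.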