Open · hmylk opened 6 days ago
I'm not really sure. Maybe Whisper doesn't output Chinese with spaces as separators, and the word truncation model only keeps the output up to the first space? If so, you can either modify the boundary detection function to always return True, which effectively removes the CIF model, or modify this so that only the last Chinese character is removed.
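A rough sketch of the second option, in case it helps. This is not the repository's actual code: the function name `truncate_unfinished_tail` and the `at_boundary` flag are hypothetical, and it assumes the truncation step receives the decoded text plus a boolean from the boundary detector.

```python
def truncate_unfinished_tail(text: str, at_boundary: bool) -> str:
    """Hypothetical helper: drop the possibly unfinished tail of a partial hypothesis.

    at_boundary is assumed to be the boundary detector's verdict; passing
    True every time corresponds to the "always return True" option.
    """
    if at_boundary or not text:
        # The chunk ends on a word boundary (or is empty): keep everything.
        return text
    stripped = text.rstrip()
    if " " in stripped:
        # Space-delimited output: drop the last, possibly unfinished, word.
        return stripped.rsplit(" ", 1)[0]
    # No spaces at all (typical for Chinese): drop only the final character
    # instead of discarding the whole string.
    return stripped[:-1]


print(truncate_unfinished_tail("hello wor", False))
print(truncate_unfinished_tail("今天天气很好", False))
```

With `at_boundary=True` the text is returned unchanged, so hard-coding that value is a quick way to check whether the truncation step is what is damaging the Chinese output.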
Thank you for your reply. But my question is not about spaces: the ASR result is one word repeated in a loop, which is not normal.
Would it be better with a longer chunk, e.g. 2 or 3 seconds?
Er... no. I tried 2 s and 3 s chunks, and the result is still not normal. Then I tried 10 s (the wav file is about 7 s long), and the result is only so-so.
I tested one short wav and one long wav, and the ASR result is not satisfactory for either, for example:
Can you explain it, or maybe I did something wrong?