backspacetg / simul_whisper

Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Test results are not satisfactory #4

Open hmylk opened 6 days ago

hmylk commented 6 days ago

I tested one short WAV file and one long WAV file, and the ASR results are not satisfactory. For example: [screenshot of ASR output]

Can you explain this, or maybe I did something wrong?

backspacetg commented 2 days ago

I'm not really sure. Maybe Whisper doesn't use spaces as separators when it outputs Chinese, and the word truncation model only keeps the output up to the first space? If so, you can either modify the boundary detection function to always return True, which effectively removes the CIF model, or modify the truncation logic so that it removes only the last Chinese character.
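A minimal sketch of the two workarounds, assuming the truncation step receives the decoded text of the current chunk as a plain string. The names `always_boundary` and `truncate_last_unit` are hypothetical and are not taken from the simul_whisper code base; the real boundary detector and truncation code may have different signatures.

```python
def always_boundary(*args, **kwargs) -> bool:
    """Hypothetical stand-in for the CIF-based boundary detector.

    Always reporting a word boundary means the last word is never treated
    as truncated, which effectively disables the CIF model.
    """
    return True


def truncate_last_unit(text: str) -> str:
    """Hypothetical truncation rule: drop the possibly cut-off last unit.

    Space-separated text loses its last word; text without spaces
    (e.g. Chinese) loses only its last character.
    """
    stripped = text.rstrip()
    if " " in stripped:
        # Keep everything before the last space.
        return stripped.rsplit(" ", 1)[0]
    # No spaces: treat each character as a unit and drop the final one.
    return stripped[:-1]


print(truncate_last_unit("the quick brown fo"))
print(truncate_last_unit("今天天气很"))
```

The first call prints `the quick brown` and the second prints `今天天气`. Note that mixed Chinese/English output containing spaces would still take the word-based branch in this sketch.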

hmylk commented 2 days ago

Thank you for your reply. But my question is not about the spaces: the ASR result is the same word repeated over and over, which is not normal.
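One generic way to flag this kind of repetition loop is the compression-ratio heuristic that openai-whisper's `transcribe()` applies through its `compression_ratio_threshold` parameter (default 2.4): highly repetitive text compresses unusually well. The sketch below reimplements that check with `zlib` and adds a hypothetical `trailing_repeat` helper that finds an n-gram repeated at the end of the output. Whether simul_whisper applies any such check to its streaming output is not stated in this thread, so the helper names and how they would be hooked into decoding are assumptions.

```python
import zlib
from typing import Optional, Sequence, Tuple


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size (higher = more repetitive)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Flag text whose compression ratio exceeds the threshold."""
    return compression_ratio(text) > threshold


def trailing_repeat(
    units: Sequence[str], max_n: int = 4, min_repeats: int = 3
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: find the shortest n-gram repeated at the end.

    Returns (n, count) if the last n units occur at least `min_repeats`
    times back to back at the end of `units`, else None.
    """
    for n in range(1, max_n + 1):
        if len(units) < n * min_repeats:
            break  # larger n needs even more units
        pattern = list(units[-n:])
        count = 1
        i = len(units) - 2 * n
        # Walk backwards one n-gram at a time while the pattern keeps matching.
        while i >= 0 and list(units[i:i + n]) == pattern:
            count += 1
            i -= n
        if count >= min_repeats:
            return n, count
    return None


print(looks_repetitive("好" * 100))
print(trailing_repeat(list("今天好好好好")))
```

Working on characters rather than tokens keeps the sketch simple for Chinese output; a real hook would more likely operate on the decoder's token IDs.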

backspacetg commented 2 days ago

Does it get better with a longer chunk, e.g. 2 or 3 seconds?

hmylk commented 2 days ago

Er... no, I have tried 2 and 3 seconds, and the results are still not normal. Then I tried 10 seconds; the WAV file is only about 7 seconds long, and the result is just so-so.
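To make the chunk-length comparison concrete: Whisper models take 16 kHz audio, so a chunk of `chunk_seconds` covers `chunk_seconds * 16000` samples. The sketch below uses a hypothetical helper, not simul_whisper's own audio loader, to split a waveform into such chunks. It assumes the audio is already mono, resampled to 16 kHz, and held in a sliceable sequence such as a list or a 1-D NumPy array.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz input audio


def iter_chunks(audio: Sequence[float], chunk_seconds: float) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length chunks; the last one may be shorter."""
    step = int(chunk_seconds * SAMPLE_RATE)
    if step <= 0:
        raise ValueError("chunk_seconds must cover at least one sample")
    for start in range(0, len(audio), step):
        yield audio[start:start + step]


seven_seconds = [0.0] * (7 * SAMPLE_RATE)
for seconds in (2.0, 3.0, 10.0):
    chunks = list(iter_chunks(seven_seconds, seconds))
    print(seconds, len(chunks), len(chunks[-1]) / SAMPLE_RATE)
```

For a 7 s file, 2 s and 3 s chunks give 4 and 3 chunks respectively, while a 10 s chunk yields a single chunk holding the whole file. The 10 s run is therefore likely closer to one offline decoding pass than to a streaming test.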