espressif / esp-sr

Question regarding vad_process (AIS-1679) #114

Open mike-2020 opened 2 months ago

mike-2020 commented 2 months ago

Hello,

I understand that this function is used to detect speech in the received audio. But when it returns VAD_SPEECH, does that mean the current frame (the data passed in on the current call to this function) contains speech? Or does it mean that the current frame, together with a number of previous frames, contains speech?

I'd like to record speech only, so I want to make sure that the moment vad_process returns VAD_SPEECH is the right time to start recording, and that I will not miss any speech audio.

sun-xiangyu commented 2 months ago

Your understanding is correct, but you need to pay attention to the performance of VAD. It cannot be 100% accurate. You should consider the status of previous frames to determine whether certain frames need to be ignored.
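
One way to act on this advice is to gate the recording on the recent history of per-frame results instead of on a single frame. Below is a minimal sketch of that idea, not code from esp-sr: it keeps a short pre-roll of recent frames so the beginning of an utterance is not lost, starts recording only after a few consecutive speech frames, and stops only after a run of non-speech frames. Everything in it (`speech_gate_t`, `speech_gate_feed`, `counting_sink`, the frame size and the three thresholds) is a hypothetical name or value chosen for illustration. The sketch does not call esp-sr itself: the caller is assumed to pass `is_speech` as the result of comparing the `vad_process` return value with `VAD_SPEECH`, using the signature from the header of the esp-sr version in use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All values below are assumptions for illustration; tune them for your setup. */
#define FRAME_SAMPLES   480  /* assumed: 30 ms frames at 16 kHz */
#define PREROLL_FRAMES  5    /* frames kept from before the detected onset */
#define START_FRAMES    2    /* consecutive speech frames needed to start */
#define HANGOVER_FRAMES 10   /* consecutive non-speech frames needed to stop */

/* Called once for every frame that should end up in the recording. */
typedef void (*frame_sink_t)(const int16_t *frame, void *ctx);

typedef struct {
    int16_t preroll[PREROLL_FRAMES][FRAME_SAMPLES]; /* ring buffer of recent frames */
    int head;        /* next ring slot to overwrite */
    int count;       /* number of frames currently buffered */
    int speech_run;  /* consecutive speech frames seen while idle */
    int silence_run; /* consecutive non-speech frames seen while recording */
    bool recording;
} speech_gate_t;

void speech_gate_init(speech_gate_t *g)
{
    memset(g, 0, sizeof *g);
}

/* Feed one frame plus its per-frame VAD decision.
 * Returns true while the gate is in the recording state. */
bool speech_gate_feed(speech_gate_t *g, const int16_t *frame, bool is_speech,
                      frame_sink_t sink, void *ctx)
{
    if (!g->recording) {
        /* Idle: remember the frame in case speech is confirmed shortly. */
        memcpy(g->preroll[g->head], frame, sizeof g->preroll[0]);
        g->head = (g->head + 1) % PREROLL_FRAMES;
        if (g->count < PREROLL_FRAMES) {
            g->count++;
        }

        g->speech_run = is_speech ? g->speech_run + 1 : 0;
        if (g->speech_run >= START_FRAMES) {
            /* Speech confirmed: flush buffered frames, oldest first. */
            int idx = (g->head - g->count + PREROLL_FRAMES) % PREROLL_FRAMES;
            for (int i = 0; i < g->count; i++) {
                sink(g->preroll[idx], ctx);
                idx = (idx + 1) % PREROLL_FRAMES;
            }
            g->count = 0;
            g->speech_run = 0;
            g->silence_run = 0;
            g->recording = true;
        }
        return g->recording;
    }

    /* Recording: keep every frame, including short pauses inside speech. */
    sink(frame, ctx);
    g->silence_run = is_speech ? 0 : g->silence_run + 1;
    if (g->silence_run >= HANGOVER_FRAMES) {
        g->recording = false;
        g->silence_run = 0;
    }
    return g->recording;
}

/* Example sink that only counts frames; a real one would write to storage. */
void counting_sink(const int16_t *frame, void *ctx)
{
    (void)frame;
    (*(int *)ctx)++;
}
```

With this arrangement a single isolated VAD_SPEECH frame does not start a recording, and a single missed frame in the middle of an utterance does not end it. The cost is a small amount of leading and trailing non-speech audio in each recording, bounded by `PREROLL_FRAMES` and `HANGOVER_FRAMES`.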