Closed liaoweiguo closed 3 days ago
It seems algo_options.audio_chunk_duration means the duration of the pause, not of the total audio, but I'm not sure.
How can I make speech detection less sensitive? I set speech_threshold=0.2 and still get a lot of empty input.
Hi @liaoweiguo - I prepared some docs on this: https://freddyaboulton.github.io/gradio-webrtc/advanced-configuration/#reply-on-pause-voice-activity-detection
audio_chunk_duration is the chunk size used to run VAD. By default it's 0.6 seconds. So if you set speech_threshold=0.2, it means that if a chunk has less than 33.33% voice activity (0.2/0.6), it will be treated as a pause. I would set it lower for your case. Also, you can try setting started_talking_threshold higher.
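To make the explanation above concrete, here is a minimal sketch of the pause logic as described in this thread. It is not the library's actual implementation: `Options` is a hypothetical stand-in for `AlgoOptions`, and the per-chunk speech durations are made-up inputs standing in for what a VAD model would report. It also assumes that a chunk needs more than started_talking_threshold seconds of speech before the user counts as having started talking.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Options:
    # Hypothetical stand-in for AlgoOptions; all values are in seconds.
    audio_chunk_duration: float
    started_talking_threshold: float
    speech_threshold: float


def detect_pause(speech_per_chunk: Sequence[float], opts: Options) -> Optional[int]:
    """Return the index of the first chunk treated as a pause, or None.

    speech_per_chunk[i] is the number of seconds of detected speech in
    chunk i (each chunk is opts.audio_chunk_duration seconds long).
    """
    started_talking = False
    for i, speech in enumerate(speech_per_chunk):
        if not started_talking:
            # Assumption: the user only "starts talking" once a chunk
            # contains more speech than started_talking_threshold.
            if speech > opts.started_talking_threshold:
                started_talking = True
        elif speech < opts.speech_threshold:
            # After talking has started, a chunk with too little speech
            # is treated as a pause.
            return i
    return None


opts = Options(audio_chunk_duration=0.6, started_talking_threshold=0.3, speech_threshold=0.2)
first_pause = detect_pause([0.0, 0.1, 0.5, 0.4, 0.1, 0.0], opts)
noise_only = detect_pause([0.0, 0.25, 0.0], opts)
```

In this sketch, a short burst of noise (0.25 s of detected speech) never crosses started_talking_threshold, so no pause is reported and no empty input would be produced. That is why raising started_talking_threshold can help with spurious triggers.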
This seems better:

    fn=ReplyOnPause(
        response, output_sample_rate=OUT_RATE, output_frame_size=480,
        algo_options=AlgoOptions(audio_chunk_duration=0.6, started_talking_threshold=0.3, speech_threshold=0.2),
    ),
Nice!
I cannot find the logic that detects a user pause.
BTW, I want to control how long the user must pause before it counts. The default setting seems too sensitive.
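As a rough aid for tuning, the arithmetic from the explanation earlier in this thread can be turned into two helper functions. This is only a sketch derived from that explanation (a chunk with less than speech_threshold seconds of speech counts as a pause); the function names are hypothetical, and the real behavior also depends on how chunks line up with the audio.

```python
def pause_fraction(audio_chunk_duration: float, speech_threshold: float) -> float:
    """Fraction of a chunk that must be speech for it NOT to count as a pause."""
    return speech_threshold / audio_chunk_duration


def silence_needed(audio_chunk_duration: float, speech_threshold: float) -> float:
    """Seconds of non-speech a chunk must exceed to count as a pause."""
    return audio_chunk_duration - speech_threshold


# Settings from this thread: about a third of the chunk must be speech.
fraction = pause_fraction(0.6, 0.2)
silence = silence_needed(0.6, 0.2)

# A lower speech_threshold requires more silence within a chunk.
silence_lower_threshold = silence_needed(0.6, 0.1)
```

Under these assumptions, lowering speech_threshold means more of a chunk has to be silent before it counts as a pause, which makes pause detection less sensitive.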