KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

Noise reduction/Sensitivity #95

Open zbeb opened 3 months ago

zbeb commented 3 months ago

Hi,

I’m currently using RealtimeSTT with the following configuration:

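def text_detected(text):
    # Called with each interim transcription produced by the realtime model
    print(text)

recorder_config = {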
    'spinner': False,
    'model': 'large-v2',
    'language': 'en',
    'silero_sensitivity': 0.3,
    'webrtc_sensitivity': 1,
    'post_speech_silence_duration': 0.6,
    'min_length_of_recording': 0,
    'min_gap_between_recordings': 0,
    'enable_realtime_transcription': True,
    'realtime_processing_pause': 0.2,
    'realtime_model_type': 'tiny.en',
    'on_realtime_transcription_update': text_detected,
    'wakeword_backend': "oww",
    'wake_words_sensitivity': 0.35,
    'openwakeword_model_paths': "hey_echo.onnx",
    'wake_word_buffer_duration': 0.5,
}

The issue I'm facing is that if I keep clapping, snapping my fingers, or breathing heavily after I finish talking, the system keeps listening and doesn't stop. It only stops once I stop making these noises.

Could you please advise on how to adjust the settings or configuration to resolve this issue?

Thank you!

Best regards, zbeb

KoljaB commented 3 months ago

Please try raising webrtc_sensitivity to 3
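A minimal sketch of this change, assuming the settings are kept in a plain dict that is later unpacked into RealtimeSTT's AudioToTextRecorder as keyword arguments. WebRTC VAD aggressiveness ranges from 0 (least aggressive) to 3 (most aggressive at rejecting non-speech); only the VAD-related settings from the configuration above are repeated, and the import is left commented out so the fragment runs without the library.

```python
# VAD-related settings from the configuration posted in the issue.
recorder_config = {
    "silero_sensitivity": 0.3,
    "webrtc_sensitivity": 1,
    "post_speech_silence_duration": 0.6,
}

# Raise WebRTC VAD aggressiveness to its maximum (valid range 0-3),
# leaving every other setting unchanged.
tuned_config = {**recorder_config, "webrtc_sensitivity": 3}

# With RealtimeSTT installed, the dict would be passed as keyword arguments:
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**tuned_config)
```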

zbeb commented 3 months ago

Hello, sorry for the late response. I tried that, and it didn't work.

KoljaB commented 3 months ago

Nothing I can do then. I combine both WebRTC and Silero VAD to detect the start of speech, but can only rely on WebRTC to detect the end of speech, because it's lightweight and can run in parallel with the real-time transcription.
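The split described here can be modeled with a simplified, frame-based sketch. It is not RealtimeSTT's actual code: the per-frame booleans stand in for what a WebRTC-style and a Silero-style VAD would report, and the names Frame, segment, and silero_end_check are hypothetical. Speech starts only when both detectors agree, but by default only the WebRTC verdict can end it, so a clap that WebRTC classifies as speech keeps resetting the silence timer. The optional flag shows what it would look like if Silero also had to confirm speech at the end.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    webrtc_speech: bool  # stand-in for a WebRTC-style VAD verdict on this frame
    silero_speech: bool  # stand-in for a Silero-style VAD verdict on this frame


def segment(
    frames: List[Frame],
    frame_ms: int = 30,
    post_speech_silence_ms: int = 600,
    silero_end_check: bool = False,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (start_frame, end_frame) of the first utterance; None if not found."""
    required_silent_frames = post_speech_silence_ms // frame_ms
    start: Optional[int] = None
    silent_run = 0
    for i, frame in enumerate(frames):
        if start is None:
            # Start of speech: both detectors must agree.
            if frame.webrtc_speech and frame.silero_speech:
                start = i
            continue
        # End of speech: WebRTC alone decides unless silero_end_check is set,
        # in which case a frame only counts as speech if Silero agrees too.
        still_speaking = frame.webrtc_speech and (
            frame.silero_speech or not silero_end_check
        )
        if still_speaking:
            silent_run = 0  # any "speech" frame resets the silence timer
        else:
            silent_run += 1
            if silent_run >= required_silent_frames:
                return start, i
    return start, None


speech = [Frame(True, True)] * 10
clapping = [Frame(True, False)] * 30  # WebRTC hears "speech", Silero does not
silence = [Frame(False, False)] * 20
frames = speech + clapping + silence

print(segment(frames))                         # (0, 59): WebRTC-only end detection
print(segment(frames, silero_end_check=True))  # (0, 29): Silero also checks the end
```

With 30 ms frames and 600 ms of required silence (matching post_speech_silence_duration = 0.6), the recording only ends once the clapping has stopped and 20 quiet frames have passed; requiring Silero's agreement lets the silence count start as soon as the real speech ends.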

zbeb commented 3 months ago

I see. Do you have any other suggestions, like adding a noise suppressor to the microphone input or something similar?

KoljaB commented 3 months ago

Definitely something I am thinking about. I haven't found a reliable solution yet.

KoljaB commented 3 months ago

Maybe I can work out something that still double-checks with Silero. It might cause problems on slow systems, but I see that we need a better solution here.

zbeb commented 3 months ago

Sounds good, I hope something comes to mind. Thanks for the hard work; it's a really great tool nevertheless.

KoljaB commented 3 months ago

v0.2.2 now has a new parameter, silero_deactivity_detection, that can be set to True. Could you please try again with the new version?
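A minimal sketch of the suggested configuration, assuming the same dict-of-keyword-arguments style as the configuration in the original report. Only the VAD-related settings are repeated, and the RealtimeSTT import is left commented out so the fragment runs without the library installed.

```python
# VAD-related settings from the original report, plus the new flag.
recorder_config = {
    "silero_sensitivity": 0.3,
    "webrtc_sensitivity": 1,
    "post_speech_silence_duration": 0.6,
    # Available from v0.2.2: also run Silero to decide when speech has ended.
    # This is more robust against non-speech noise but may use extra GPU time.
    "silero_deactivity_detection": True,
}

# With RealtimeSTT >= 0.2.2 installed:
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**recorder_config)
```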

zbeb commented 3 months ago

Hello, it works perfectly, great work! I can see a spike in GPU usage at the end of speech (as expected). Keep up the great work!