Dadangdut33 / Speech-Translate

A realtime speech transcription and translation application using Whisper OpenAI and free translation API. Interface made using Tkinter. Code written fully in Python.
MIT License
462 stars, 57 forks

[BUG] Not recording speaker in Windows #87

Status: Open. Slepetys opened this issue 1 week ago.

Slepetys commented 1 week ago

Speech Translate is not recording the speaker on Windows 11. It records, transcribes, and translates properly when the input is set to Microphone, but when I switch to Speaker, it fails with the error "[Errno -9999] Unanticipated host error" and does not start recording.

To Reproduce

Host API tried:

Speaker setting (all combinations):

Screenshots: [image not included]

Log

2024-09-18 12:06:42.920 | ERROR | record.py:944 [Thread-50 (record_session)] - [Errno -9999] Unanticipated host error
Traceback (most recent call last):
  File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\utils\audio\record.py", line 651, in record_session
  File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 801, in open
  File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 467, in __init__
OSError: [Errno -9999] Unanticipated host error
2024-09-18 12:06:42.920 | ERROR | record.py:945 [Thread-50 (record_session)] - Error in record session
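The traceback shows the `-9999` error (PortAudio's `paUnanticipatedHostError`) being raised from `PyAudio.open()`. One possibility, not confirmed anywhere in this thread, is that the stream is opened with a sample rate or channel count the WASAPI loopback endpoint does not accept; pyaudiowpatch's own loopback recording example opens the stream with the device's `maxInputChannels` and `defaultSampleRate`. The sketch below assumes that cause. `native_stream_kwargs`, `open_with_fallback`, and `PA_UNANTICIPATED_HOST_ERROR` are hypothetical names, not part of Speech Translate or pyaudiowpatch; only the keyword arguments passed to `p.open()` are real PyAudio parameters.

```python
# Hypothetical constant: PortAudio's paUnanticipatedHostError code.
PA_UNANTICIPATED_HOST_ERROR = -9999


def native_stream_kwargs(device: dict, frames_per_buffer: int = 1024) -> dict:
    """Build PyAudio.open() keyword arguments from a device-info dict,
    using the device's own channel count and default sample rate."""
    return {
        "channels": int(device["maxInputChannels"]),
        "rate": int(device["defaultSampleRate"]),  # e.g. 48000.0 -> 48000
        "input": True,
        "input_device_index": int(device["index"]),
        "frames_per_buffer": frames_per_buffer,
    }


def open_with_fallback(p, device: dict, sample_format, requested: dict):
    """Try the requested stream settings first; if PortAudio reports
    errno -9999, retry once with the device's native format.

    `p` is assumed to be a pyaudiowpatch.PyAudio instance and
    `sample_format` a PyAudio format constant such as pyaudio.paInt16.
    """
    try:
        return p.open(format=sample_format, **requested)
    except OSError as exc:
        # PyAudio raises OSError(errno, message); re-raise anything else.
        if exc.errno != PA_UNANTICIPATED_HOST_ERROR:
            raise
        return p.open(format=sample_format, **native_stream_kwargs(device))
```

With the loopback device reported at the end of this issue (index 20, 2 channels, 48000 Hz), the fallback would reopen the stream at 48000 Hz stereo. Any downstream code that expects 16 kHz mono would then need to convert the audio itself.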

Desktop

Slepetys commented 1 week ago

Additional info:

When I change the settings to Host API: MME and Speaker: the default speaker, I get a different error in the log that seems to have a different origin. Most likely the audio stream can be captured but not processed:

log

2024-09-18 14:11:11.824 | INFO    | log.py:150 [MainThread] - Log cleared
2024-09-18 14:11:18.544 | DEBUG   | record.py:383 [Thread-94 (set_meter)] - Opening Speaker meter
2024-09-18 14:11:18.591 | DEBUG   | record.py:504 [Dummy-95] - Checking if webrtcvad is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.591 | DEBUG   | record.py:506 [Dummy-95] - Checking if silero is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.592 | ERROR   | record.py:518 [Dummy-95] - Input audio chunk is too short
Traceback (most recent call last):

  File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\ui\frame\setting\record.py", line 507, in stream_cb

  File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 56, in __call__
    x, sr = self._validate_input(x, sr)
    │       │    │               │  └ 16000
    │       │    │               └ tensor([ 0.0921,  0.1457,  0.1715,  0.2127,  0.2518,  0.2576,  0.2480,  0.2433,
    │       │    │                          0.2564,  0.2626,  0.2543,  0.2321,  ...
    │       │    └ <function OnnxWrapper._validate_input at 0x000001F5EBECC220>
    │       └ <utils_vad.OnnxWrapper object at 0x000001F5EBA59150>
    └ tensor([ 0.0921,  0.1457,  0.1715,  0.2127,  0.2518,  0.2576,  0.2480,  0.2433,
               0.2564,  0.2626,  0.2543,  0.2321,  ...

  File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 44, in _validate_input
    raise ValueError("Input audio chunk is too short")

ValueError: Input audio chunk is too short
2024-09-18 14:11:18.596 | ERROR   | record.py:533 [Dummy-95] - SileroVAD Error!
2024-09-18 14:11:18.596 | WARNING | record.py:535 [Dummy-95] - Not possible to use Silero VAD with the current device config! So it is now disabled
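The Silero error comes from a length check in `utils_vad.py`. In the v4-era silero-vad source, `OnnxWrapper._validate_input` raises "Input audio chunk is too short" when `sr / num_samples > 31.25`. That threshold is taken from that source version, not from this log, so it may differ in the copy bundled with Speech Translate. At the `16000` Hz rate shown in the traceback, a chunk therefore needs at least 16000 / 31.25 = 512 samples. The helpers below are hypothetical and only reproduce that arithmetic. `min_device_frames` additionally assumes the device audio is resampled to 16 kHz before it reaches the VAD.

```python
import math

# Assumed threshold, copied from silero-vad's v4-era _validate_input:
# a chunk is rejected when sr / num_samples > 31.25.
MAX_CHUNKS_PER_SECOND = 31.25


def min_silero_samples(sr: int) -> int:
    """Smallest chunk length, in samples, that passes the length check."""
    return math.ceil(sr / MAX_CHUNKS_PER_SECOND)


def is_chunk_long_enough(num_samples: int, sr: int) -> bool:
    """Mirror of the check: True if the chunk would NOT be rejected."""
    return num_samples > 0 and sr / num_samples <= MAX_CHUNKS_PER_SECOND


def min_device_frames(device_rate: float, target_rate: int = 16000) -> int:
    """Device frames per buffer needed so that, after resampling to
    target_rate, the chunk still has at least min_silero_samples() samples."""
    needed = min_silero_samples(target_rate)
    return math.ceil(needed * device_rate / target_rate)
```

Under these assumptions, a 48000 Hz device would need at least `min_device_frames(48000) == 1536` frames per buffer, and a 44100 Hz device at least 1412. Any shorter chunk would trigger the error seen in the log above.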

Settings: [image not included]

Using pyaudiowpatch to find the loopback speaker

I ran the code below to find the default loopback speaker. It does not match any of the options detected by Speech Translate.

import sys

import pyaudiowpatch as pyaudio

# Find the default microphone and speakers through the WASAPI host API
p = pyaudio.PyAudio()
wasapi_info = p.get_host_api_info_by_type(pyaudio.paWASAPI)
default_speakers   = p.get_device_info_by_index(wasapi_info["defaultOutputDevice"])
default_microphone = p.get_device_info_by_index(wasapi_info["defaultInputDevice"])

if not default_speakers["isLoopbackDevice"]:
    # Try to find the loopback device with the same name (plus the " [Loopback]" suffix).
    # Unfortunately, name matching is the most adequate way at the moment.
    for loopback in p.get_loopback_device_info_generator():
        if default_speakers["name"] in loopback["name"]:
            default_speakers = loopback
            break
    else:
        print("Default loopback output device not found.\n\nRun `python -m pyaudiowpatch` to check available devices.\nExiting...\n")
        p.terminate()
        sys.exit(1)

print(f"""
Input Microphone  : {default_microphone['name']}
Index             : {default_microphone['index']}
Input Channels    : {default_microphone['maxInputChannels']}
Input Latency     : {default_microphone['defaultLowInputLatency']} s
Input Latency(max): {default_microphone['defaultHighInputLatency']} s
Sample Rate       : {default_microphone['defaultSampleRate']} Hz
""")

print(f"""
Loopback Speakers : {default_speakers['name']}
Index             : {default_speakers['index']}
Channels          : {default_speakers['maxInputChannels']}
Latency           : {default_speakers['defaultLowInputLatency']} s
Latency(max)      : {default_speakers['defaultHighInputLatency']} s
Sample Rate       : {default_speakers['defaultSampleRate']} Hz
""")

# Release PortAudio resources
p.terminate()

resulting in:

Input Microphone  : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB)
Index             : 17
Input Channels    : 1
Input Latency     : 0.003 s
Input Latency(max): 0.01 s
Sample Rate       : 16000.0 Hz

Loopback Speakers : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB) [Loopback]
Index             : 20
Channels          : 2
Latency           : 0.003 s
Latency(max)      : 0.01 s
Sample Rate       : 48000.0 Hz
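The loopback device delivers 2 channels at 48000 Hz, while the microphone and the Silero traceback above use 16000 Hz. The sketch below shows one way to turn interleaved 16-bit loopback PCM into 16 kHz mono floats, assuming the stream was opened as `paInt16` and that the device rate is an integer multiple of 16000. `loopback_to_16k_mono` is a hypothetical helper, not Speech Translate's actual pipeline, which is not shown in this issue. It averages each group of samples as a crude box low-pass instead of using a proper resampling filter, so some aliasing is expected.

```python
from array import array


def loopback_to_16k_mono(raw: bytes, channels: int = 2, rate: float = 48000,
                         target_rate: int = 16000) -> list:
    """Convert interleaved int16 PCM to mono samples in [-1, 1) at target_rate.

    Assumes native byte order (as PyAudio's paInt16 delivers) and that
    rate is an integer multiple of target_rate.
    """
    rate = int(rate)  # device info reports rates as floats, e.g. 48000.0
    if rate % target_rate != 0:
        raise ValueError("rate must be an integer multiple of target_rate")
    samples = array("h")
    samples.frombytes(raw)  # raises ValueError if len(raw) is odd

    # Downmix: average the channels of each interleaved frame.
    n_frames = len(samples) // channels
    mono = [sum(samples[i * channels:(i + 1) * channels]) / channels
            for i in range(n_frames)]

    # Decimate by averaging each block of `step` frames; trailing frames
    # that do not fill a whole block are dropped.
    step = rate // target_rate
    out = []
    for start in range(0, n_frames - step + 1, step):
        block = mono[start:start + step]
        out.append(sum(block) / step / 32768.0)  # scale int16 range to [-1, 1)
    return out
```

Under these assumptions, a buffer of 1536 stereo frames at 48000 Hz yields 512 mono samples at 16 kHz. A production version would more likely use a real resampler than block averaging.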