[BUG] Not recording speaker in Windows

Speech Translate does not record the speaker on Windows 11. It records, transcribes and translates properly when the input is set to Microphone, but when I change the input to Speaker it gives the error "-9999 Unanticipated host error" and does not start recording.

To Reproduce

Host API tried:

MME
Windows Direct Sound
Windows WASAPI

Speaker setting (all combinations):

ID 0,4 Microsoft Mapper output
ID 0,5 Echo cancelling speakerphone (Jabra device)

Screenshots

Log

2024-09-18 12:06:42.920 | ERROR | record.py:944 [Thread-50 (record_session)] - [Errno -9999] Unanticipated host error
Traceback (most recent call last):

  File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\utils\audio\record.py", line 651, in record_session

  File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 801, in open

  File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 467, in __init__

OSError: [Errno -9999] Unanticipated host error
2024-09-18 12:06:42.920 | ERROR | record.py:945 [Thread-50 (record_session)] - Error in record session

Desktop

OS: Windows 10
App Installation version: prebuilt CUDA version 1.3.10
App / Python version: 3.11

Additional info:

When I change the settings to Host API: MME and Speaker to the default speaker, I get a different error in the log, which appears to have a different origin: most likely the audio stream can be captured but not processed.

log

2024-09-18 14:11:11.824 | INFO    | log.py:150 [MainThread] - Log cleared
2024-09-18 14:11:18.544 | DEBUG   | record.py:383 [Thread-94 (set_meter)] - Opening Speaker meter
2024-09-18 14:11:18.591 | DEBUG   | record.py:504 [Dummy-95] - Checking if webrtcvad is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.591 | DEBUG   | record.py:506 [Dummy-95] - Checking if silero is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.592 | ERROR   | record.py:518 [Dummy-95] - Input audio chunk is too short
Traceback (most recent call last):

  File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\ui\frame\setting\record.py", line 507, in stream_cb

  File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 56, in __call__
    x, sr = self._validate_input(x, sr)
    │       │    │               │  └ 16000
    │       │    │               └ tensor([ 0.0921,  0.1457,  0.1715,  0.2127,  0.2518,  0.2576,  0.2480,  0.2433,
    │       │    │                          0.2564,  0.2626,  0.2543,  0.2321,  ...
    │       │    └ <function OnnxWrapper._validate_input at 0x000001F5EBECC220>
    │       └ <utils_vad.OnnxWrapper object at 0x000001F5EBA59150>
    └ tensor([ 0.0921,  0.1457,  0.1715,  0.2127,  0.2518,  0.2576,  0.2480,  0.2433,
               0.2564,  0.2626,  0.2543,  0.2321,  ...

  File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 44, in _validate_input
    raise ValueError("Input audio chunk is too short")

ValueError: Input audio chunk is too short
2024-09-18 14:11:18.596 | ERROR   | record.py:533 [Dummy-95] - SileroVAD Error!
2024-09-18 14:11:18.596 | WARNING | record.py:535 [Dummy-95] - Not possible to use Silero VAD with the current device config! So it is now disabled
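
The ValueError in the traceback above appears to come from a minimum-length check in Silero VAD's input validation. As far as I can tell from the upstream utils_vad.py, a chunk is rejected when sample_rate / num_samples is greater than 31.25, which at 16000 Hz means anything shorter than 512 samples. The copy bundled with the app may differ, so treat that constant as an assumption. A minimal sketch of the arithmetic (both helper names are hypothetical and are not part of Speech Translate or Silero):

```python
import math

# Assumption: ratio taken from upstream silero-vad's utils_vad.py;
# the copy bundled with Speech Translate may use a different value.
MAX_RATE_TO_LENGTH_RATIO = 31.25


def min_chunk_samples(sample_rate: int) -> int:
    """Smallest chunk length (in samples) that would pass the length check."""
    return math.ceil(sample_rate / MAX_RATE_TO_LENGTH_RATIO)


def chunk_is_long_enough(num_samples: int, sample_rate: int) -> bool:
    """Mirror of the check: reject when sample_rate / num_samples exceeds the ratio."""
    if num_samples <= 0:
        return False
    return sample_rate / num_samples <= MAX_RATE_TO_LENGTH_RATIO


if __name__ == "__main__":
    for sr in (8000, 16000):
        print(f"{sr} Hz -> at least {min_chunk_samples(sr)} samples per chunk")
```

Under that assumption, a 160-sample chunk at 16000 Hz (10 ms of audio, used here only as an example) would be rejected, while a 512-sample chunk would pass.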

Settings

Using pyaudiowpatch to find the loopback speaker

I ran the code below to find the default loopback speaker. The device it finds does not match any of the options detected by Speech Translate.

import pyaudiowpatch as pyaudio
# Find default Microphone and Speakers:
p = pyaudio.PyAudio()
wasapi_info = p.get_host_api_info_by_type(pyaudio.paWASAPI)
default_speakers   = p.get_device_info_by_index(wasapi_info["defaultOutputDevice"])
default_microphone = p.get_device_info_by_index(wasapi_info["defaultInputDevice"])
if not default_speakers["isLoopbackDevice"]:
    for loopback in p.get_loopback_device_info_generator():
        # Try to find the loopback device with the same name (plus the "[Loopback]" suffix).
        # Unfortunately, this is the most adequate way at the moment.
        if default_speakers["name"] in loopback["name"]:
            default_speakers = loopback
            break
    else:
        print("Default loopback output device not found.\n\nRun `python -m pyaudiowpatch` to check available devices.\nExiting...\n")
        p.terminate()  # release PortAudio before exiting
        raise SystemExit(1)

print(f"""
Input Microphone  : {default_microphone['name']}
Index             : {default_microphone['index']}
Input Channels    : {default_microphone['maxInputChannels']}
Input Latency     : {default_microphone['defaultLowInputLatency']} s
Input Latency(max): {default_microphone['defaultHighInputLatency']} s
Sample Rate       : {default_microphone['defaultSampleRate']} Hz
""")    

print(f"""
Loopback Speakers : {default_speakers['name']}
Index             : {default_speakers['index']}
Channels          : {default_speakers['maxInputChannels']}
Latency           : {default_speakers['defaultLowInputLatency']} s
Latency(max)      : {default_speakers['defaultHighInputLatency']} s
Sample Rate       : {default_speakers['defaultSampleRate']} Hz
""")

resulting in:

Input Microphone  : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB)
Index             : 17
Input Channels    : 1
Input Latency     : 0.003 s
Input Latency(max): 0.01 s
Sample Rate       : 16000.0 Hz

Loopback Speakers : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB) [Loopback]
Index             : 20
Channels          : 2
Latency           : 0.003 s
Latency(max)      : 0.01 s
Sample Rate       : 48000.0 Hz
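
The loopback device above reports 2 channels at 48000 Hz, while the microphone is mono at 16000 Hz. One thing I would check is whether the speaker stream is opened with the loopback device's own values rather than the microphone's. I have not confirmed that this is the cause of the -9999 error. A minimal sketch, assuming a hypothetical helper `loopback_open_kwargs` that builds keyword arguments for `PyAudio.open` from a device-info dict (the `frames_per_buffer` default is an arbitrary choice of mine):

```python
from typing import Any, Dict


def loopback_open_kwargs(device: Dict[str, Any], frames_per_buffer: int = 1024) -> Dict[str, Any]:
    """Hypothetical helper: derive PyAudio.open() keyword arguments from a device-info dict."""
    if not device.get("isLoopbackDevice", False):
        raise ValueError(f"{device['name']!r} is not a loopback device")
    return {
        # Use the device's native channel count and sample rate instead of
        # hard-coded microphone values such as 1 channel at 16000 Hz.
        "channels": int(device["maxInputChannels"]),
        "rate": int(device["defaultSampleRate"]),
        "input": True,
        "input_device_index": int(device["index"]),
        "frames_per_buffer": frames_per_buffer,
    }


# Values copied from the output above. "isLoopbackDevice" is assumed to be
# True because the device was found through the loopback generator.
jabra_loopback = {
    "name": "Echo Cancelling Speakerphone (Jabra SPEAK 510 USB) [Loopback]",
    "index": 20,
    "maxInputChannels": 2,
    "defaultSampleRate": 48000.0,
    "isLoopbackDevice": True,
}

kwargs = loopback_open_kwargs(jabra_loopback)
print(kwargs)

# With pyaudiowpatch installed, the stream would then be opened roughly as:
# stream = p.open(format=pyaudio.paInt16, **kwargs)
```

The captured audio would still need to be converted to whatever the transcription step expects, which this sketch does not cover.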

Dadangdut33 / Speech-Translate

[BUG] Not recording speaker in Windows #87