Open Slepetys opened 1 week ago
Additional info:
When I change the settings to HostAPI: MME and set Speaker to the default speaker, I get a different error in the log, which seems to have a different origin. Most likely the audio stream can be captured but not processed:
log
2024-09-18 14:11:11.824 | INFO | log.py:150 [MainThread] - Log cleared
2024-09-18 14:11:18.544 | DEBUG | record.py:383 [Thread-94 (set_meter)] - Opening Speaker meter
2024-09-18 14:11:18.591 | DEBUG | record.py:504 [Dummy-95] - Checking if webrtcvad is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.591 | DEBUG | record.py:506 [Dummy-95] - Checking if silero is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.592 | ERROR | record.py:518 [Dummy-95] - Input audio chunk is too short
Traceback (most recent call last):
File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\ui\frame\setting\record.py", line 507, in stream_cb
File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 56, in __call__
x, sr = self._validate_input(x, sr)
│ │ │ │ └ 16000
│ │ │ └ tensor([ 0.0921, 0.1457, 0.1715, 0.2127, 0.2518, 0.2576, 0.2480, 0.2433,
│ │ │ 0.2564, 0.2626, 0.2543, 0.2321, ...
│ │ └ <function OnnxWrapper._validate_input at 0x000001F5EBECC220>
│ └ <utils_vad.OnnxWrapper object at 0x000001F5EBA59150>
└ tensor([ 0.0921, 0.1457, 0.1715, 0.2127, 0.2518, 0.2576, 0.2480, 0.2433,
0.2564, 0.2626, 0.2543, 0.2321, ...
File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 44, in _validate_input
raise ValueError("Input audio chunk is too short")
ValueError: Input audio chunk is too short
2024-09-18 14:11:18.596 | ERROR | record.py:533 [Dummy-95] - SileroVAD Error!
2024-09-18 14:11:18.596 | WARNING | record.py:535 [Dummy-95] - Not possible to use Silero VAD with the current device config! So it is now disabled
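The ValueError above is raised by Silero's input validation. The following is a rough sketch of that check, assuming (this is not confirmed by the log) that the bundled utils_vad.py rejects a chunk when sr / num_samples > 31.25, which would mean a minimum of 512 samples at 16 kHz. The helper names are hypothetical, and the 1024-frame buffer at 48 kHz is only an illustrative value, not something read from Speech Translate's settings.

```python
import math

# Assumption: ratio used by Silero's _validate_input (sr / num_samples > 31.25 raises).
MAX_RATE_TO_LENGTH_RATIO = 31.25


def min_chunk_samples(sample_rate: int) -> int:
    """Smallest chunk length (in samples) that passes the assumed check."""
    return math.ceil(sample_rate / MAX_RATE_TO_LENGTH_RATIO)


def chunk_is_long_enough(num_samples: int, sample_rate: int) -> bool:
    """Mirror of the assumed validation: too short when sr / n exceeds the ratio."""
    return num_samples > 0 and sample_rate / num_samples <= MAX_RATE_TO_LENGTH_RATIO


def samples_after_resample(frames: int, source_rate: int, target_rate: int) -> int:
    """Number of mono samples left after resampling a buffer to target_rate."""
    return frames * target_rate // source_rate


if __name__ == "__main__":
    # Hypothetical 1024-frame buffer captured at 48 kHz, resampled to 16 kHz for the VAD.
    n = samples_after_resample(1024, 48000, 16000)
    print(min_chunk_samples(16000), n, chunk_is_long_enough(n, 16000))
```

If the assumption holds, a buffer that is large enough at the device's native rate can still fall below the minimum once it is reduced to 16 kHz, which would be consistent with the stream being captured but not processed.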
Settings
Using pyaudiowpatch, I ran the code below to find the default loopback speaker. The device it finds does not match any of the options detected by Speech Translate.
import pyaudiowpatch as pyaudio
# Find default Microphone and Speakers:
p = pyaudio.PyAudio()
wasapi_info = p.get_host_api_info_by_type(pyaudio.paWASAPI)
default_speakers = p.get_device_info_by_index(wasapi_info["defaultOutputDevice"])
default_microphone = p.get_device_info_by_index(wasapi_info["defaultInputDevice"])
if not default_speakers["isLoopbackDevice"]:
    for loopback in p.get_loopback_device_info_generator():
        # Try to find the loopback device with the same name (and a [Loopback] suffix).
        # Unfortunately, this is the most adequate way at the moment.
        if default_speakers["name"] in loopback["name"]:
            default_speakers = loopback
            break
    else:
        print("Default loopback output device not found.\n\nRun `python -m pyaudiowpatch` to check available devices.\nExiting...\n")
        exit()
print(f"""
Input Microphone : {default_microphone['name']}
Index : {default_microphone['index']}
Input Channels : {default_microphone['maxInputChannels']}
Input Latency : {default_microphone['defaultLowInputLatency']} s
Input Latency(max): {default_microphone['defaultHighInputLatency']} s
Sample Rate : {default_microphone['defaultSampleRate']} Hz
""")
print(f"""
Loopback Speakers : {default_speakers['name']}
Index : {default_speakers['index']}
Channels : {default_speakers['maxInputChannels']}
Latency : {default_speakers['defaultLowInputLatency']} s
Latency(max) : {default_speakers['defaultHighInputLatency']} s
Sample Rate : {default_speakers['defaultSampleRate']} Hz
""")
resulting in:
Input Microphone : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB)
Index : 17
Input Channels : 1
Input Latency : 0.003 s
Input Latency(max): 0.01 s
Sample Rate : 16000.0 Hz
Loopback Speakers : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB) [Loopback]
Index : 20
Channels : 2
Latency : 0.003 s
Latency(max) : 0.01 s
Sample Rate : 48000.0 Hz
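Given the values printed above, one thing worth trying is opening the loopback device with exactly the channel count and sample rate it reports, as the pyaudiowpatch examples do. The following is a minimal sketch, not Speech Translate's actual code: `loopback_open_kwargs` and `open_loopback_stream` are hypothetical helper names, and the 1024-frame buffer is an arbitrary choice. It relies only on `PyAudio.open` and its standard keyword arguments.

```python
from typing import Any, Dict


def loopback_open_kwargs(device: Dict[str, Any], sample_format: int,
                         frames_per_buffer: int = 1024) -> Dict[str, Any]:
    """Build PyAudio.open() arguments that mirror the device's reported settings."""
    return {
        "format": sample_format,
        "channels": int(device["maxInputChannels"]),
        "rate": int(device["defaultSampleRate"]),
        "frames_per_buffer": frames_per_buffer,  # arbitrary buffer size for this sketch
        "input": True,  # loopback devices are opened as inputs
        "input_device_index": int(device["index"]),
    }


def open_loopback_stream(p: Any, device: Dict[str, Any], sample_format: int):
    """Open an input stream on the loopback device using an existing PyAudio instance."""
    return p.open(**loopback_open_kwargs(device, sample_format))
```

With the script above, this would be called as `open_loopback_stream(p, default_speakers, pyaudio.paInt16)`, which for the device listed requests 2 channels at 48000 Hz on index 20.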
Speech Translate is not recording the speaker in Windows 11
Speech Translate records, transcribes and translates properly when the input is set to Microphone, but when I change it to Speaker, it gives the error "-9999 Unanticipated host error" and does not start recording.
To Reproduce
Host API tried:
Speaker setting (all combinations):
Screenshots
Log
2024-09-18 12:06:42.920 | ERROR | record.py:944 [Thread-50 (record_session)] - [Errno -9999] Unanticipated host error
Traceback (most recent call last):
File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\utils\audio\record.py", line 651, in record_session
File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 801, in open
File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 467, in __init__
OSError: [Errno -9999] Unanticipated host error
2024-09-18 12:06:42.920 | ERROR | record.py:945 [Thread-50 (record_session)] - Error in record session
Desktop