Open SaddamBInSyed opened 4 months ago
Can you please try adding
import multiprocessing
multiprocessing.set_start_method('spawn', force=True)
at the start of the script?
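For placement, here is a minimal sketch, assuming the script is launched directly. `configure_multiprocessing` is a hypothetical helper name, not something from the project. With `spawn`, child processes re-import the main module, so the call is usually placed under the `if __name__ == "__main__":` guard, before anything starts a process.

```python
import multiprocessing


def configure_multiprocessing(method: str = "spawn") -> str:
    """Select the start method and return the one now in effect."""
    # force=True replaces a start method that was already chosen earlier.
    multiprocessing.set_start_method(method, force=True)
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # Run this before creating anything that starts child processes.
    print("start method:", configure_multiprocessing())
```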
Hi @KoljaB
I added the lines above. This time the script ran, but it could not transcribe anything.
ffmpeg --version ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0) configuration: --prefix=/opt/conda/conda-bld/ffmpeg_1597178665428/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame libavutil 56. 51.100 / 56. 51.100 libavcodec 58. 91.100 / 58. 91.100 libavformat 58. 45.100 / 58. 45.100 libavdevice 58. 10.100 / 58. 10.100 libavfilter 7. 85.100 / 7. 85.100 libavresample 4. 0. 0 / 4. 0. 0 libswscale 5. 7.100 / 5. 7.100 libswresample 3. 7.100 / 3. 7.100 Unrecognized option '-version'. Error splitting the argument list: Option not found
/home/mypc/miniconda3/envs/VoiceAgent/bin/python /home/mypc/Downloads/LocalAIVoiceChat-main/ai_voicetalk_local.py try to import llama_cpp_cuda llama_cpp_cuda import failed llama_cpp_lib: return llama_cpp Initializing LLM llama.cpp model ... llama.cpp model initialized Initializing TTS CoquiEngine ...
Using model: xtts Initializing STT AudioToTextRecorder ... ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048 Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718 Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842 
Process Process-3:
Traceback (most recent call last):
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 700, in _audio_data_worker
    stream = audio_interface.open(
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/site-packages/pyaudio/__init__.py", line 639, in open
    stream = PyAudio.Stream(self, *args, **kwargs)
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/site-packages/pyaudio/__init__.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 710, in _audio_data_worker
    logging.exception("Error initializing pyaudio "
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/logging/__init__.py", line 2113, in exception
    error(msg, *args, exc_info=exc_info, **kwargs)
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/logging/__init__.py", line 2105, in error
    root.error(msg, *args, **kwargs)
  File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/logging/__init__.py", line 1506, in error
    self._log(ERROR, msg, args, **kwargs)
TypeError: Log._log() got an unexpected keyword argument 'exc_info'
Select voice (1-5): 1 Opening stream This is how voice number 1 sounds like XTTS Synthesizing: This is how voice number 1 sounds like /home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation) warnings.warn( Accept voice (y/n): y Scenario: As Lina you are a 31 year old single woman and a journalist on vacation. John is a 28-year-old male professional poker player. You (Lina) and John just met at a hotel bar in Las Vegas. TERM environment variable not set.
John:
I can run other apps to record audio and transcribe. What could be the reason for this error?
I am not a Linux expert and have little experience with ALSA issues. I think the Linux installation needs sudo apt-get install portaudio19-dev.
The "OSError: [Errno -9997] Invalid sample rate" suggests that the recording device cannot capture audio at the 16000 Hz sample rate that Whisper needs. It may be necessary to record at 44100 Hz (or another sample rate the device supports) and then downsample the audio before handing it over to the RealtimeSTT processing queue.
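The downsampling step could look roughly like the sketch below. It assumes mono 16-bit PCM chunks and uses plain linear interpolation with no anti-aliasing filter, so it is only meant to illustrate the idea; a filtered resampler such as `scipy.signal.resample_poly` would normally give better quality. `downsample_int16` is a hypothetical helper, and how the result is fed into RealtimeSTT is not shown because the thread does not specify it.

```python
import array


def downsample_int16(pcm_bytes: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono 16-bit PCM by linear interpolation (no anti-alias filter)."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if src_rate == dst_rate or len(samples) == 0:
        return pcm_bytes

    n_out = len(samples) * dst_rate // src_rate
    out = array.array("h", bytes(2 * n_out))  # zero-filled output buffer
    step = src_rate / dst_rate                # source samples per output sample
    last = len(samples) - 1

    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring source samples.
        out[i] = int(round(samples[left] * (1.0 - frac) + samples[right] * frac))
    return out.tobytes()


# Example: a chunk captured at 44100 Hz becomes a 16000 Hz chunk.
# chunk_16k = downsample_int16(stream.read(1024), 44100, 16000)
```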
sudo apt-get install portaudio19-dev: already present.
I have already set the sample rate to 44100 and the input device index to 7.
The thing is, I have other LLM voice agent code that records at a 16k sample rate fine (using Whisper), but that code has no VAD. That is why I want to test this repo, to get a good realtime effect.
Is the ffmpeg version fine?
The error you're seeing:
File "/home/mypc/miniconda3/envs/VoiceAgent/lib/python3.10/site-packages/pyaudio/__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
indicates that PyAudio rejected the requested sample rate. Possible causes include the audio device not supporting that rate, or PyAudio having trouble interfacing with the device.
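One way to narrow this down is to ask PyAudio which rates the device accepts. The sketch below relies on `PyAudio.is_format_supported`, which raises `ValueError` for a rejected combination. `supported_input_rates` is a hypothetical helper, and the candidate rates and device index 7 in the usage comment are only examples taken from this thread.

```python
from typing import Iterable, List


def supported_input_rates(pa, device_index: int, rates: Iterable[int],
                          sample_format: int, channels: int = 1) -> List[int]:
    """Return the rates that pa.is_format_supported accepts for one input device."""
    accepted = []
    for rate in rates:
        try:
            ok = pa.is_format_supported(
                rate,
                input_device=device_index,
                input_channels=channels,
                input_format=sample_format,
            )
        except ValueError:
            # PyAudio raises ValueError when the combination is rejected.
            ok = False
        if ok:
            accepted.append(rate)
    return accepted


# Usage with a real device (requires PyAudio and audio hardware):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   print(supported_input_rates(pa, 7, [8000, 16000, 44100, 48000], pyaudio.paInt16))
#   pa.terminate()
```

If 16000 is missing from the result, the device itself does not offer that rate through the selected host API.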
ffmpeg version 4.3 seems a bit old; I'm using 6.1. I can't tell whether this causes issues, but I don't think it's related to the sample rate problem. To update it:
sudo apt-get update
sudo apt-get install ffmpeg
This should install the latest version available in the repositories. Alternatively, you can download and compile the latest version from the FFmpeg website.
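To confirm which version ended up on the PATH, note that ffmpeg accepts `-version` with a single dash; the log earlier in this thread shows `--version` producing "Unrecognized option". A small sketch for reading the version from the banner follows. Both function names are hypothetical helpers.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_version(banner: str) -> Optional[str]:
    """Extract the version token from the first 'ffmpeg version X' occurrence."""
    match = re.search(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def installed_ffmpeg_version() -> Optional[str]:
    """Run `ffmpeg -version` if ffmpeg is on PATH; return None otherwise."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_ffmpeg_version(result.stdout)
```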
Here is a small demo code to test PyAudio with a 16000 Hz mono recording. This will help determine if the issue is with PyAudio or the audio device.
import pyaudio
import wave
# Function to list all available input devices
def list_input_devices():
    p = pyaudio.PyAudio()
    for i in range(p.get_device_count()):
        device_info = p.get_device_info_by_index(i)
        if device_info['maxInputChannels'] > 0:
            print(f"Device ID {i}: {device_info['name']}")
    p.terminate()
# List input devices
list_input_devices()
# Set the input device index (change this based on your device)
input_device_index = int(input("Enter the input device index: "))
# Audio recording parameters
FORMAT = pyaudio.paInt16
CHANNELS = 1 # Mono
RATE = 16000 # 16000 Hz
CHUNK = 1024
RECORD_SECONDS = 5
OUTPUT_FILENAME = "output.wav"
audio = pyaudio.PyAudio()
# Open the stream with the selected input device
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    input_device_index=input_device_index,
                    frames_per_buffer=CHUNK)
print("Recording...")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("Finished recording.")
# Stop and close the stream
stream.stop_stream()
stream.close()
audio.terminate()
# Save the recorded data as a WAV file
with wave.open(OUTPUT_FILENAME, 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(audio.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
print(f"Audio saved to {OUTPUT_FILENAME}")
I created a new conda env with Python 3.10.
I followed the steps in the readme file, and after installing the packages the error below is raised.
`Scenario: As Lina you are a 31 year old single woman and a journalist on vacation. John is a 28-year-old male professional poker player. You (Lina) and John just met at a hotel bar in Las Vegas.
John: Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/smartvessel/Downloads/LocalAIVoiceChat-main/ai_voicetalk_local.py", line 138, in
    print(f'{(user_text := recorder.text())}\n<<< {chat_params["char"]}: ', end="", flush=True)
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 558, in text
    return self.transcribe()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 521, in transcribe
    status, result = self.parent_transcription_pipe.recv()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Unhandled exeption in _recording_worker:
Exception in thread Thread-1 (_recording_worker):
Traceback (most recent call last):
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 678, in _recording_worker
    data = self.audio_queue.get()
  File " ", line 2, in get
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/managers.py", line 818, in _callmethod
    kind, result = conn.recv()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Process Process-4:
Traceback (most recent call last):
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 443, in _audio_data_worker
    audio_queue.put(data)
  File "", line 2, in put
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/managers.py", line 818, in _callmethod
    kind, result = conn.recv()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
smartvessel@smartvessel-Z590-AORUS-ELITE-AX:~/Downloads/LocalAIVoiceChat-main$ /home/smartvessel/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
`
Note: I noticed that the STT and TTS lib versions mentioned in the readme file are older. Can I upgrade to the new versions? Could any version conflict occur?
I think I need CUDA 11.8 for this to work properly. I am installing it again on another machine that has 11.8.
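Before reinstalling, it may be worth checking whether the dynamic loader can find the library named in the error at all. The following is a minimal sketch using `ctypes`; `can_load_shared_library` is a hypothetical helper, and the library name is copied from the log message above.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open `name`, False otherwise."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library name copied from the "Could not load library" message.
    lib = "libcudnn_ops_infer.so.8"
    print(lib, "loadable:", can_load_shared_library(lib))
```

A `False` result here means the cuDNN library is not on the loader's search path in this environment, independent of the Python packages.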
Note: I noticed that the STT and TTS lib versions mentioned in the readme file are older. Can I upgrade to the new version? Might any version conflict occur?
I don't think upgrading TTS and STT will break functionality. I only pinned the versions to be sure it works.
There is a new issue with RealtimeTTS, though, that may require pip install transformers==4.38.2.
By the way, llama.cpp is not using CUDA:
nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2022 NVIDIA Corporation Built on Wed_Sep_21_10:33:58_PDT_2022 Cuda compilation tools, release 11.8, V11.8.89 Build cuda_11.8.r11.8/compiler.31833905_0
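The startup log ("try to import llama_cpp_cuda", "llama_cpp_cuda import failed", "return llama_cpp") reads like a fallback import. The sketch below shows that general pattern so the two module names can be tested in isolation. It is an assumption about the pattern, not the script's actual code, and `import_first_available` is a hypothetical helper.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first_available(names: Sequence[str]) -> Optional[ModuleType]:
    """Import and return the first module in `names` that can be imported."""
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(f"{name} import failed")
            continue
        print(f"using {module.__name__}")
        return module
    return None


# Usage with the module names from the log:
#   llama = import_first_available(["llama_cpp_cuda", "llama_cpp"])
```

Falling back to `llama_cpp` only shows that the `llama_cpp_cuda` module is not installed; whether the installed `llama_cpp` build has GPU support is a separate question this check does not answer.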
What could be the reason for this connection error? Any ideas?
` python ai_voicetalk_local.py try to import llama_cpp_cuda llama_cpp_cuda import failed llama_cpp_lib: return llama_cpp Initializing LLM llama.cpp model ... llama.cpp model initialized Initializing TTS CoquiEngine ...
Using model: xtts Initializing STT AudioToTextRecorder ... ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card' ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
Select voice (1-5): 1 This is how voice number 1 sounds like /home/mypc/miniconda3/envs/llmAgent/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation) warnings.warn( General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
Error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
Exception in thread Thread-4 (synthesize_worker): Traceback (most recent call last): File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/site-packages/RealtimeTTS/text_to_stream.py", line 201, in synthesize_worker self.engine.synthesize(sentence) File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 411, in synthesize status, result = self.parent_synthesize_pipe.recv() File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 250, in recv buf = self._recv_bytes() File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes buf = self._recv(4) File "/home/mypc/miniconda3/envs/llmAgent/lib/python3.10/multiprocessing/connection.py", line 383, in _recv raise EOFError EOFError Accept voice (y/n): `
As I already said, the latest RealtimeTTS may require pip install transformers==4.38.2. Coqui TTS has transformers>=4.33.0 in its requirements, but the latest transformers is no longer compatible with Coqui TTS.
The error arises in the interaction between Coqui TTS, torch, and the transformers library, so either torch or Coqui TTS would need to adjust for the change. Torch will surely address this soon, but Coqui TTS isn't maintained anymore.
Because its requirements specify transformers>=4.33.0, pip simply installs the latest version. You need to downgrade manually to make it compatible again.
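To see whether an environment is affected, the installed version can be compared against the pin. The sketch below is a simple illustration: the 4.38.2 pin comes from this thread, but treating every later release as incompatible is an assumption, and the helper names are hypothetical. Version parsing is deliberately naive and stops at the first non-numeric component.

```python
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.38.2' into (4, 38, 2); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than_pin(installed: str, pin: str = "4.38.2") -> bool:
    """Return True when `installed` is a later release than `pin`."""
    return version_tuple(installed) > version_tuple(pin)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Usage:
#   found = installed_version("transformers")
#   if found and is_newer_than_pin(found):
#       print(f"transformers {found} is newer than 4.38.2; consider downgrading")
```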
Thanks for this good work.
While running the test script, I am getting the above error. Environment: Ubuntu, Python 3.10, with the latest STT and TTS code.