Can't get LLama Speak to run with USB Audio #339

lesreaper opened 9 months ago

lesreaper commented 9 months ago

nVidia Jetson Orin AGX, Respeaker 4.0 USB Mic Array LlamaSpeak Tutorial

Installed Riva and Python Client no problem. Tested and works with USB audio.

Left Riva container running in the background. Set up SSH Key. Loaded the model into the text-generation-webui, and it's on port 7860.

However, when I go to run llamaspeak, it dies every time. Command I use is:

./run.sh --env SSL_CERT=/data/cert.pem --env SSL_KEY=/data/key.pem $(./autotag llamaspeak)

The output is:

./run.sh --workdir=/opt/llamaspeak \
>   --env SSL_CERT=/data/cert.pem \
>   --env SSL_KEY=/data/key.pem \
>   $(./autotag llamaspeak) \
>   python3 chat.py --verbose --debug
Namespace(disable=[''], output='/tmp/autotag', packages=['llamaspeak'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- Finding compatible container image for ['llamaspeak']
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /home/nvidia/Documents/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --workdir=/opt/llamaspeak --env SSL_CERT=/data/cert.pem --env SSL_KEY=/data/key.pem dustynv/llamaspeak:r35.4.1 python3 chat.py --verbose --debug
Namespace(audio_channels=1, audio_chunk=1600, automatic_punctuation=True, boosted_lm_score=4.0, boosted_lm_words=None, debug=True, input_device=None, language_code='en-US', list_devices=False, llm_api_port=5000, llm_server='', llm_streaming_port=5005, log_level=2, max_new_tokens=256, metadata=None, no_punctuation=False, no_verbatim_transcripts=False, output_device=None, profanity_filter=False, sample_rate_hz=48000, server='localhost:50051', speaker_diarization=False, ssl_cert='/data/cert.pem', ssl_key='/data/key.pem', use_ssl=False, verbatim_transcripts=True, verbose=True, voice='English-US.Female-1', web_port=8050, web_server='')
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 74, in create_connection
ConnectionRefusedError: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='', port=5000): Max retries exceeded with url: /api/v1/model (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff87d80bb0>: Failed to establish a new connection: [Errno 111] Connection refused'))

I think it comes down to llm_api_port=5000 value being wrong, but I have no idea. What am I doing wrong?
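The traceback above is a refused TCP connection to port 5000 (with an empty host) while requesting /api/v1/model. One rough way to confirm whether anything is listening on the ports llamaspeak expects is a small standard-library check like the sketch below. The helper name `port_is_open` and the use of `localhost` are assumptions for illustration, not part of llamaspeak; the port numbers are taken from the Namespace dump above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # llm_api_port=5000 and server='localhost:50051' come from the log above.
    for name, port in (("text-generation-webui API", 5000), ("Riva", 50051)):
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If port 5000 reports as not reachable, the problem is on the text-generation-webui side (API not started or not exposed) rather than in the llamaspeak port setting.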

dusty-nv commented 9 months ago

@lesreaper unfortunately what I believe has happened is that oobabooga has removed/changed their API. llamaspeak:v2 from the local_llm container. I will look into rolling back the oobabooga commit SHA for the llamaspeak:v1 build.

lesreaper commented 9 months ago

Everything looks the same on their documentation page for Oobabooga API interactions.

Does it matter that I'm not seeing any port bindings on this container: dustynv/text-generation-webui:r35.2.1?

lesreaper commented 9 months ago

Any word yet on getting this update, or how I could fix it?

dusty-nv commented 9 months ago

@lesreaper I am working on this today to build another container for text-generation-webui v1.7, which should still be compatible

dusty-nv commented 9 months ago

OK @lesreaper, sorry for the delay - there were some issues I had to work around. Can you try running the dustynv/text-generation-webui:1.7-r35.4.1 container instead?

./run.sh --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 \
   python3 server.py --listen --verbose --api

This version still has the compatible oobabooga API with llamaspeak.

lesreaper commented 9 months ago

Thank you! I tried building that locally, and it was a disaster trying to get everything in sync on that project on the Orin.

I ran the text-generation-webui, and it now runs the API endpoint on port 5000, at least I think it does. It loads models, and I can chat with the loaded model on port 7860, but I can only load a 7b or 13b model. I tried a 30b model and it ran out of memory. That's a separate issue, but at least it's saying it's working now.
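For a rough sense of why a 30b model might not fit while 7b and 13b do, the weights alone take about (number of parameters) × (bytes per parameter). The sketch below is back-of-the-envelope arithmetic only: the bytes-per-parameter values (2 for 16-bit, 0.5 for 4-bit) are assumptions, and real usage adds KV cache and runtime overhead on top.

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (10^9 bytes), ignoring all overhead."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes-per-GB
    return params_billion * bytes_per_param


for size in (7, 13, 30):
    print(
        f"{size}b: ~{weight_gb(size, 2.0):.0f} GB at 16-bit, "
        f"~{weight_gb(size, 0.5):.1f} GB at 4-bit"
    )
```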

The llama-speak does load up!

I'm still unable to get the USB microphone recognized, however. The USB microphone is a ReSpeaker USB Mic Array, and it's active and works fine in the desktop setup. However, when I select the device in Firefox and then try to say something, nothing gets picked up. I tried again with a Logitech C920 USB webcam, and its microphone wasn't picked up either. Also, I can't type anything into the box. This is the verbose logging from the llama-speak startup:

Namespace(audio_channels=1, audio_chunk=1600, automatic_punctuation=True, boosted_lm_score=4.0, boosted_lm_words=None, debug=False, input_device=None, language_code='en-US', list_devices=False, llm_api_port=5000, llm_server='', llm_streaming_port=5005, log_level=1, max_new_tokens=256, metadata=None, no_punctuation=False, no_verbatim_transcripts=False, output_device=None, profanity_filter=False, sample_rate_hz=48000, server='localhost:50051', speaker_diarization=False, ssl_cert='/data/cert.pem', ssl_key='/data/key.pem', use_ssl=False, verbatim_transcripts=True, verbose=True, voice='English-US.Female-1', web_port=8050, web_server='')
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
Expression 'alsa_snd_pcm_hw_params_set_rate_near( pcm, hwParams, &setRate, NULL )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 3201
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
-- running ASR service (en-US)
-- running TTS service (en-US, English-US.Female-1)
-- running AudioMixer thread
-- starting webserver @
-- running LLM service (teknium_OpenHermes-2.5-Mistral-7B)
 * Serving Flask app 'webserver'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (
 * Running on
 * Running on

Riva is running:

nvidia@ubuntu:~/Documents/riva_quickstart_arm64_v2.13.1$ bash riva_start.sh 
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
Use this container terminal to run applications:

And the text-generation-ui is running with a loaded model:

./run.sh --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 \
>    python3 server.py --listen --verbose --api \
> --model-dir=/data/models/text-generation-webui
[sudo] password for nvidia: 
localuser:root being added to access control list
xauth:  file /tmp/.docker.xauth does not exist
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /home/nvidia/Documents/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 python3 server.py --listen --verbose --api --model-dir=/data/models/text-generation-webui
2023-12-05 21:57:50 WARNING:
You are potentially exposing the web UI to the entire internet without any access password.
You can create one with the "--gradio-auth" flag like this:

--gradio-auth username:password

Make sure to replace username:password with your own.
bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114.so
2023-12-05 21:57:55 INFO:Loading settings from settings.json...
Starting API at
2023-12-05 21:57:55 INFO:Loading the extension "gallery"...
Starting streaming server at ws://
Running on local URL:

To create a public link, set `share=True` in `launch()`.
2023-12-05 21:58:29 INFO:Loading teknium_OpenHermes-2.5-Mistral-7B...
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:12<00:00,  6.49s/it]
2023-12-05 21:58:52 INFO:Loaded the model in 23.14 seconds.

lesreaper commented 8 months ago

Is this the problem? Not sure what it means?

Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started

I have a demo next Thursday at an Ivy and I'd love to showcase what these AGX's can do. Thanks!

dusty-nv commented 8 months ago

Sorry for the delay @lesreaper, and that it can be a bit tricky to get the audio all working with the web setup - are you able to use llamaspeak if you run the browser on a PC client, as opposed to the Jetson?

Normally, if a USB audio device is directly attached to the Jetson, I would use --input-device=N and --output-device=N (where N is the device index that --list-devices shows).

Also, if you haven't already, I would test that Riva works with your audio device: https://github.com/dusty-nv/jetson-containers/tree/master/packages/audio/riva-client#list-audio-devices

lesreaper commented 8 months ago

Sorry for taking so long to get back on this one.

I ran this and here was the response:

./run.sh \
--workdir /opt/riva/python-clients \
$(./autotag riva-client:python) \
   python3 scripts/list_audio_devices.py

 0: ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:0,0) (inputs=6   outputs=0   sampleRate=16000)
 1: NVIDIA Jetson AGX Orin HDA: HDMI 0 (hw:1,3)        (inputs=0   outputs=8   sampleRate=44100)
 2: NVIDIA Jetson AGX Orin HDA: HDMI 1 (hw:1,7)        (inputs=0   outputs=8   sampleRate=44100)
 3: NVIDIA Jetson AGX Orin HDA: HDMI 2 (hw:1,8)        (inputs=0   outputs=8   sampleRate=44100)
 4: NVIDIA Jetson AGX Orin HDA: HDMI 3 (hw:1,9)        (inputs=0   outputs=8   sampleRate=44100)
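To turn a listing like the one above into flags, the relevant pieces are the device index, the number of inputs, and the reported sample rate. The sketch below parses that text format with a regular expression and prints `--input-device` / `--sample-rate-hz` values for devices that have inputs. The regex and the helper name `input_devices` are assumptions fitted to the output shown here, not part of the Riva client.

```python
import re

# Two lines copied from the listing above.
LISTING = """\
 0: ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:0,0) (inputs=6   outputs=0   sampleRate=16000)
 1: NVIDIA Jetson AGX Orin HDA: HDMI 0 (hw:1,3)        (inputs=0   outputs=8   sampleRate=44100)
"""

DEVICE_RE = re.compile(
    r"^\s*(?P<index>\d+):\s*(?P<name>.*?)\s*"
    r"\(inputs=(?P<inputs>\d+)\s+outputs=(?P<outputs>\d+)\s+sampleRate=(?P<rate>\d+)\)\s*$"
)


def input_devices(listing: str) -> list:
    """Return the devices in the listing that expose at least one input channel."""
    devices = []
    for line in listing.splitlines():
        match = DEVICE_RE.match(line)
        if match and int(match["inputs"]) > 0:
            devices.append(
                {
                    "index": int(match["index"]),
                    "name": match["name"],
                    "inputs": int(match["inputs"]),
                    "rate": int(match["rate"]),
                }
            )
    return devices


for dev in input_devices(LISTING):
    print(f"--input-device={dev['index']} --sample-rate-hz={dev['rate']}  ({dev['name']})")
```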

When I run the ASR sample, I get this:

./run.sh --workdir /opt/riva/python-clients $(./autotag riva-client:python) \
>    python3 scripts/asr/transcribe_mic.py --input-device=0 --sample-rate-hz=16000

Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Expression 'parameters->channelCount <= maxChans' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1514
Expression 'ValidateParameters( inputParameters, hostApi, StreamDirection_In )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2818
Traceback (most recent call last):
  File "scripts/asr/transcribe_mic.py", line 75, in <module>
  File "scripts/asr/transcribe_mic.py", line 60, in main
    with riva.client.audio_io.MicrophoneStream(
  File "/usr/local/lib/python3.8/dist-packages/riva/client/audio_io.py", line 24, in __enter__
    self._audio_stream = self._audio_interface.open(
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 750, in open
    stream = Stream(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9998] Invalid number of channels

When I swap out the ReSpeaker for a simple 2-channel Logitech on device 24, I get this error:

./run.sh \
--workdir /opt/riva/python-clients $(./autotag riva-client:python) > python3 scripts/asr/transcribe_mic.py --input-device=24 --sample-rate-hz=32000

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "scripts/asr/transcribe_mic.py": permission denied: unknown.

Ran the riva samples in the python-client library and had no problem with them.

It won't let me run the --input-device=N on the llamaspeak. Tells me it's an unknown flag.

I'm wondering since there are 6 inputs if I need to modify anything in the containers, or set something else up. Any ideas?
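One possible direction for a device that reports 6 inputs, if the capture stream can be opened with all 6 channels, is to keep a single channel before passing audio on to ASR. This is only a sketch under assumptions: it assumes 16-bit interleaved PCM, the helper name `extract_channel` is hypothetical, and nothing in this thread confirms that llamaspeak or the Riva client handles multi-channel input this way.

```python
from array import array


def extract_channel(pcm: bytes, channels: int, channel: int = 0) -> bytes:
    """Pick one channel out of interleaved signed 16-bit PCM frames."""
    if not 0 <= channel < channels:
        raise ValueError("channel index out of range")
    samples = array("h")    # signed 16-bit samples, native byte order
    samples.frombytes(pcm)  # raises ValueError if the length is not a multiple of 2
    # Interleaved layout: ch0, ch1, ..., chN-1, ch0, ch1, ...
    return samples[channel::channels].tobytes()


# Two frames of 6-channel audio; keep only channel 0.
frames = array("h", [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25]).tobytes()
mono = array("h")
mono.frombytes(extract_channel(frames, channels=6, channel=0))
print(mono.tolist())
```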

dusty-nv commented 8 months ago

> It won't let me run the --input-device=N on the llamaspeak. Tells me it's an unknown flag.

Hmm, here it is in the code for chat.py:


Did you try running it like this?

./run.sh --workdir=/opt/llamaspeak \
  --env SSL_CERT=/data/cert.pem \
  --env SSL_KEY=/data/key.pem \
  $(./autotag llamaspeak) \
  python3 chat.py --verbose --input-device=0

> I'm wondering since there are 6 inputs if I need to modify anything in the containers, or set something else up. Any ideas?

It looks like it's possible that you may need to also set --sample-rate-hz=16000
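As a rough illustration of how flags such as --input-device and --sample-rate-hz map onto the Namespace values printed in the logs earlier in this thread, here is a minimal argparse sketch. It is not the actual chat.py code; it only mirrors a few option names and the defaults visible in the Namespace dump (input_device=None, sample_rate_hz=48000, audio_channels=1).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative subset of audio-related flags; not the real chat.py parser."""
    parser = argparse.ArgumentParser(description="audio flag sketch")
    parser.add_argument("--list-devices", action="store_true",
                        help="list audio devices instead of running")
    parser.add_argument("--input-device", type=int, default=None,
                        help="index of the input audio device")
    parser.add_argument("--output-device", type=int, default=None,
                        help="index of the output audio device")
    parser.add_argument("--sample-rate-hz", type=int, default=48000,
                        help="sample rate to request from the device")
    parser.add_argument("--audio-channels", type=int, default=1,
                        help="number of channels to request")
    return parser


# Dashes in flag names become underscores in the parsed Namespace.
args = build_parser().parse_args(["--input-device=0", "--sample-rate-hz=16000"])
print(args)
```

A flag that the parser does not define is rejected with an "unrecognized arguments" error, so an "unknown flag" message can also mean the flag was handed to a different program than intended (for example, to the wrapper script rather than to the Python script).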

cj401 commented 8 months ago

Hello @dusty-nv, I am trying to run llamaspeak on Jetson. I followed the tutorial to set Riva up and tested transcribe_mic.py with the Riva docker container running in the background. It works.

./run.sh --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 \
   python3 server.py --listen --verbose --api

works as well.

I started the docker container via the following command line:

./run.sh \
>     -e HUGGINGFACE_TOKEN=hf_xxxxxx \
>     -e SSL_KEY=/data/key.pem \
>     -e SSL_CERT=/data/cert.pem \
>     $(./autotag local_llm) \
>     python3 -m local_llm.agents.web_chat \
>     --model liuhaotian/llava-v1.5-13b \
>     --api=mlc --verbose

the terminal shows the following message:

Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm'], prefer=['local', 'registry', 'build'], quiet=False, **user='dustynv'**, verbose=False)
-- Finding compatible container image for ['local_llm']
localuser:root being added to access control list

Not sure why the user is 'dustynv'. How can I change this to $USER?

The docker container can be started as shown in the pic (Screenshot from 2023-12-28 23-16-54):

23:11:29 | DEBUG | openai/clip-vit-large-patch14-336 warmup
┌──────────────┬───────────────────────────────────┐
│ name         │ openai/clip-vit-large-patch14-336 │
├──────────────┼───────────────────────────────────┤
│ input_shape  │ (336, 336)                        │
├──────────────┼───────────────────────────────────┤
│ output_shape │ torch.Size([1, 1024])             │
└──────────────┴───────────────────────────────────┘
23:11:33 | INFO | loading mm_projector weights from /data/models/huggingface/models--liuhaotian--llava-v1.5-13b/snapshots/d64eb781be6876a5facc160ab1899281f59ef684/mm_projector.bin
mm_projector Sequential(
  (0): Linear(in_features=1024, out_features=5120, bias=True)
  (1): GELU(approximate='none')
  (2): Linear(in_features=5120, out_features=5120, bias=True)
)
23:11:33 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1300000, multiprocessors=16, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
23:11:33 | INFO | loading llava-v1.5-13b from /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/llava-v1.5-13b-q4f16_ft-cuda.so
┌─────────────┬────────────────────┐
│ name        │ llava-v1.5-13b     │
├─────────────┼────────────────────┤
│ api         │ mlc                │
├─────────────┼────────────────────┤
│ quant       │ q4f16_ft           │
├─────────────┼────────────────────┤
│ type        │ llama              │
├─────────────┼────────────────────┤
│ max_length  │ 4096               │
├─────────────┼────────────────────┤
│ vocab_size  │ 32000              │
├─────────────┼────────────────────┤
│ load_time   │ 18.226236065999956 │
├─────────────┼────────────────────┤
│ params_size │ 6231.634765625     │
└─────────────┴────────────────────┘
23:11:42 | INFO | using chat template 'llava-v1' for model llava-v1.5-13b
23:11:42 | DEBUG | connected PrintStream to on_eos on channel=0
23:11:42 | DEBUG | connected ChatQuery to PrintStream on channel=0
23:11:42 | DEBUG | connected RivaASR to ChatQuery on channel=0
23:11:42 | DEBUG | connected RivaTTS to RateLimit on channel=0
23:11:42 | DEBUG | connected ChatQuery to RivaTTS on channel=1
23:11:42 | DEBUG | connected UserPrompt to ChatQuery on channel=0
23:11:42 | DEBUG | connected RivaASR to on_asr_partial on channel=1
23:11:42 | DEBUG | connected ChatQuery to on_llm_reply on channel=0
23:11:42 | DEBUG | connected RateLimit to on_tts_samples on channel=0
23:11:42 | DEBUG | webserver root directory: /opt/local_llm/local_llm/web upload directory: /tmp/uploads
23:11:42 | INFO | starting webserver @
23:11:42 | SUCCESS | WebChat - system ready

Screenshot from 2023-12-28 23-18-24

I reckon this is because the ASR transcription did not work, and I wonder how I can get it to work. :)

On the terminal, it shows

23:12:05 | DEBUG | send_message() - no websocket clients connected, dropping json message

Plus, I watched your video here, which is awesome. I wonder how you got the agent option on the top right as well?

Many thanks.

cj401 commented 8 months ago

@dusty-nv I wonder if you will have time to have a look at this? Many thanks.

dusty-nv commented 8 months ago

./run.sh \
-e HUGGINGFACE_TOKEN=hf_xxxxxx \
-e SSL_KEY=/data/key.pem \
-e SSL_CERT=/data/cert.pem \
$(./autotag local_llm) \
python3 -m local_llm.agents.web_chat \
--model liuhaotian/llava-v1.5-13b \
--api=mlc --verbose

Hi @cj401, sorry for the delay - the command above will use the browser mic, not a USB mic. For a USB mic connected to the Jetson, you would specify the --audio-input-device option.

> Not sure why the user is 'dustynv'. How can I change this to $USER?

This is referring to the dockerhub user to look for containers under, not the linux $USER. dustynv is my dockerhub user where all the container images are, so it is correct.
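To make the distinction concrete, the `user` value is the namespace part of an image reference such as dustynv/llamaspeak:r35.4.1 (the image autotag resolved earlier in this thread). The sketch below splits such a reference into its parts. The helper name `split_image_ref` is hypothetical, and the parsing is deliberately simplified: it assumes a plain `user/repo:tag` form with no registry host or port.

```python
def split_image_ref(ref: str) -> tuple:
    """Split a simple 'user/repo:tag' image reference into (user, repo, tag)."""
    name, _, tag = ref.partition(":")      # tag is '' if no ':' is present
    user, _, repo = name.rpartition("/")   # user is '' if no '/' is present
    return user, repo, tag


for ref in ("dustynv/llamaspeak:r35.4.1", "dustynv/text-generation-webui:1.7-r35.4.1"):
    user, repo, tag = split_image_ref(ref)
    print(f"user={user} repo={repo} tag={tag}")
```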

> 23:12:05 | DEBUG | send_message() - no websocket clients connected, dropping json message

Can you view the browser developer console log by pressing Ctrl+Shift+I? And see if it is connecting to the server.

cj401 commented 7 months ago

Hi @dusty-nv, thank you for your reply. I know you must be very busy with various things. With --audio-input-device and --audio-output-device specified, the command-line audio transcription works.

I have not managed to get the multi-modal model to work as shown here and in this video.

I tried various ways to configure the speaker, as shown in the pic below (Screenshot from 2024-01-08 20-54-08). I wonder what command line you used for the video above. Many thanks. I would like to get one working like the one shown in this video. I noticed that your interface has an agent option on the top right. :)