lesreaper opened this issue 1 year ago
@lesreaper, unfortunately what I believe has happened is that oobabooga has removed/changed their API, which the llamaspeak:v1 build relies on (llamaspeak:v2 from the local_llm container does not use it). I will look into rolling back the oobabooga commit SHA for the llamaspeak:v1 build.
Everything looks the same on their documentation page for Oobabooga API interactions.
Does it matter that I'm not seeing any port bindings on this container: dustynv/text-generation-webui:r35.2.1?
Any word yet on getting this update, or how I could fix it?
@lesreaper I am working on this today to build another container for text-generation-webui v1.7, which should still be compatible
OK @lesreaper, sorry for the delay - there were some issues I had to work around. Can you try running the dustynv/text-generation-webui:1.7-r35.4.1 container instead?
./run.sh --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 \
python3 server.py --listen --verbose --api \
--model-dir=/data/models/text-generation-webui
This version still has the compatible oobabooga API with llamaspeak.
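Once the server is up, a quick hedged sanity check that the legacy API is actually listening (the ports below come from the startup log later in this thread; `/api/v1/model` is my recollection of the legacy blocking endpoint, so treat it as an assumption):

curl http://0.0.0.0:5000/api/v1/model    # should return the currently loaded model name
# the streaming API runs separately at ws://0.0.0.0:5005/api/v1/stream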
Thank you! I tried building that locally, and it was a disaster trying to get everything in sync on that project on the Orin.
I ran the text-generation-webui container, and it now runs the API endpoint on port 5000, at least I think it does. It loads models, and I can speak to that model on port 7860, but I can only load a 7B or 13B model. I tried a 30B model and it ran out of memory. That's a separate issue, but at least it's saying it's working now.
llamaspeak does load up!
I'm still unable to get the USB mic to be recognized, however. The USB microphone is a ReSpeaker USB Mic Array, and it's active and works fine in the desktop setup. However, when I select the device in Firefox and then try to say something, nothing gets picked up. I tried again with a Logitech C920 USB webcam, and its microphone wasn't picked up either. Also, I can't type anything into the box. This is the verbose logging from the llamaspeak startup:
Namespace(audio_channels=1, audio_chunk=1600, automatic_punctuation=True, boosted_lm_score=4.0, boosted_lm_words=None, debug=False, input_device=None, language_code='en-US', list_devices=False, llm_api_port=5000, llm_server='0.0.0.0', llm_streaming_port=5005, log_level=1, max_new_tokens=256, metadata=None, no_punctuation=False, no_verbatim_transcripts=False, output_device=None, profanity_filter=False, sample_rate_hz=48000, server='localhost:50051', speaker_diarization=False, ssl_cert='/data/cert.pem', ssl_key='/data/key.pem', use_ssl=False, verbatim_transcripts=True, verbose=True, voice='English-US.Female-1', web_port=8050, web_server='0.0.0.0')
...
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
Expression 'alsa_snd_pcm_hw_params_set_rate_near( pcm, hwParams, &setRate, NULL )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 3201
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
-- running ASR service (en-US)
-- running TTS service (en-US, English-US.Female-1)
-- running AudioMixer thread
-- starting webserver @ 0.0.0.0:8050
-- running LLM service (teknium_OpenHermes-2.5-Mistral-7B)
* Serving Flask app 'webserver'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on https://127.0.0.1:8050
* Running on https://10.0.1.183:8050
Riva is running:
nvidia@ubuntu:~/Documents/riva_quickstart_arm64_v2.13.1$ bash riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
Use this container terminal to run applications:
root@81256061f585:/opt/riva#
And text-generation-webui is running with a loaded model:
./run.sh --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 \
> python3 server.py --listen --verbose --api \
> --model-dir=/data/models/text-generation-webui
[sudo] password for nvidia:
localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /home/nvidia/Documents/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 python3 server.py --listen --verbose --api --model-dir=/data/models/text-generation-webui
2023-12-05 21:57:50 WARNING:
You are potentially exposing the web UI to the entire internet without any access password.
You can create one with the "--gradio-auth" flag like this:
--gradio-auth username:password
Make sure to replace username:password with your own.
bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114.so
2023-12-05 21:57:55 INFO:Loading settings from settings.json...
Starting API at http://0.0.0.0:5000/api
2023-12-05 21:57:55 INFO:Loading the extension "gallery"...
Starting streaming server at ws://0.0.0.0:5005/api/v1/stream
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
2023-12-05 21:58:29 INFO:Loading teknium_OpenHermes-2.5-Mistral-7B...
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:12<00:00, 6.49s/it]
2023-12-05 21:58:52 INFO:Loaded the model in 23.14 seconds.
Is this the problem? I'm not sure what it means:
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
I have a demo next Thursday at an Ivy and I'd love to showcase what these AGXs can do. Thanks!
Sorry for the delay @lesreaper, and that it can be a bit tricky to get the audio all working with the web setup - are you able to use llamaspeak if you run the browser on a PC client, as opposed to the Jetson?
Normally, if the USB audio device is directly attached to the Jetson, I would use `--input-device=N` and `--output-device=N` (where N is the device index that `--list-devices` shows).
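(For reference, a hedged sketch of listing the device indices from inside the llamaspeak container; `--list-devices` appears in chat.py's argparse Namespace above, and the workdir/autotag usage follows the run commands shown later in this thread:)

./run.sh --workdir=/opt/llamaspeak \
  $(./autotag llamaspeak) \
  python3 chat.py --list-devices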
Also, if you haven't already, I would test that Riva works with your audio device: https://github.com/dusty-nv/jetson-containers/tree/master/packages/audio/riva-client#list-audio-devices
Sorry for taking so long to get back on this one.
I ran this and here was the response:
./run.sh \
--workdir /opt/riva/python-clients \
$(./autotag riva-client:python) \
python3 scripts/list_audio_devices.py
Output:
0: ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:0,0) (inputs=6 outputs=0 sampleRate=16000)
1: NVIDIA Jetson AGX Orin HDA: HDMI 0 (hw:1,3) (inputs=0 outputs=8 sampleRate=44100)
2: NVIDIA Jetson AGX Orin HDA: HDMI 1 (hw:1,7) (inputs=0 outputs=8 sampleRate=44100)
3: NVIDIA Jetson AGX Orin HDA: HDMI 2 (hw:1,8) (inputs=0 outputs=8 sampleRate=44100)
4: NVIDIA Jetson AGX Orin HDA: HDMI 3 (hw:1,9) (inputs=0 outputs=8 sampleRate=44100)
When I run the ASR sample, I get this:
./run.sh --workdir /opt/riva/python-clients $(./autotag riva-client:python) \
> python3 scripts/asr/transcribe_mic.py --input-device=0 --sample-rate-hz=16000
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Expression 'parameters->channelCount <= maxChans' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1514
Expression 'ValidateParameters( inputParameters, hostApi, StreamDirection_In )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2818
Traceback (most recent call last):
File "scripts/asr/transcribe_mic.py", line 75, in <module>
main()
File "scripts/asr/transcribe_mic.py", line 60, in main
with riva.client.audio_io.MicrophoneStream(
File "/usr/local/lib/python3.8/dist-packages/riva/client/audio_io.py", line 24, in __enter__
self._audio_stream = self._audio_interface.open(
File "/usr/lib/python3/dist-packages/pyaudio.py", line 750, in open
stream = Stream(self, *args, **kwargs)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9998] Invalid number of channels
When I swap out the ReSpeaker for a simple 2-channel Logitech at device index 24, I get this error:
./run.sh \
--workdir /opt/riva/python-clients $(./autotag riva-client:python) > python3 scripts/asr/transcribe_mic.py --input-device=24 --sample-rate-hz=32000
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "scripts/asr/transcribe_mic.py": permission denied: unknown.
Ran the `riva` samples in the `python-clients` library and had no problem with them.
It won't let me run `--input-device=N` on llamaspeak; it tells me it's an unknown flag. I'm wondering, since there are 6 inputs, if I need to modify anything in the containers or set something else up. Any ideas?
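(As an aside, one hedged way to check what the 6-channel ReSpeaker will actually accept, before PyAudio/Riva tries to open it, is to dump its ALSA hardware parameters on the host; hw:0,0 matches the device listing above:)

arecord -l                                           # list capture devices
arecord -D hw:0,0 --dump-hw-params -d 1 /dev/null    # prints the supported CHANNELS and RATE ranges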
> It won't let me run `--input-device=N` on llamaspeak. It tells me it's an unknown flag.
Hmm, here it is in the code for chat.py:
Did you try running it like this?
./run.sh --workdir=/opt/llamaspeak \
--env SSL_CERT=/data/cert.pem \
--env SSL_KEY=/data/key.pem \
$(./autotag llamaspeak) \
python3 chat.py --verbose --input-device=0
> I'm wondering since there are 6 inputs if I need to modify anything in the containers, or set something else up. Any ideas?
It looks like it's possible that you may need to also set `--sample-rate-hz=16000`.
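(Putting the two suggestions together, a hedged sketch of the full invocation; device index 0 and the 16 kHz rate come from the list_audio_devices output above, so adjust for your setup:)

./run.sh --workdir=/opt/llamaspeak \
  --env SSL_CERT=/data/cert.pem \
  --env SSL_KEY=/data/key.pem \
  $(./autotag llamaspeak) \
  python3 chat.py --verbose --input-device=0 --sample-rate-hz=16000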
Hello @dusty-nv, I am trying to run llamaspeak on Jetson. I followed the tutorial to set up RIVA and tested transcribe_mic.py with the riva docker container running in the background. It works.
./run.sh --workdir /opt/text-generation-webui dustynv/text-generation-webui:1.7-r35.4.1 \
python3 server.py --listen --verbose --api \
--model-dir=/data/models/text-generation-webui
works as well.
I started the docker container via the following command line:
./run.sh \
> -e HUGGINGFACE_TOKEN=hf_xxxxxx \
> -e SSL_KEY=/data/key.pem \
> -e SSL_CERT=/data/cert.pem \
> $(./autotag local_llm) \
> python3 -m local_llm.agents.web_chat \
> --model liuhaotian/llava-v1.5-13b \
> --api=mlc --verbose
The terminal shows the following message:
Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm'], prefer=['local', 'registry', 'build'], quiet=False, **user='dustynv'**, verbose=False)
-- L4T_VERSION=35.4.1 JETPACK_VERSION=5.1.2 CUDA_VERSION=11.8.89
-- Finding compatible container image for ['local_llm']
dustynv/local_llm:r35.3.1
localuser:root being added to access control list
Not sure why the user is 'dustynv'. How can I change this to $USER?
The docker container can be started, as shown in the log below:
23:11:29 | DEBUG | openai/clip-vit-large-patch14-336 warmup
┌──────────────┬───────────────────────────────────┐
│ name         │ openai/clip-vit-large-patch14-336 │
├──────────────┼───────────────────────────────────┤
│ input_shape  │ (336, 336)                        │
├──────────────┼───────────────────────────────────┤
│ output_shape │ torch.Size([1, 1024])             │
└──────────────┴───────────────────────────────────┘
23:11:33 | INFO | loading mm_projector weights from /data/models/huggingface/models--liuhaotian--llava-v1.5-13b/snapshots/d64eb781be6876a5facc160ab1899281f59ef684/mm_projector.bin
mm_projector Sequential(
  (0): Linear(in_features=1024, out_features=5120, bias=True)
  (1): GELU(approximate='none')
  (2): Linear(in_features=5120, out_features=5120, bias=True)
)
23:11:33 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1300000, multiprocessors=16, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
23:11:33 | INFO | loading llava-v1.5-13b from /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/llava-v1.5-13b-q4f16_ft-cuda.so
┌─────────────┬────────────────────┐
│ name        │ llava-v1.5-13b     │
├─────────────┼────────────────────┤
│ api         │ mlc                │
├─────────────┼────────────────────┤
│ quant       │ q4f16_ft           │
├─────────────┼────────────────────┤
│ type        │ llama              │
├─────────────┼────────────────────┤
│ max_length  │ 4096               │
├─────────────┼────────────────────┤
│ vocab_size  │ 32000              │
├─────────────┼────────────────────┤
│ load_time   │ 18.226236065999956 │
├─────────────┼────────────────────┤
│ params_size │ 6231.634765625     │
└─────────────┴────────────────────┘
23:11:42 | INFO | using chat template 'llava-v1' for model llava-v1.5-13b
23:11:42 | DEBUG | connected PrintStream to on_eos on channel=0
23:11:42 | DEBUG | connected ChatQuery to PrintStream on channel=0
23:11:42 | DEBUG | connected RivaASR to ChatQuery on channel=0
23:11:42 | DEBUG | connected RivaTTS to RateLimit on channel=0
23:11:42 | DEBUG | connected ChatQuery to RivaTTS on channel=1
23:11:42 | DEBUG | connected UserPrompt to ChatQuery on channel=0
23:11:42 | DEBUG | connected RivaASR to on_asr_partial on channel=1
23:11:42 | DEBUG | connected ChatQuery to on_llm_reply on channel=0
23:11:42 | DEBUG | connected RateLimit to on_tts_samples on channel=0
23:11:42 | DEBUG | webserver root directory: /opt/local_llm/local_llm/web  upload directory: /tmp/uploads
23:11:42 | INFO | starting webserver @ https://0.0.0.0:8050
23:11:42 | SUCCESS | WebChat - system ready
I reckon it is because the ASR transcription did not work, and I wonder how I can get it to work. :)
On the terminal, it shows
23:12:05 | DEBUG | send_message() - no websocket clients connected, dropping json message
Plus, I watched your video here, which is awesome. I wonder how you got the agent option on the top right as well?
Many thanks.
@dusty-nv I wonder if you will have time to have a look at this? Many thanks.
./run.sh \
  -e HUGGINGFACE_TOKEN=hf_xxxxxx \
  -e SSL_KEY=/data/key.pem \
  -e SSL_CERT=/data/cert.pem \
  $(./autotag local_llm) \
  python3 -m local_llm.agents.web_chat \
    --model liuhaotian/llava-v1.5-13b \
    --api=mlc --verbose
Hi @cj401, sorry for the delay - the command above will use the browser mic, not the USB mic. For a USB mic connected to the Jetson, you would specify the `--audio-input-device` option.
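(For illustration, a hedged sketch of the same command with the device options added; `--audio-output-device` is mentioned further down in this thread, and the index 0 is just a placeholder taken from the earlier device listing:)

./run.sh \
  -e HUGGINGFACE_TOKEN=hf_xxxxxx \
  -e SSL_KEY=/data/key.pem \
  -e SSL_CERT=/data/cert.pem \
  $(./autotag local_llm) \
  python3 -m local_llm.agents.web_chat \
    --model liuhaotian/llava-v1.5-13b \
    --api=mlc --verbose \
    --audio-input-device=0 --audio-output-device=0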
> Not sure why the user is 'dustynv'. How can I change this to $USER?
This is referring to the Docker Hub user to look for containers under, not the Linux $USER. dustynv is my Docker Hub user where all the container images are, so it is correct.
> 23:12:05 | DEBUG | send_message() - no websocket clients connected, dropping json message
Can you view the browser developer console log by pressing `Ctrl+Shift+I`, and see if it is connecting to the server?
Hi @dusty-nv, thank you for your reply; I know you must be very busy with various things. With `--audio-input-device` and `--audio-output-device` specified, the command-line audio transcription works.
I have not managed to get the multi-modal model to work as shown here and in this video.
Though I tried various ways to configure the speaker, as shown in the pic below, I wonder what your command line was for the above video. I would like to get mine working like the one shown in that video. I noticed that on your interface there is an agent option on the top right. :) Many thanks.
log_doc.txt
Hi @dusty-nv, I am trying to run llamaspeak on a Jetson AGX Orin DevKit 64GB. The web chat page is accessible, and I have also checked on the terminal that the WebChat system is ready. However, my voice is not being taken as input, even though the device is detected in the Audio settings window of the chat page. TTS is working fine (generated output via the text chat box is received on the mic). I suppose the issue is with ASR (I tried replacing 'riva' with 'whisper' in the commands below, but the behavior is the same).
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxxx \
  $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper
Do I need to start the Riva server before running the above commands? Is there any dependency? I have attached the terminal log (USB mic parameters are also logged in it). Please let me know how to perform ASR through my USB mic.
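(For reference, earlier in this thread the Riva server was started separately from the quickstart before the Riva-based speech clients were used; a hedged sketch, assuming the same quickstart directory:)

cd riva_quickstart_arm64_v2.13.1
bash riva_start.sh    # leave this running in the background, then launch web_chat in another terminal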
NVIDIA Jetson Orin AGX, ReSpeaker 4.0 USB Mic Array LlamaSpeak Tutorial
Installed Riva and the Python client with no problem. Tested, and it works with USB audio. Left the Riva container running in the background. Set up the SSH key. Loaded the model into text-generation-webui, and it's on port 7860. However, when I go to run llamaspeak, it dies every time.
The command I use is:
The output is:
I think it comes down to the `llm_api_port=5000` value being wrong, but I have no idea. What am I doing wrong?