dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

local_llm Error: Unavailable model requested given these parameters: language_code=en-US; type=online; #366

Closed UserName-wang closed 5 months ago

UserName-wang commented 5 months ago

Hi,

I'm using an AGX Xavier (L4T 35.4.1, JetPack 5.1.2, IP 192.168.0.40) running riva_quickstart_arm64_v2.12.0, and one AGX Orin (JetPack 6, IP 192.168.0.46) running the docker container dustynv/local_llm:r36.2.0. The Orin seems to be able to connect to the Riva server, but the returned error messages said:

03:08:11 | INFO | using chat template 'llama-2' for model Llama-2-13b-chat-hf
03:08:11 | DEBUG | connected PrintStream to on_eos on channel=0
03:08:11 | DEBUG | connected ChatQuery to PrintStream on channel=0
03:08:11 | DEBUG | connected RivaASR to ChatQuery on channel=0
03:08:11 | DEBUG | connected RivaTTS to RateLimit on channel=0
03:08:11 | DEBUG | connected ChatQuery to RivaTTS on channel=1
03:08:11 | DEBUG | connected UserPrompt to ChatQuery on channel=0
03:08:11 | DEBUG | connected RivaASR to on_asr_partial on channel=1
03:08:11 | DEBUG | connected ChatQuery to on_llm_reply on channel=0
03:08:11 | DEBUG | connected RateLimit to on_tts_samples on channel=0
03:08:11 | DEBUG | webserver root directory: /opt/local_llm/local_llm/web upload directory: /tmp/uploads
03:08:11 | INFO | starting webserver @ https://0.0.0.0:8050
03:08:11 | SUCCESS | WebChat - system ready

It seems to be a Riva server issue. How do I configure the Riva server, and which Riva version can be used for Voice Chat?
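
The "Unavailable model requested" error generally means the requested streaming ASR model was never deployed on the Riva server. In the quickstart, which services and language get deployed is controlled by config.sh before initialization; a minimal sketch, assuming the quickstart's usual variable names (they differ slightly between Riva versions, so treat these as assumptions):

    # riva_quickstart_arm64_vX.Y.Z/config.sh -- enable streaming ASR and TTS for en-US
    service_enabled_asr=true
    service_enabled_tts=true
    language_code=("en-US")

    # then re-initialize and restart so the selected models are downloaded and built
    $ bash riva_init.sh
    $ bash riva_start.sh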

UserName-wang commented 5 months ago

I made some progress by running riva_quickstart_arm64_v2.14.0. The web UI still doesn't respond to my voice or text input, but I can enter text in the terminal. Of course, I changed the IP address of the Riva server in the docker container dustynv/local_llm:r36.2.0 according to the previous error message.

This is my input in the terminal (inside the docker container dustynv/local_llm:r36.2.0), but still no voice interaction: "what are you doing now?" Here is the response in the same terminal:

04:51:08 | DEBUG | processing chat entry 2 role='bot' template=' ${MESSAGE}' open_user_prompt=False cached=false text=' Hello! How can I help you today? Is there a specific question you'd like me to answer?'
04:51:08 | DEBUG | embedding text (1, 24, 5120) float16 -> Hello! How can I help you today? Is there a specific question you'd like me to answer?</s>
04:51:08 | DEBUG | processing chat entry 3 role='user' template='[INST] ${MESSAGE} [/INST]' open_user_prompt=False cached=false text='what are you doing now?'
04:51:08 | DEBUG | embedding text (1, 14, 5120) float16 -> <s>[INST] what are you doing now? [/INST]
04:51:08 | DEBUG | send_message() - no websocket clients connected, dropping json message
[the "send_message() - no websocket clients connected, dropping json/audio message" line repeats for every streamed token of the reply: "I'm just an AI, so I'm not doing anything in the physical sense. However, I'm here to help answer any questions you may have to the best of my ability. What would you like to talk about or ask?"]
04:51:09 | DEBUG | generating TTS for ' I'm just an AI,'
04:51:11 | DEBUG | generating TTS for ' so I'm not doing anything in the physical sense. However,'
04:51:12 | DEBUG | generating TTS for ' I'm here to help answer any questions you may have to the best of my ability. What would you like to talk about or ask?'

┌───────────────┬────────────┐
│ embed_time    │ 0.00034221 │
├───────────────┼────────────┤
│ input_tokens  │ 14         │
├───────────────┼────────────┤
│ output_tokens │ 52         │
├───────────────┼────────────┤
│ prefill_time  │ 0.015067   │
├───────────────┼────────────┤
│ prefill_rate  │ 929.183    │
├───────────────┼────────────┤
│ decode_time   │ 3.39919    │
├───────────────┼────────────┤
│ decode_rate   │ 15.2977    │
└───────────────┴────────────┘

04:51:12 | DEBUG | send_message() - no websocket clients connected, dropping audio message
[the same "dropping audio message" line repeats through 04:51:14]
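
Every streamed token above is dropped because no browser is attached to the websocket, so the server has nowhere to send the reply. A quick reachability check from another machine, using the IP and port from the startup log (a sketch; curl's -k flag is needed because the web UI serves a self-signed certificate):

    $ curl -k https://192.168.0.46:8050/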

Here is the log from the docker container riva_quickstart_arm64_v2.14.0:

W0120 04:41:02.847985 22 stats_reporter.cc:41] No API key provided. Stats reporting disabled.
I0120 04:42:37.066561 210 grpc_riva_asr.cc:1592] ASRService.StreamingRecognize called.
I0120 04:42:37.068055 210 grpc_riva_asr.cc:1807] Using model conformer-en-US-asr-streaming from Triton localhost:8001
I0120 04:44:16.068457 210 riva_asr_stream.cc:226] Detected format: encoding = 1 RAW numchannels = 1 samplerate = 48000 bitspersample = 16
I0120 04:44:16.070070 506 grpc_riva_asr.cc:1340] Creating resampler, audio file sample rate=48000 model sample_rate=16000
I0120 04:45:45.635736 210 grpc_riva_asr.cc:1923] ASRService.StreamingRecognize returning OK
I0120 04:45:45.637089 210 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.streamingrecognize.v1","source":"","subject":"","id":"2a2cccee-3e07-4f30-a7e0-380fd99ee2af","datacontenttype":"application/json","time":"2024-01-20T04:42:37.066299438+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"audio_duration":0.03333333507180214,"speech_duration":0.0,"status":0,"err_msg":""}}
I0120 04:46:41.083849 226 grpc_riva_asr.cc:1592] ASRService.StreamingRecognize called.
I0120 04:46:41.084828 226 grpc_riva_asr.cc:1807] Using model conformer-en-US-asr-streaming from Triton localhost:8001
I0120 04:48:20.085685 226 riva_asr_stream.cc:226] Detected format: encoding = 1 RAW numchannels = 1 samplerate = 48000 bitspersample = 16
I0120 04:48:20.092751 857 grpc_riva_asr.cc:1340] Creating resampler, audio file sample rate=48000 model sample_rate=16000
I0120 04:49:05.202502 210 grpc_riva_tts.cc:502] TTSService.SynthesizeOnline called.
I0120 04:49:05.203187 210 grpc_riva_tts.cc:530] Using model fastpitch_hifigan_ensemble-English-US for inference with speaker_id: 0
I0120 04:49:05.203608 210 grpc_riva_tts.cc:567] Using model fastpitch_hifigan_ensemble-English-US for inference
I0120 04:49:07.186895 210 grpc_riva_tts.cc:597] TTSService.SynthesizeOnline returning OK
I0120 04:49:07.187906 210 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeonline.v1","source":"","subject":"","id":"0e323315-b9e9-4871-af47-06e809ce6bf1","datacontenttype":"application/json","time":"2024-01-20T04:49:05.202272382+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":7,"audio_duration":0.7314285635948181,"status":0,"err_msg":""}}
I0120 04:49:07.193235 210 grpc_riva_tts.cc:502] TTSService.SynthesizeOnline called.
I0120 04:49:07.193645 210 grpc_riva_tts.cc:530] Using model fastpitch_hifigan_ensemble-English-US for inference with speaker_id: 0
I0120 04:49:07.193841 210 grpc_riva_tts.cc:567] Using model fastpitch_hifigan_ensemble-English-US for inference
I0120 04:49:07.830639 210 grpc_riva_tts.cc:597] TTSService.SynthesizeOnline returning OK
I0120 04:49:07.832357 210 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeonline.v1","source":"","subject":"","id":"e485ecdb-e60b-4de2-a499-74e0aadfce8c","datacontenttype":"application/json","time":"2024-01-20T04:49:07.193143107+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":80,"audio_duration":4.005442142486572,"status":0,"err_msg":""}}
I0120 04:51:09.620388 922 grpc_riva_tts.cc:502] TTSService.SynthesizeOnline called.
I0120 04:51:09.621085 922 grpc_riva_tts.cc:530] Using model fastpitch_hifigan_ensemble-English-US for inference with speaker_id: 0
I0120 04:51:09.621459 922 grpc_riva_tts.cc:567] Using model fastpitch_hifigan_ensemble-English-US for inference
I0120 04:51:10.168401 922 grpc_riva_tts.cc:597] TTSService.SynthesizeOnline returning OK
I0120 04:51:10.169459 922 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeonline.v1","source":"","subject":"","id":"bdc812a7-d888-44b7-9b0f-39241db5aec9","datacontenttype":"application/json","time":"2024-01-20T04:51:09.62014667+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":16,"audio_duration":1.1319727897644044,"status":0,"err_msg":""}}
I0120 04:51:11.309175 922 grpc_riva_tts.cc:502] TTSService.SynthesizeOnline called.
I0120 04:51:11.309819 922 grpc_riva_tts.cc:530] Using model fastpitch_hifigan_ensemble-English-US for inference with speaker_id: 0
I0120 04:51:11.310179 922 grpc_riva_tts.cc:567] Using model fastpitch_hifigan_ensemble-English-US for inference
I0120 04:51:11.765370 922 grpc_riva_tts.cc:597] TTSService.SynthesizeOnline returning OK
I0120 04:51:11.767196 922 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeonline.v1","source":"","subject":"","id":"7eb553c8-ccb7-4c74-a866-9ca0b5cb0c5a","datacontenttype":"application/json","time":"2024-01-20T04:51:11.309008748+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":58,"audio_duration":3.3494784832000734,"status":0,"err_msg":""}}
I0120 04:51:12.329936 922 grpc_riva_tts.cc:502] TTSService.SynthesizeOnline called.
I0120 04:51:12.330355 922 grpc_riva_tts.cc:530] Using model fastpitch_hifigan_ensemble-English-US for inference with speaker_id: 0
I0120 04:51:12.330626 922 grpc_riva_tts.cc:567] Using model fastpitch_hifigan_ensemble-English-US for inference
I0120 04:51:13.025692 922 grpc_riva_tts.cc:597] TTSService.SynthesizeOnline returning OK
I0120 04:51:13.027086 922 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeonline.v1","source":"","subject":"","id":"d5e846dd-47a1-45d7-82c4-ee6cc8bdaf05","datacontenttype":"application/json","time":"2024-01-20T04:51:12.32986913+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":120,"audio_duration":5.642448902130127,"status":0,"err_msg":""}}

UserName-wang commented 5 months ago

If I try to run the docker container riva_quickstart_arm64_v2.14.0 on the AGX Orin, I get these errors:

E0120 05:46:13.539026 21 model_lifecycle.cc:596] failed to load 'riva-trt-riva-punctuation-en-US-nn-bert-base-uncased' version 1: Not found: unable to load shared library: /lib/aarch64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so)
E0120 05:46:13.539918 21 model_lifecycle.cc:596] failed to load 'spectrogram_chunker-English-US' version 1: Invalid argument: instance group spectrogram_chunker-English-US_0 of model spectrogram_chunker-English-US has kind KIND_GPU but no GPUs are available
I0120 05:46:13.541994 21 pipeline_library.cc:28] TRITONBACKEND_ModelInstanceInitialize: riva-punctuation-en-US_0 (device 0)
E0120 05:46:13.542002 21 model_lifecycle.cc:596] failed to load 'tts_postprocessor-English-US' version 1: Not found: unable to load shared library: /lib/aarch64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so)
cudaError_t 35 : "CUDA driver version is insufficient for CUDA runtime version" returned from 'cudaHostRegister( pinned_host_punctbuffer.data(), pinned_host_punctbuffer.size() * sizeof(float), 0)' in file riva/nlp/pipeline/punctuator/punctuator.cc line 158
cudaError_t 35 : "CUDA driver version is insufficient for CUDA runtime version" returned from 'cudaHostRegister( pinned_host_capitbuffer.data(), pinned_host_capitbuffer.size() * sizeof(float), 0)' in file riva/nlp/pipeline/punctuator/punctuator.cc line 160
cudaError_t 35 : "CUDA driver version is insufficient for CUDA runtime version" returned from 'cudaSetDevice(device_id)' in file ./riva/pipeline/pipeline.h line 52
I0120 05:46:13.569156 21 model_lifecycle.cc:693] successfully loaded 'riva-punctuation-en-US' version 1
E0120 05:46:13.573122 21 model_lifecycle.cc:596] failed to load 'tts_preprocessor-English-US' version 1: Invalid argument: instance group tts_preprocessor-English-US_0 of model tts_preprocessor-English-US has kind KIND_GPU but no GPUs are available
E0120 05:46:13.573210 21 model_repository_manager.cc:481] Invalid argument: ensemble 'conformer-en-US-asr-streaming' depends on 'riva-trt-conformer-en-US-asr-streaming-am-streaming' which has no loaded version
E0120 05:46:13.573222 21 model_repository_manager.cc:481] Invalid argument: ensemble 'fastpitch_hifigan_ensemble-English-US' depends on 'tts_postprocessor-English-US' which has no loaded version
I0120 05:46:13.573301 21 server.cc:563]
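
The GLIBCXX_3.4.29 failures indicate that the libstdc++ inside this container is older than what the mounted JetPack 6 driver library (libnvdla_compiler.so) was built against. As a sanity check (a sketch, not part of the original report), you can list which GLIBCXX versions a given libstdc++ actually exports:

    $ strings /lib/aarch64-linux-gnu/libstdc++.so.6 | grep GLIBCXX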

UserName-wang commented 5 months ago

Here are the input and output devices in the docker container riva_quickstart_arm64_v2.14.0 on the AGX Xavier:

root@2c60193c9dd1:/opt/riva# python3 ./examples/transcribe_mic.py --list-devices
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
[the same block of ALSA warnings repeats a second time]
Input audio devices:
 4: NVIDIA Jetson AGX Xavier APE: - (hw:1,0)
 5: NVIDIA Jetson AGX Xavier APE: - (hw:1,1)
 6: NVIDIA Jetson AGX Xavier APE: - (hw:1,2)
 7: NVIDIA Jetson AGX Xavier APE: - (hw:1,3)
 8: NVIDIA Jetson AGX Xavier APE: - (hw:1,4)
 9: NVIDIA Jetson AGX Xavier APE: - (hw:1,5)
10: NVIDIA Jetson AGX Xavier APE: - (hw:1,6)
11: NVIDIA Jetson AGX Xavier APE: - (hw:1,7)
12: NVIDIA Jetson AGX Xavier APE: - (hw:1,8)
13: NVIDIA Jetson AGX Xavier APE: - (hw:1,9)
14: NVIDIA Jetson AGX Xavier APE: - (hw:1,10)
15: NVIDIA Jetson AGX Xavier APE: - (hw:1,11)
16: NVIDIA Jetson AGX Xavier APE: - (hw:1,12)
17: NVIDIA Jetson AGX Xavier APE: - (hw:1,13)
18: NVIDIA Jetson AGX Xavier APE: - (hw:1,14)
19: NVIDIA Jetson AGX Xavier APE: - (hw:1,15)
20: NVIDIA Jetson AGX Xavier APE: - (hw:1,16)
21: NVIDIA Jetson AGX Xavier APE: - (hw:1,17)
22: NVIDIA Jetson AGX Xavier APE: - (hw:1,18)
23: NVIDIA Jetson AGX Xavier APE: - (hw:1,19)
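
With the devices enumerated, a specific APE input can be selected by index when transcribing; a sketch using the same transcribe_mic.py example (the device index and server address are assumptions, and flag names other than --list-devices may vary by Riva client version):

    $ python3 ./examples/transcribe_mic.py --server localhost:50051 --input-device 4 --language-code en-US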

UserName-wang commented 5 months ago

This issue was solved by using riva_quickstart_arm64_v2.14.0 on the AGX Xavier; the AGX Orin does not support this Riva server. The original error was: Unavailable model requested given these parameters: language_code=en-US; type=online

The docker container dustynv/local_llm:r36.2.0 is still running on the AGX Orin.

I used an iPhone to open the web UI and talk with Llama.

UserName-wang commented 5 months ago

This issue was solved by using riva_quickstart_arm64_v2.14.0 on the AGX Xavier.

dusty-nv commented 5 months ago

Hi @UserName-wang, sorry about that - yes, the Riva team is still preparing the release of the riva_quickstart_arm64 container for JetPack 6; it should be out in a few weeks 👍

raj-khare commented 4 months ago

@dusty-nv why does the underlying JetPack version matter, if Riva is using Docker to run the server?

UserName-wang commented 4 months ago

@dusty-nv why does the underlying JetPack version matter, if Riva is using Docker to run the server?

Maybe it's because of the CUDA version, which is tied to the GPU hardware.

dusty-nv commented 4 months ago

Yes, that is correct - JetPack containers still need to be rebuilt between major changes of the JetPack version and CUDA version (i.e. from JetPack 5->6 or CUDA 11->12, etc.)
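
As a quick way to see which major release a given Jetson is on (and therefore which container tags apply), the L4T version can be read off the host; a sketch:

    $ cat /etc/nv_tegra_release         # e.g. R35 (JetPack 5) vs R36 (JetPack 6)
    $ dpkg-query --show nvidia-l4t-core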

Riva team is getting closer to their release of embedded Riva container for JetPack 6, hopefully by the end of the month 🤞

johnnynunez commented 4 months ago

Yes, that is correct - JetPack containers still need to be rebuilt between major changes of the JetPack version and CUDA version (i.e. from JetPack 5->6 or CUDA 11->12, etc.)

Riva team is getting closer to their release of embedded Riva container for JetPack 6, hopefully by the end of the month 🤞

I hope also cuda 12.4 with new drivers for Jetson

rgobbel commented 3 months ago

Has there been any progress on this? I tried to run llamaspeak today on my 64GB AGX Orin dev kit with JP 6, and also ran into the "CUDA driver version is insufficient for CUDA runtime version" lossage. I've updated everything I know how to update, but so far to no avail. This is the first of the example apps I haven't been able to run. Text chat, Live LLaVA, Stable Diffusion, etc. all work fine. I had Riva chat working before I upgraded to JetPack 6, so presumably that's where this incompatibility crept in. I'd be happy to try building the failing component, if you could point me to it.

dusty-nv commented 3 months ago

@rgobbel Riva is not out yet for JP6 unfortunately, hopefully soon. There are whisper and whisperx containers on here and Jetson AI Lab, but I've not had the time to integrate them into local_llm; it's on the todo list to investigate if/how they support streaming and how fast they are. I looked into whisper.cpp also, and that looked to support streaming. By the time I'd get to it, Riva will probably be out again, although it's good to have as a backup. Also, I added XTTS this week on JP6, and that is in local_llm (aka what runs llamaspeak 2.0).

If you are talking about the original separate llamaspeak 1.0 that used the text-generation-webui backend, yea, I don't maintain that anymore and IIRC don't build it for JP6, so if you are trying to run the old container on JP6, that may be why you get that error. I had to move to using MLC for inference to get smooth, optimized performance from the LLM and avoid hiccups in the chat.

rgobbel commented 3 months ago

@rgobbel Riva is not out yet for JP6 unfortunately, hopefully soon. There are whisper and whisperx containers on here and Jetson AI Lab, but I've not had the time to integrate them into local_llm; it's on the todo list to investigate if/how they support streaming and how fast they are. I looked into whisper.cpp also, and that looked to support streaming. By the time I'd get to it, Riva will probably be out again, although it's good to have as a backup. Also, I added XTTS this week on JP6, and that is in local_llm (aka what runs llamaspeak 2.0).

If you are talking about the original separate llamaspeak 1.0 that used the text-generation-webui backend, yea, I don't maintain that anymore and IIRC don't build it for JP6, so if you are trying to run the old container on JP6, that may be why you get that error. I had to move to using MLC for inference to get smooth, optimized performance from the LLM and avoid hiccups in the chat.

I was just going through the various LLM containers and trying to reproduce what you showed in the demo videos, especially https://www.youtube.com/watch?v=9ObzbbBTbcc , where you do a voice chat and ask questions about uploaded images.

I haven't tried it recently, but I have run whisper.cpp, and it does a reasonable job and is very fast, but I don't think it was nearly as good as Riva. One thing that amazed me about Riva was its near-perfect automatic punctuation, by far the best I've ever seen, and I've been looking at ASR for a very long time (my wife has been working on ASR and related stuff for 30+ years).

Update: I was able to run the Riva server on a PC with an RTX 3080, and that seems to work just fine with server=:50051.

By the way: thank you for the amazing work you've done. I have the skills to get things like these going on my own, but you've made it so easy, it's wonderful.

dusty-nv commented 3 months ago

Oh cool, yea - you can try setting the --riva-server argument:

https://github.com/dusty-nv/jetson-containers/blob/2fdb601aa6bb284023953d8d3f51f265fbb4dc29/packages/llm/local_llm/utils/args.py#L98

I haven't actually tried changing that, but hope it works for you!
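
A sketch of what that might look like, pointing the web_chat agent at the Riva server on the PC (the model name and server address here are placeholders):

    $ python3 -m local_llm.agents.web_chat --api=mlc \
        --model meta-llama/Llama-2-13b-chat-hf \
        --riva-server 192.168.0.40:50051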



rgobbel commented 3 months ago

I got it to work, after some minor bugfixes. Riva TTS was failing because the voice rate was in the wrong format. It needs to be a percentage, not a float. So in local_llm/plugins/audio/riva_tts.py (line 42):

-        self.rate = voice_rate
+        self.rate = f'{voice_rate:.0%}'

... and in agents/web_chat.py (line 57):

-                self.tts.rate = float(msg['tts_rate'])
+                self.tts.rate = f"{float(msg['tts_rate']):.0%}"
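
For reference, Python's % format spec multiplies by 100 and appends a percent sign, which is what turns the float rate into the percentage string Riva expects:

    $ python3 -c "print(f'{1.0:.0%}', f'{0.85:.0%}')"
    100% 85%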

Now it's running pretty well. It ran with no code changes using XTTS for TTS, but those voices are terrible compared to Riva. Now that I'm feeding it good input, running Riva on another machine works very well, with no noticeable lag due to network traffic. I now have it running with meta-llama/Llama-2-70b-chat-hf, and the latency is noticeable compared to a smaller model, but it works!

dusty-nv commented 3 months ago

OK gotcha! Sorry for the hiccups and glad you got it working after all. I don't mind the XTTS voices much but tend to agree (XTTS does support online voice cloning though which is a cool feature), although XTTS is much slower, and I will likely change back to onboard Riva TTS the first chance that I get 👍