dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Unable to run Llamaspeak on Jetson Orin AGX 32GB #560

Closed: Tosho-01 closed this issue 1 week ago

Tosho-01 commented 1 week ago

Hi, I'm trying to run Llamaspeak following the instructions at https://www.jetson-ai-lab.com/tutorial_llamaspeak.html

Specs: Jetson AGX Orin (32GB) Developer Kit, JetPack 6.0 [L4T 36.3.0]

The Riva server is up and running, and the ASR and TTS examples work just fine.

When I run the following command in /path/to/jetson-containers:

jetson-containers run --env HUGGINGFACE_TOKEN=hf_MYTOKEN \
  $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc --verbose \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper

The response is:

Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0  JETPACK_VERSION=6.0  CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.2.0
[sudo] password for cv: 
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /mnt/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock --env HUGGINGFACE_TOKEN=hf_MYTOKEN dustynv/nano_llm:r36.2.0 python3 -m nano_llm.agents.web_chat --api=mlc --verbose --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
13:31:48 | DEBUG | Namespace(model='meta-llama/Meta-Llama-3-8B-Instruct', quantization=None, api='mlc', vision_api='auto', vision_model=None, vision_scaling=None, prompt=None, save_mermaid=None, chat_template=None, system_prompt=None, wrap_tokens=512, max_context_len=None, max_new_tokens=128, min_new_tokens=-1, do_sample=False, temperature=0.7, top_p=0.95, repetition_penalty=1.0, audio_input_device=None, audio_input_channels=1, audio_output_device=None, audio_output_file=None, audio_output_channels=1, list_audio_devices=False, sample_rate_hz=48000, riva_server='localhost:50051', language_code='en-US', tts='piper', tts_buffering='punctuation', voice=None, voice_speaker=None, voice_rate=1.0, voice_pitch='default', voice_volume='default', asr='riva', asr_confidence=-2.5, asr_silence=-1.0, asr_chunk=1600, boosted_lm_words=None, boosted_lm_score=4.0, profanity_filter=False, inverse_text_normalization=False, automatic_punctuation=True, web_host='0.0.0.0', web_port=8050, ws_port=49000, ssl_key='/etc/ssl/private/localhost.key.pem', ssl_cert='/etc/ssl/private/localhost.cert.pem', upload_dir='/tmp/uploads', web_trace=False, web_title=None, log_level='debug', debug=True)
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
13:31:48 | DEBUG | Starting new HTTPS connection (1): huggingface.co:443
13:31:48 | DEBUG | https://huggingface.co:443 "GET /api/whoami-v2 HTTP/1.1" 200 727
Token is valid (permission: fineGrained).
Your token has been saved to /data/models/huggingface/token
Login successful
13:31:48 | DEBUG | https://huggingface.co:443 "GET /api/models/meta-llama/Meta-Llama-3-8B-Instruct/revision/main HTTP/1.1" 200 19884
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Fetching 13 files: 100%|██████████████████████████████████████████████| 13/13 [00:00<00:00, 9988.27it/s]
13:31:48 | DEBUG | https://huggingface.co:443 "GET /api/models/meta-llama/Meta-Llama-3-8B-Instruct/revision/main HTTP/1.1" 200 19884
Fetching 17 files: 100%|██████████████████████████████████████████████| 17/17 [00:00<00:00, 4197.51it/s]
13:31:48 | INFO | loading /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa with MLC
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
13:31:50 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1300000, multiprocessors=16, max_thread_dims=[1024, 1024, 64], api_version=12020, driver_version=None
13:31:50 | INFO | loading Meta-Llama-3-8B-Instruct from /data/models/mlc/dist/Meta-Llama-3-8B-Instruct-ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so
13:31:50 | WARNING | model library /data/models/mlc/dist/Meta-Llama-3-8B-Instruct-ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so was missing metadata
13:31:50 | DEBUG | using prefill_with_embed() from /data/models/mlc/dist/Meta-Llama-3-8B-Instruct-ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so
13:31:50 | DEBUG | using create_kv_cache() from /data/models/mlc/dist/Meta-Llama-3-8B-Instruct-ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so
┌─────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ architectures           │ ['LlamaForCausalLM']                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_bias          │ False                                                                       │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_dropout       │ 0.0                                                                         │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ bos_token_id            │ 128000                                                                      │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ eos_token_id            │ 128009                                                                      │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_act              │ silu                                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_size             │ 4096                                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ initializer_range       │ 0.02                                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ intermediate_size       │ 14336                                                                       │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_position_embeddings │ 8192                                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ model_type              │ llama                                                                       │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_attention_heads     │ 32                                                                          │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_hidden_layers       │ 32                                                                          │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_key_value_heads     │ 8                                                                           │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ pretraining_tp          │ 1                                                                           │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rms_norm_eps            │ 1e-05                                                                       │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_scaling            │                                                                             │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_theta              │ 500000.0                                                                    │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tie_word_embeddings     │ False                                                                       │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ torch_dtype             │ bfloat16                                                                    │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ transformers_version    │ 4.40.0.dev0                                                                 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ use_cache               │ True                                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ vocab_size              │ 128256                                                                      │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ name                    │ Meta-Llama-3-8B-Instruct                                                    │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ api                     │ mlc                                                                         │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_projector_path       │ /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snaps │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ quant                   │ q4f16_ft                                                                    │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ type                    │ llama                                                                       │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_length              │ 8192                                                                        │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ prefill_chunk_size      │ -1                                                                          │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ load_time               │ 3.98332861200106                                                            │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ params_size             │ 3895.7578125                                                                │
└─────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘

13:31:52 | INFO | using chat template 'llama-3' for model Meta-Llama-3-8B-Instruct
13:31:52 | INFO | model 'Meta-Llama-3-8B-Instruct', chat template 'llama-3' stop tokens:  ['<|end_of_text|>', '<|eot_id|>'] -> [128001, 128009]
13:31:52 | DEBUG | connected ChatQuery to PrintStream on channel=0
13:31:52 | DEBUG | connected RivaASR to PrintStream on channel=0
13:31:52 | DEBUG | connected RivaASR to PrintStream on channel=1
13:31:52 | DEBUG | connected RivaASR to asr_partial on channel=1
13:31:52 | DEBUG | connected RivaASR to asr_final on channel=0
13:31:52 | DEBUG | connected RivaASR to ChatQuery on channel=0
13:31:53 | DEBUG | Loading FFmpeg6
13:31:53 | DEBUG | Failed to load FFmpeg6 extension.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libavutil.so.58: cannot open shared object file: No such file or directory
13:31:53 | DEBUG | Loading FFmpeg5
13:31:53 | DEBUG | Failed to load FFmpeg5 extension.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libavutil.so.57: cannot open shared object file: No such file or directory
13:31:53 | DEBUG | Loading FFmpeg4
13:31:53 | DEBUG | Failed to load FFmpeg4 extension.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 109, in _find_versionsed_ffmpeg_extension
    return importlib.import_module(ext)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 674, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: libavdevice.so.58: cannot open shared object file: No such file or directory
13:31:53 | DEBUG | Loading FFmpeg
13:31:53 | DEBUG | Failed to load FFmpeg extension.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/usr/local/lib/python3.10/dist-packages/torio/_extension/utils.py", line 106, in _find_versionsed_ffmpeg_extension
    raise RuntimeError(f"FFmpeg{version} extension is not available.")
RuntimeError: FFmpeg extension is not available.
13:31:53 | DEBUG | Downloading https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/voices.json to /data/models/piper/voices.json
13:31:53 | DEBUG | Loading /data/models/piper/voices.json
13:31:53 | INFO | loading Piper TTS model from /data/models/piper/en_US-libritts-high.onnx
2024-06-19 13:31:56.852316979 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch-jit-export for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-06-19 13:31:56.894268627 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-06-19 13:31:56.894349340 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
13:31:57 | DEBUG | running Piper TTS model warm-up for en_US-libritts-high
13:31:57 | DEBUG | generating Piper TTS with en_US-libritts-high for 'This is a test of the text to speech.'
/opt/NanoLLM/nano_llm/utils/tensor.py:109: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/pytorch/torch/csrc/utils/tensor_numpy.cpp:206.)
  return torch.from_numpy(tensor).to(device=device, dtype=convert_dtype(dtype, to='pt'), **kwargs)
13:31:58 | DEBUG | finished TTS request, streamed 107682 samples at 48.0KHz - 2.24 sec of audio in 1.24 sec (RTFX=1.8030)
13:31:58 | DEBUG | connected PiperTTS to RateLimit on channel=0
13:31:58 | DEBUG | connected ChatQuery to PiperTTS on channel=1
13:31:58 | DEBUG | connected UserPrompt to ChatQuery on channel=0
13:31:58 | DEBUG | connected RivaASR to on_asr_partial on channel=1
13:31:58 | DEBUG | connected ChatQuery to on_llm_reply on channel=0
13:31:58 | DEBUG | connected RateLimit to on_tts_samples on channel=0

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 310, in <module>
    agent = WebChat(**vars(args))
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 55, in __init__
    self.llm.functions = BotFunctions()
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 86, in __new__
    cls.load(test=test)
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 265, in load
    cls.test()
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 275, in test
    logging.info(f"Bot function descriptions:\n{cls.generate_docs()}")
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 167, in generate_docs
    docs = '\n'.join(['* ' + x.docs for x in cls.functions if x.enabled])
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 167, in <listcomp>
    docs = '\n'.join(['* ' + x.docs for x in cls.functions if x.enabled])
TypeError: can only concatenate str (not "dict") to str
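
From the traceback it looks like x.docs is a dict for at least one of the registered functions, while generate_docs() assumes every x.docs is a plain string. Purely as a guess (I haven't been able to test it, see below), serializing dict-valued docs instead of concatenating them directly would avoid the crash, roughly like this:

import json

def generate_docs(functions):
    # Hypothetical sketch of the list comprehension in
    # BotFunctions.generate_docs(), not the actual upstream fix.
    # The current code does '* ' + x.docs, which raises the TypeError
    # above whenever x.docs is a dict rather than a str.
    return '\n'.join(
        '* ' + (x.docs if isinstance(x.docs, str) else json.dumps(x.docs))
        for x in functions
        if x.enabled
    )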

I tried to access the __init__.py file at /docker/overlay2/CONTAINER_ID/diff/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py, but I can only find the .pyc file there, and I can't open or edit those.
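
Rather than poking at the overlay directly, I assume I could grab the original __init__.py from the NanoLLM sources on GitHub, patch it locally, and bind-mount the patched copy over the file inside the container (the container path is taken from the traceback; the local filename below is just a placeholder), though I'm not sure that's the intended approach:

jetson-containers run --env HUGGINGFACE_TOKEN=hf_MYTOKEN \
  --volume $(pwd)/patched_bot_functions_init.py:/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py \
  $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc --verbose \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper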

Kinda stuck at this point.

Any help or recommendations are appreciated.

dusty-nv commented 1 week ago

Hi @Tosho-01, can you try running the dustynv/nano_llm:24.6-r36.2.0 container instead of $(autotag nano_llm)? I have been doing some restructuring of the repo, sorry about that. The 24.6 release is from ~2 weeks ago, which I think should still be fine. I will test llamaspeak again and fix this for the next release though.
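
That is, the same command as before with the pinned tag in place of autotag:

jetson-containers run --env HUGGINGFACE_TOKEN=hf_MYTOKEN \
  dustynv/nano_llm:24.6-r36.2.0 \
  python3 -m nano_llm.agents.web_chat --api=mlc --verbose \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper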

Tosho-01 commented 1 week ago

Hi @dusty-nv, thank you for your response. The older version is working for me. I had some trouble in Firefox, but after switching to Chromium everything runs smoothly. Thank you for putting out all these jetson-containers :+1: