Import typo in auto_asr.py? (Issue #22)
Open · lukecdash opened 4 months ago
Yes, sorry about that - it is fixed in the main branch, but please see https://github.com/dusty-nv/NanoLLM/issues/20#issuecomment-2181285298 for a workaround that uses the 24.6 release of the nano_llm container for now 👍
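For anyone landing here before the fix ships, the gist of the linked workaround is to run the 24.6 release of the container explicitly instead of letting autotag pick the default image. A rough sketch of that command is below; the 24.6-r36.2.0 tag is my assumption here, so check the linked comment for the exact tag that matches your JetPack/L4T version.

```bash
# Run the 24.6 release of the nano_llm container explicitly instead of $(autotag nano_llm).
# NOTE: the image tag below is assumed for JetPack 6 / L4T r36.x; see the comment linked
# above for the tag that matches your setup.
jetson-containers run --env HUGGINGFACE_TOKEN=<your_token> \
    dustynv/nano_llm:24.6-r36.2.0 \
    python3 -m nano_llm.agents.web_chat --api=mlc \
        --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
```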
Oh sorry, I missed that. Thank you for pointing me to it!
Hi, I'm sorry, this library looks amazing; I am just trying to run this example from the documentation. The traceback indicates to me that it is looking in plugins/audio for a module riva_asr.py, which is actually in plugins/speech. Could this just be as simple as a typo in the code? The command and full output are below:
```
jetson-containers run --env HUGGINGFACE_TOKEN=---------------------------------------- $(autotag nano_llm) python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper

Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0  JETPACK_VERSION=6.0  CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.2.0
[sudo] password for nvidia16:
Sorry, try again.
[sudo] password for nvidia16:
localuser:root being added to access control list

`TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /data/models/huggingface/token
Login successful
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 9119.58it/s]
Fetching 17 files: 100%|██████████| 17/17 [00:00<00:00, 5235.18it/s]
18:19:49 | INFO | loading /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa with MLC
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
18:19:52 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=918000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12020, driver_version=None
18:19:52 | INFO | loading Meta-Llama-3-8B-Instruct from /data/models/mlc/dist/Meta-Llama-3-8B-Instruct-ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so
18:19:52 | WARNING | model library /data/models/mlc/dist/Meta-Llama-3-8B-Instruct-ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so was missing metadata

architectures            │ ['LlamaForCausalLM']
attention_bias           │ False
attention_dropout        │ 0.0
bos_token_id             │ 128000
eos_token_id             │ 128009
hidden_act               │ silu
hidden_size              │ 4096
initializer_range        │ 0.02
intermediate_size        │ 14336
max_position_embeddings  │ 8192
model_type               │ llama
num_attention_heads      │ 32
num_hidden_layers        │ 32
num_key_value_heads      │ 8
pretraining_tp           │ 1
rms_norm_eps             │ 1e-05
rope_scaling             │
rope_theta               │ 500000.0
tie_word_embeddings      │ False
torch_dtype              │ bfloat16
transformers_version     │ 4.40.0.dev0
use_cache                │ True
vocab_size               │ 128256
name                     │ Meta-Llama-3-8B-Instruct
api                      │ mlc
mm_projector_path        │ /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snaps
quant                    │ q4f16_ft
type                     │ llama
max_length               │ 8192
prefill_chunk_size       │ -1
load_time                │ 10.725436296997941
params_size              │ 3895.7578125

18:20:00 | INFO | using chat template 'llama-3' for model Meta-Llama-3-8B-Instruct
18:20:00 | INFO | model 'Meta-Llama-3-8B-Instruct', chat template 'llama-3' stop tokens: ['<|end_of_text|>', '<|eot_id|>'] -> [128001, 128009]
18:20:00 | INFO | Warming up LLM with query 'What is 2+2?'
18:20:02 | INFO | Warmup response: 'Easy peasy!\n\nThe answer to 2+2 is... 4!<|eot_id|>'
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 327, in <module>
    agent = WebChat(**vars(args))
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 32, in __init__
    super().__init__(**kwargs)
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 38, in __init__
    self.asr = AutoASR.from_pretrained(asr=asr, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/speech/auto_asr.py", line 34, in from_pretrained
    from nano_llm.plugins.audio.riva_asr import RivaASR
ModuleNotFoundError: No module named 'nano_llm.plugins.audio.riva_asr'
```
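In case it helps anyone who hits this before an updated container ships: since the failure is just a stale import path (nano_llm.plugins.audio.riva_asr vs. nano_llm.plugins.speech.riva_asr), one possible stopgap is to patch that path inside the container before launching the agent. The sketch below is untested and assumes riva_asr.py really does live under plugins/speech in the r36.2.0 image, as described above; the edit won't persist after the container exits unless you commit the image or mount the patched file.

```bash
# Hypothetical stopgap (untested): open a shell in the container, fix the stale import
# path, then launch the agent from the same session.
jetson-containers run --env HUGGINGFACE_TOKEN=<your_token> $(autotag nano_llm) /bin/bash

# Inside the container: point auto_asr.py at plugins/speech instead of plugins/audio.
sed -i 's/nano_llm\.plugins\.audio\.riva_asr/nano_llm.plugins.speech.riva_asr/' \
    /opt/NanoLLM/nano_llm/plugins/speech/auto_asr.py

# Then launch the web chat agent as before.
python3 -m nano_llm.agents.web_chat --api=mlc \
    --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
```

Presumably the fix already in the main branch amounts to the same one-line change of that import in auto_asr.py.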