Hi @rsun-bdti, can you try pulling dustynv/nano_llm:24.5-r36.2.0 instead? There were updates for VILA-1.5 support in the 24.5 release of NanoLLM: https://dusty-nv.github.io/NanoLLM/releases.html
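A minimal pull-and-run sketch (assuming the jetson-containers CLI from dusty-nv/jetson-containers is installed on the host):

```bash
# Pull the 24.5 container, which includes the VILA-1.5 updates
docker pull dustynv/nano_llm:24.5-r36.2.0

# Launch it through the jetson-containers helper so the data/model volumes get mounted
jetson-containers run dustynv/nano_llm:24.5-r36.2.0
```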
Hi Dustin, thanks for your timely response. I pulled the docker image 24.5-r36.2.0. However, the same error persists. I guess I need to regenerate and/or re-quantize the TensorRT model. How do I force the script to do that?
You can try deleting the folders jetson-containers/data/models/clip and jetson-containers/data/models/mlc/dist/vila1.5* so they get regenerated on the next run. You can also try running with --vision-api=hf to disable TensorRT for CLIP. Also make sure you are actually running the right container.
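Concretely, something along these lines (a sketch assuming the default jetson-containers data mount; adjust the paths to wherever your jetson-containers checkout lives):

```bash
# Remove the cached CLIP engine and any previous VILA-1.5 MLC build artifacts
# so they get regenerated on the next run
rm -rf jetson-containers/data/models/clip
rm -rf jetson-containers/data/models/mlc/dist/vila1.5*

# Or, inside the container, run the demo with the HF vision backend instead of TensorRT for CLIP
python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --vision-api=hf \
    --max-context-len 256 --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://localhost:8554/output
```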
Hey Dustin, thanks for your timely help! Now the live-streaming demo with VILA1.5-3b is running.
One more question: I saw in your video of the live-streaming demo that you changed the prompt on the fly, while the demo was running. How did you do that? I have not figured out how to switch to other prompts in prompt_history.
Thanks again! That’s some great stuff. -Robby
OK, great that you got it working @rsun-bdti! If you navigate to the web UI on port 8050 (not the webRTC debug viewer on port 8554), there should be a drop-down under the video stream where you can either enter your own prompts or select from the pre-populated ones.
Great! Thanks a lot for your timely help. I will close this issue.
I can run the NanoVLM live-streaming demo with the VILA-2.7b model, but not with VILA1.5-3b. I wonder whether I am missing something or whether there is a bug somewhere.
Platform: Jetson AGX Orin 64 GB
Environment: L4T_VERSION=36.2.0, JETPACK_VERSION=6.0, CUDA_VERSION=12.2
Code base: https://github.com/dusty-nv/NanoLLM, version 24.4.2
Docker image: dustynv/nano_llm, tag 24.4-r36.2.0
Steps to repeat with VILA-2.7b:

1. jetson-containers run $(autotag nano_llm) to launch the docker container.
2. python3 -m nano_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 768 --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://localhost:8554/output to start the demo.

Steps to repeat with VILA1.5-3b:

1. jetson-containers run $(autotag nano_llm) to launch the docker container.
2. python3 -m nano_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA1.5-3b --max-context-len 256 --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://localhost:8554/output to start the demo.

The code crashes with the following error message:
17:14:53 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/699b413ed13620957e955bd7fb938852afa258fc with MLC
17:14:54 | INFO | running MLC quantization: python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b-ctx256
Using path "/data/models/mlc/dist/models/VILA1.5-3b" for model "VILA1.5-3b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 47, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 43, in main
    core.build_model_from_args(parsed_args)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 859, in build_model_from_args
    mod, param_manager, params, model_config = model_generators[args.model_category].get_model(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/relax_model/llama.py", line 1453, in get_model
    raise Exception(
Exception: The model config should contain information about maximum sequence length.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
Process Process-1:
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 358, in <module>
    agent = VideoQuery(vars(args)).run()
  File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 44, in __init__
    self.llm = ProcessProxy('ChatQuery', model=model, drop_inputs=True, vision_scaling=vision_scaling, **kwargs) #ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/process_proxy.py", line 38, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'subprocess.CalledProcessError'>)
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/process_proxy.py", line 132, in run_process
    raise error
  File "/opt/NanoLLM/nano_llm/plugins/process_proxy.py", line 126, in run_process
    self.plugin = ChatQuery(**kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 70, in __init__
    self.model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 73, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 59, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 278, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b-ctx256 ' returned non-zero exit status 1.
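For reference, the exception above comes from the MLC build step not finding a maximum-sequence-length entry in the model config. A quick diagnostic sketch, using the snapshot path from the log above and the usual Hugging Face field names rather than anything NanoLLM-specific:

```bash
# Inside the container: check whether the downloaded config exposes a sequence-length field
grep -E '"(max_sequence_length|max_position_embeddings|model_max_length)"' \
  /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/699b413ed13620957e955bd7fb938852afa258fc/config.json
```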