In the Live LLaVA, NanoVLM, and NanoDB demos we have been using video files as the --video-input, since we don't have a V4L2 USB webcam at the moment. Based on the demo descriptions, a network stream (such as RTSP) should also be supported. We have IP cameras that serve a live RTSP stream (H.264, main stream), and VLC plays it correctly. When we pass it to the demo, though, we get the error messages below; is our command-line format wrong?
Thanks!
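A quick way to double-check the codec from the Jetson side (just a sketch; it assumes gst-discoverer-1.0 from the GStreamer base-plugins tools is installed, and the explicit :554 port is our addition) would be:

gst-discoverer-1.0 "rtsp://admin:LabSys101@10.51.170.55:554/stream1"

If the camera and URI are what we expect, this should report an H.264 video stream.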
sudo docker run --runtime nvidia -it --rm --network host \
  --volume /tmp/argus_socket:/tmp/argus_socket \
  --volume /etc/enctune.conf:/etc/enctune.conf \
  --volume /etc/nv_tegra_release:/etc/nv_tegra_release \
  --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model \
  --volume /var/run/dbus:/var/run/dbus \
  --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --volume /home/orinnx/jetson-containers/data:/data \
  --device /dev/snd --device /dev/bus/usb \
  dustynv/nano_llm:r36.2.0 \
    python3 -m nano_llm.agents.video_query --api=mlc \
      --model Efficient-Large-Model/VILA-2.7b \
      --max-context-len 256 \
      --max-new-tokens 32 \
      --video-input rtsp://admin:LabSys101@10.51.170.55/stream1 \
      --video-output webrtc://@:8554/output \
      --nanodb /data/nanodb/coco/2017
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Fetching 10 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 50051.36it/s]
Fetching 12 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 60133.39it/s]
15:28:21 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
15:28:23 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=918000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12020, driver_version=None
15:28:23 | INFO | loading VILA-2.7b from /data/models/mlc/dist/VILA-2.7b-ctx256/VILA-2.7b-q4f16_ft/VILA-2.7b-q4f16_ft-cuda.so
15:28:23 | WARNING | model library /data/models/mlc/dist/VILA-2.7b-ctx256/VILA-2.7b-q4f16_ft/VILA-2.7b-q4f16_ft-cuda.so was missing metadata
15:28:24 | INFO | loading clip vision model openai/clip-vit-large-patch14-336
15:28:28 | INFO | using chat template 'vicuna-v1' for model VILA-2.7b
15:28:28 | INFO | model 'VILA-2.7b', chat template 'vicuna-v1' stop tokens: [''] -> [2]
15:28:28 | INFO | ProcessProxy initialized, output_channels=5
15:28:28 | INFO | subprocess output could not be pickled (<class 'nano_llm.chat.stream.StreamingResponse'>), disabling channel 3
The answer is 4
URI -- missing/invalid IP port from rtsp://admin:LabSys101@10.51.170.55/stream1, default to port 554
(gst-plugin-scanner:84): GLib-GObject-WARNING **: 15:28:30.355: cannot register existing type 'GstRtpSrc'
(gst-plugin-scanner:84): GStreamer-CRITICAL **: 15:28:30.355: gst_element_register: assertion 'g_type_is_a (type, GST_TYPE_ELEMENT)' failed
(Argus) Error FileOperationFailed: Connecting to nvargus-daemon failed: Connection refused (in src/rpc/socket/client/SocketClientDispatch.cpp, function openSocketConnection(), line 204)
(Argus) Error FileOperationFailed: Cannot create camera provider (in src/rpc/socket/client/SocketClientDispatch.cpp, function createCameraProvider(), line 106)
sh: 1: lsmod: not found
sh: 1: modprobe: not found
[gstreamer] initialized gstreamer, version 1.20.3.0
[gstreamer] gstDecoder -- creating decoder for admin
sh: 1: lsmod: not found
sh: 1: modprobe: not found
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NvMMLiteBlockCreate : Block : BlockType = 261
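Two lines in the log above stand out to us: the URI being defaulted to port 554, and gstDecoder creating a decoder "for admin" (the username from the URI). To isolate whether this stream decodes with the NVIDIA pipeline at all, a plain GStreamer probe could be run inside the container (a sketch; it assumes gst-launch-1.0 and the nvv4l2decoder element are available there, and the explicit :554 port is our addition):

gst-launch-1.0 rtspsrc location="rtsp://admin:LabSys101@10.51.170.55:554/stream1" latency=200 ! \
  rtph264depay ! h264parse ! nvv4l2decoder ! fakesink

If that pipeline reaches PLAYING and keeps running, the stream and hardware decoder are fine, and the problem is more likely in how the URI is being handled.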
For reference, this is how we launch it through jetson-containers, and what autotag resolves on this system:

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input rtsp://admin:labtest1@10.51.170.55/stream1 \
    --video-output webrtc://@:8554/output \
    --nanodb /data/nanodb/coco/2017
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False) -- L4T_VERSION=36.2.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2 -- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.2.0
The startup log is the same as shown above. Skipping some lines in the middle, the model configuration printed after the model loads is:

┌────────────────────────────┬─────────────────────┐
│ name                       │ VILA-2.7b           │
├────────────────────────────┼─────────────────────┤
│ api                        │ mlc                 │
├────────────────────────────┼─────────────────────┤
│ quant                      │ q4f16_ft            │
├────────────────────────────┼─────────────────────┤
│ type                       │ llama               │
├────────────────────────────┼─────────────────────┤
│ max_length                 │ 256                 │
├────────────────────────────┼─────────────────────┤
│ prefill_chunk_size         │ -1                  │
├────────────────────────────┼─────────────────────┤
│ load_time                  │ 6.525660961000085   │
├────────────────────────────┼─────────────────────┤
│ params_size                │ 1300.8330078125     │
└────────────────────────────┴─────────────────────┘

The run then ends with the following errors:
(python3:1): GStreamer-CRITICAL **: 15:28:30.888: gst_debug_log_valist: assertion 'category != NULL' failed
(python3:1): GStreamer-CRITICAL **: 15:28:30.888: gst_debug_log_valist: assertion 'category != NULL' failed
(python3:1): GStreamer-CRITICAL **: 15:28:30.888: gst_debug_log_valist: assertion 'category != NULL' failed
(python3:1): GStreamer-CRITICAL **: 15:28:30.888: gst_debug_log_valist: assertion 'category != NULL' failed
[gstreamer] gstDecoder -- failed to discover stream info
[gstreamer] gstDecoder -- resource discovery and auto-negotiation failed
[gstreamer] gstDecoder -- try manually setting the codec with the --input-codec option
[gstreamer] gstDecoder -- failed to create decoder for rtsp://admin:LabSys101@10.51.170.55/stream1
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 358, in <module>
    agent = VideoQuery(**vars(args)).run()
  File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 59, in __init__
    self.video_source = VideoSource(**kwargs)  #: The video source plugin
  File "/opt/NanoLLM/nano_llm/plugins/video.py", line 52, in __init__
    self.stream = videoSource(video_input, options=options)
Exception: jetson.utils -- failed to create videoSource device
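The decoder error above suggests setting the codec manually with --input-codec. We are not sure how (or whether) that option is exposed by nano_llm.agents.video_query, so as a sketch at the jetson-utils level (video-viewer and its --input-codec flag come from jetson-utils and may or may not be on the PATH inside this container; the explicit :554 port and the /data/test.mp4 output path are our additions):

video-viewer --input-codec=h264 "rtsp://admin:LabSys101@10.51.170.55:554/stream1" /data/test.mp4

If that records the stream correctly, the remaining question is just how to pass the equivalent codec hint through the video_query agent.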