dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Live Llava video not displaying in browser #441

Open bill-web7 opened 3 months ago

bill-web7 commented 3 months ago

The terminal output works great, describing the scene well, but when I open the browser no video shows up at all. I tried Firefox and Chromium. I checked the WebRTC flags and they look right. I've searched and tried a million things but I'm just stuck. Thanks in advance for any help.

The only sort of errors I can find are the following:

The answer is 4

(gst-plugin-scanner:106): GLib-GObject-WARNING **: 00:31:33.599: cannot register existing type 'GstRtpSrc'

(gst-plugin-scanner:106): GLib-GObject-CRITICAL **: 00:31:33.599: g_type_add_interface_static: assertion 'G_TYPE_IS_INSTANTIATABLE (instance_type)' failed

(gst-plugin-scanner:106): GLib-CRITICAL **: 00:31:33.599: g_once_init_leave: assertion 'result != 0' failed

(gst-plugin-scanner:106): GStreamer-CRITICAL **: 00:31:33.599: gst_element_register: assertion 'g_type_is_a (type, GST_TYPE_ELEMENT)' failed

(gst-plugin-scanner:106): GLib-GObject-WARNING **: 00:31:33.599: cannot register existing type 'GstRtpSink'

(gst-plugin-scanner:106): GLib-GObject-CRITICAL **: 00:31:33.599: g_type_add_interface_static: assertion 'G_TYPE_IS_INSTANTIATABLE (instance_type)' failed

(gst-plugin-scanner:106): GLib-CRITICAL **: 00:31:33.599: g_once_init_leave: assertion 'result != 0' failed

(gst-plugin-scanner:106): GStreamer-CRITICAL **: 00:31:33.599: gst_element_register: assertion 'g_type_is_a (type, GST_TYPE_ELEMENT)' failed

sh: 1: lsmod: not found
sh: 1: modprobe: not found

AND

-- sslKey /etc/ssl/private/localhost.key.pem

[gstreamer] gstEncoder -- codec not specified, defaulting to H.264
failed to find/open file /proc/device-tree/model
[gstreamer] gstEncoder -- detected board 'NVIDIA Jetson AGX Orin Developer Kit'

dusty-nv commented 3 months ago

Hi @bill-web7, sorry about that, WebRTC can be tricky to get working sometimes. Those errors/warnings look like normal output. If you inspect the browser log (Ctrl+Shift+I in Chrome) are there any errors there?

If you check the WebRTC debug page at https://JETSON_IP:8554, does it display? In the container, try just running `video-viewer.py /dev/video0 webrtc://@:8554` to test the camera feed only.
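For reference, the core of video-viewer.py can be sketched with the jetson_utils API (videoSource/videoOutput from jetson-inference). The `parse_streams` helper below is illustrative only, not part of the real script, and the streaming loop requires a Jetson with jetson-utils installed, so it is defined but never called here:

```python
# Minimal sketch of what video-viewer.py does, using the jetson_utils
# videoSource/videoOutput API from jetson-inference. parse_streams is a
# hypothetical helper for illustration; stream() only works on a Jetson
# where jetson-utils is installed, so it is defined but not called.
import sys

def parse_streams(argv):
    """Pick input/output URIs from argv, defaulting the output to WebRTC."""
    uris = [a for a in argv if not a.startswith("--")]
    src = uris[0] if uris else "/dev/video0"
    dst = uris[1] if len(uris) > 1 else "webrtc://@:8554/output"
    return src, dst

def stream(src, dst):
    from jetson_utils import videoSource, videoOutput  # Jetson-only import
    source = videoSource(src)
    output = videoOutput(dst)
    while source.IsStreaming() and output.IsStreaming():
        img = source.Capture()      # may return None on timeout
        if img is not None:
            output.Render(img)

# On a Jetson: stream(*parse_streams(sys.argv[1:]))
```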

It would also be interesting to know: if a display is attached to the Jetson and you navigate to https://localhost:8554 locally, does it work? The Chrome flag #enable-webrtc-hide-local-ips-with-mdns should be disabled.
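Before digging into browser settings, it can also help to confirm that something is listening on the WebRTC port at all. A small sketch (the IP below is a placeholder; substitute the Jetson's address):

```python
# Check whether anything is listening on the WebRTC port before debugging
# the browser side. The host below is a placeholder for the Jetson's IP.
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("192.168.0.225", 8554))  # False: server not up or not reachable
```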

bill-web7 commented 3 months ago

Thanks for the quick reply.

Firefox console errors: (screenshot attached)

When I go directly to https://192.168.0.225:8554 I get connection was reset.

Running video-viewer.py /dev/video0 webrtc://@:8554 I got "command not found" until I found it down in ~/jetson-inference/utils/python/examples. Then:

./video-viewer.py /dev/video0 webrtc://@:8554
Traceback (most recent call last):
  File "/home/bbares/jetson-inference/utils/python/examples/./video-viewer.py", line 27, in <module>
    from jetson_utils import videoSource, videoOutput, Log
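The traceback stops at the jetson_utils import, which usually means the script was run on the host rather than in an environment where jetson-utils is built. A quick check (the container hint in the message is just a suggestion):

```python
# Check whether jetson_utils is importable in the current environment.
# The traceback above cuts off at this import, which usually means
# jetson-utils isn't installed for the Python being used.
import importlib.util

def has_jetson_utils():
    return importlib.util.find_spec("jetson_utils") is not None

if not has_jetson_utils():
    print("jetson_utils not found: run inside a jetson-containers image, "
          "or build/install jetson-inference on the host")
```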

Seems pretty clear I missed a step somewhere. I went directly to https://www.jetson-ai-lab.com/tutorial_llava.html once I had JetPack 6 installed, and this video output problem came up when I got to Live Llava. I tried starting from the beginning of the tutorials, but I must have missed something.

The Jetson is attached directly to a DisplayPort monitor. http://localhost:8554/ gives "connection was reset".

bill-web7 commented 3 months ago

@dusty-nv I was able to get video output by changing to --video-output display://0

That opens another window, so I can see the video and the result text on the video as expected. However, I can't change the prompt string using the web page at :8050. I can use the interface, but nothing changes.

I can add a prompt by adding --prompt "How many unique faces do you see" but I can't find a way to remove the default prompt "Describe the image concisely". Where would I find that to change/remove it?
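The default prompt string most likely lives in the local_llm sources inside the container; one way to locate it is to scan the source tree for the quoted text. The /opt/local_llm path is an assumption about the container layout:

```python
# Scan a source tree for the default prompt string so it can be edited.
# The /opt/local_llm path is an assumption about where local_llm lives
# inside the container; adjust as needed.
import pathlib

def find_prompt(root, needle="Describe the image concisely"):
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            for lineno, line in enumerate(path.read_text().splitlines(), 1):
                if needle in line:
                    hits.append((str(path), lineno, line.strip()))
        except (OSError, UnicodeDecodeError):
            pass                      # skip unreadable files
    return hits

for path, lineno, line in find_prompt("/opt/local_llm"):
    print(f"{path}:{lineno}: {line}")
```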

I think the fact that video-viewer is not found must mean I missed a step. It would be helpful if, starting from a clean JetPack 6 / DeepStream install on my Jetson AGX Orin 64GB dev kit, there were a list of the steps to take to get live LLM inference running.

dusty-nv commented 3 months ago

@bill-web7 I remember having WebRTC issues with Firefox, can you try Chrome or Chromium? And set chrome://flags#enable-webrtc-hide-local-ips-with-mdns to Disabled first (and restart browser after changing)

bill-web7 commented 3 months ago

@dusty-nv I had previously tried Chromium with the same result. I will try again once I finish reflashing the Jetson. The reason I am reflashing is that I let Ubuntu's software updater run, and it seems to have killed just about all the NVIDIA stuff. This is the second time that has happened; the previous time I also ran apt autoremove, which deleted DeepStream. I appreciate that JetPack 6 is still a developer preview.

bill-web7 commented 3 months ago

I finally got it to work after a completely new install of JetPack 6 and DeepStream using sdkmanager. You must use a real Ubuntu 22 machine; WSL2 and VMware do not work. The device is a Jetson AGX Orin 64GB dev kit with a 512 GB NVMe. The following are the exact steps copied from the terminal history, with only comments (//) added.

sudo apt install chromium-browser
// Set chrome://flags#enable-webrtc-hide-local-ips-with-mdns to Disabled and restart the browser

// https://github.com/dusty-nv/jetson-containers/blob/master/README.md#getting-started
sudo apt-get update && sudo apt-get install git python3-pip

git clone https://github.com/dusty-nv/jetson-containers

cd jetson-containers

pip3 install -r requirements.txt

./run.sh $(./autotag l4t-pytorch)
// ran with no issues

// First try of Live Llava. Web page loaded but no video, as before; console text correctly described the live video; browser console showed no errors.
./run.sh $(./autotag local_llm) python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 768 --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output

// https://www.jetson-ai-lab.com/tutorial_llava.html
// Download model
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) python3 download-model.py --output=/data/models/text-generation-webui TheBloke/llava-v1.5-13B-GPTQ

// Start Web UI with multimodal extension
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) python3 server.py --listen --model-dir /data/models/text-generation-webui --model TheBloke_llava-v1.5-13B-GPTQ --multimodal-pipeline llava-v1.5-13b --loader autogptq --disable_exllama --verbose

// llava-v1.5-7b
./run.sh $(./autotag llava) python3 -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file /data/images/hoover.jpg

// llava-v1.5-13b
./run.sh $(./autotag llava) python3 -m llava.serve.cli --model-path liuhaotian/llava-v1.5-13b --image-file /data/images/hoover.jpg

// Quantized GGUF models with llama.cpp
./run.sh --workdir=/opt/llama.cpp/bin $(./autotag llama_cpp:gguf) /bin/bash -c './llava-cli \
  --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \
  --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \
  --n-gpu-layers 999 \
  --image /data/images/hoover.jpg \
  --prompt "What does the sign say"'

./run.sh --workdir=/opt/llama.cpp/bin $(./autotag llama_cpp:gguf) /bin/bash -c './llava-cli \
  --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \
  --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \
  --n-gpu-layers 999 \
  --image /data/images/lake.jpg'

// https://www.jetson-ai-lab.com/tutorial_nano-vlm.html
// Multimodal chat
./run.sh $(./autotag local_llm) python3 -m local_llm --api=mlc --model liuhaotian/llava-v1.6-vicuna-7b --max-context-len 768 --max-new-tokens 128

// Live streaming. Try Live Llava again, and video works this time!
./run.sh $(./autotag local_llm) python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 768 --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output

I hope this helps anyone stuck the same way I was.