dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

AssertionError in TVM CUDA Initialization #383

Open · doruksonmez opened this issue 4 months ago

doruksonmez commented 4 months ago

Hi,

I'm just trying to test out Live LLaVA using the following command:

./run.sh \
  -e SSL_KEY=/data/key.pem -e SSL_CERT=/data/cert.pem \
  $(./autotag local_llm) \
    python3 -m local_llm.agents.video_query --api=mlc --verbose \
      --model liuhaotian/llava-v1.5-7b \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output display://0 \
      --prompt "How many fingers am I holding up?"

However, the local_llm.agents.video_query module throws the following error, ending in an assertion failure:

/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
09:41:03 | DEBUG | Namespace(api='mlc', chat_template=None, debug=True, do_sample=False, log_level='debug', max_new_tokens=32, min_new_tokens=-1, model='liuhaotian/llava-v1.5-7b', prompt=['How many fingers am I holding up?'], quant=None, repetition_penalty=1.0, save_mermaid=None, system_prompt=None, temperature=0.7, top_p=0.95, video_input='v4l2:///dev/video0', video_input_codec=None, video_input_framerate=None, video_input_height=None, video_input_save=None, video_input_width=None, video_output='display://0', video_output_bitrate=None, video_output_codec=None, video_output_save=None, vision_model=None)
09:41:03 | DEBUG | subprocess 694 started
09:41:03 | DEBUG | RUN_PROCESS GIRDI...
09:41:03 | DEBUG | Starting new HTTPS connection (1): huggingface.co:443
09:41:03 | DEBUG | https://huggingface.co:443 "GET /api/models/liuhaotian/llava-v1.5-7b/revision/main HTTP/1.1" 200 2276
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 71089.90it/s]
09:41:03 | INFO | loading /data/models/huggingface/models--liuhaotian--llava-v1.5-7b/snapshots/12e054b30e8e061f423c7264bc97d4248232e965 with MLC
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 66, in run_process
    raise error
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 63, in run_process
    plugin = factory(**kwargs)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in <lambda>
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
    self.model = LocalLM.from_pretrained(model, **kwargs)
  File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/local_llm/local_llm/models/mlc.py", line 58, in __init__
    assert(self.device.exist) # this is needed to initialize CUDA?
AssertionError
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 115, in <module>
    agent = VideoQuery(**vars(args)).run()
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in __init__
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 34, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'AssertionError'>)

What would be the reason for this error? Thanks.

doruksonmez commented 4 months ago

It looks like an issue coming from the TVM library under the hood. When I manually start the Python interpreter and import TVM, it throws CUDA_ERROR_NOT_INITIALIZED as soon as I try to read the CUDA device details.

$ python3 
Python 3.8.10 (default, May 26 2023, 14:05:08) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tvm
>>> device = tvm.runtime.cuda(0)
>>> print(device.device_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/runtime_ctypes.py", line 403, in device_name
    return self._GetDeviceAttr(self.device_type, self.device_id, 5)
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/runtime_ctypes.py", line 303, in _GetDeviceAttr
    return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (4) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(TVMFuncCall+0x64) [0xffff7d24af54]
  [bt] (3) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(+0x3537044) [0xffff7d24c044]
  [bt] (2) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)+0x12e4) [0xffff7d394f3c]
  [bt] (1) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x78) [0xffff7aef6f58]
  [bt] (0) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xffff7d29f6f0]
  File "/opt/mlc-llm/3rdparty/tvm/src/runtime/cuda/cuda_device_api.cc", line 72
CUDAError: cuDeviceGetName(&name[0], name.size(), dev.device_id) failed with error: CUDA_ERROR_NOT_INITIALIZED
doruksonmez commented 4 months ago

Further test results:

I extracted the code that generates the error and put it in a separate script:

import tvm

target = tvm.target.cuda(arch='sm_87')
device = tvm.runtime.cuda(0)
assert(device.exist)

print(device.device_name)

When I run:

$ python3 test_tvm.py
> Orin

However, it still throws the same error when I run:

$ python3 -m local_llm.agents.video_query --api=mlc --model liuhaotian/llava-v1.5-7b --max-new-tokens 32 --video-input /dev/video0 --video-output display://0 --prompt "How many fingers am I holding up?"

I also changed the --model value as follows, although I don't think it is relevant anyway:

$ python3 -m local_llm.agents.video_query --api=mlc --model /data/models/huggingface/models--liuhaotian--llava-v1.5-7b/snapshots/12e054b30e8e061f423c7264bc97d4248232e965/ --max-new-tokens 32 --video-input /dev/video0 --video-output display://0 --prompt "How many fingers am I holding up?"

/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
12:50:37 | INFO | loading /data/models/huggingface/models--liuhaotian--llava-v1.5-7b/snapshots/12e054b30e8e061f423c7264bc97d4248232e965/ with MLC
12:50:37 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models//llava-v1.5-7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist

Using path "/data/models/mlc/dist/models/llava-v1.5-7b" for model "llava-v1.5-7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Load cached module from /data/models/mlc/dist/llava-v1.5-7b-q4f16_ft/mod_cache_before_build.pkl and skip tracing. You can use --use-cache=0 to retrace
Finish exporting to /data/models/mlc/dist/llava-v1.5-7b-q4f16_ft/llava-v1.5-7b-q4f16_ft-cuda.so
SET TARGET CUDA
SET RUNTIME CUDA
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 115, in <module>
    agent = VideoQuery(**vars(args)).run() 
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in __init__
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 31, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'AssertionError'>)
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 62, in run_process
    raise error
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 59, in run_process
    plugin = factory(**kwargs)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in <lambda>
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
    self.model = LocalLM.from_pretrained(model, **kwargs)
  File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/local_llm/local_llm/models/mlc.py", line 67, in __init__
    assert(self.device.exist) # this is needed to initialize CUDA?
AssertionError
dusty-nv commented 4 months ago

Hi @doruksonmez, are you running on a Jetson Orin device? MLC requires SM_80 or newer

doruksonmez commented 4 months ago

Hi @dusty-nv,

Yes, I'm running on a Jetson AGX Orin Dev Kit. I also specified arch='sm_87' via tvm.target.cuda(arch='sm_87') in my test above.

I also tried a local build with ./build.sh local_llm, with no luck. It still throws the same error on the locally built local_llm:r35.4.1 image.

dusty-nv commented 4 months ago

OK gotcha - let's back up a sec and try some more basic usage of MLC to see if you can get that running. Are you able to run any of these?

If not, have you been able to use other GPU features in containers on your JetPack build (like PyTorch, etc.)? If you are still having problems, I might recommend updating to JetPack 6 to get the latest.
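
For example, a quick GPU sanity check inside a container (assuming an image with PyTorch installed, e.g. l4t-pytorch) would be something like:

import torch
print(torch.cuda.is_available())      # should print True if the container can see the GPU
print(torch.cuda.get_device_name(0))  # should report the Orin GPU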

doruksonmez commented 4 months ago

Of course! Here are the results for both tests:

1. MLC-Test

$ ./run.sh $(./autotag mlc)

# python3 -m mlc_llm.build --model Llama-2-7b-hf --quantization q4f16_ft --artifact-path /data/models/mlc/dist --max-seq-len 4096 --target cuda --use-cuda-graph --use-flash-attn-mqa

Using path "/data/models/mlc/dist/models/Llama-2-7b-hf" for model "Llama-2-7b-hf"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Load cached module from /data/models/mlc/dist/Llama-2-7b-hf-q4f16_ft/mod_cache_before_build.pkl and skip tracing. You can use --use-cache=0 to retrace
Finish exporting to /data/models/mlc/dist/Llama-2-7b-hf-q4f16_ft/Llama-2-7b-hf-q4f16_ft-cuda.so

I also tested it with llava-v1.5-7b, which is the model used in the actual live demo:

# python3 -m mlc_llm.build --model llava-v1.5-7b --quantization q4f16_ft --artifact-path /data/models/mlc/dist --max-seq-len 4096 --target cuda --use-cuda-graph --use-flash-attn-mqa

Using path "/data/models/mlc/dist/models/llava-v1.5-7b" for model "llava-v1.5-7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Load cached module from /data/models/mlc/dist/llava-v1.5-7b-q4f16_ft/mod_cache_before_build.pkl and skip tracing. You can use --use-cache=0 to retrace
Finish exporting to /data/models/mlc/dist/llava-v1.5-7b-q4f16_ft/llava-v1.5-7b-q4f16_ft-cuda.so

Finally, the benchmark with Llama-2-7b-hf:

# python3 /opt/mlc-llm/benchmark.py --model /data/models/mlc/dist/Llama-2-7b-hf-q4f16_ft/params --prompt /data/prompts/completion_16.json --max-new-tokens 128 
Namespace(chat=False, max_new_tokens=128, max_num_prompts=None, model='/data/models/mlc/dist/Llama-2-7b-hf-q4f16_ft/params', prompt=['/data/prompts/completion_16.json'], save='', streaming=False)
-- loading /data/models/mlc/dist/Llama-2-7b-hf-q4f16_ft/params

PROMPT:  Once upon a time, there was a little girl who loved to read.

февруари 2015 г.
Another fun read. I'm a huge fan of this author.
A heartwarming story of hope, redemption, and love.
...
AVERAGE OVER 9 RUNS, input=16, output=128
/data/models/mlc/dist/Llama-2-7b-hf-q4f16_ft/params:  prefill_time 0.027 sec, prefill_rate 589.1 tokens/sec, decode_time 2.767 sec, decode_rate 46.3 tokens/sec

Peak memory usage:  654.12 MB

2. Text Chat

I logged in to my HuggingFace account as follows:

$ ./run.sh $(./autotag local_llm)

# huggingface-cli login
...
Token has not been saved to git credential helper.
Your token has been saved to /data/models/huggingface/token
Login successful

# python3 -m local_llm --api=mlc --model=meta-llama/Llama-2-7b-chat-hf --prompt 'hi, how are you?' --prompt 'whats the square root of 900?' --prompt 'whats the previous answer times 4?' --prompt 'can I get a recipie for french onion soup?'
...
Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/revision/main.
Access to model meta-llama/Llama-2-7b-chat-hf is restricted and you are not in the authorized list. Visit https://huggingface.co/meta-llama/Llama-2-7b-chat-hf to ask for access.

So I don't think I have access to that model's repo, but earlier today I successfully ran your other text- and vision-based demos. So whatever the problem is, it must be in the way MLC or TVM is used in that particular Live LLaVA demo.

dusty-nv commented 4 months ago

So whatever the problem is, it should be the way of using MLC or TVM in that particular Live LLaVA demo.

OK, interesting - in that Live Llava demo, I had to run MLC/TVM in a subprocess (hence those exceptions mentioning ProcessProxy, which is a wrapper that forwards/receives requests to/from that subprocess) in order to keep everything running smoothly at the same time (like the continuous video stream and the VLM simultaneously). I've mostly migrated to JetPack 6 at this point and haven't tested it on JetPack 5 - I would recommend either disabling that ProcessProxy usage in the agent (you can mount your local jetson-containers/local_llm tree into the container for easier editing), or trying JetPack 6 on it.
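
For context, here is a minimal sketch of the pattern ProcessProxy follows, as implied by the tracebacks above (this is not the actual local_llm implementation; the queue and status names are illustrative, and it assumes Linux fork-based multiprocessing). It shows why an AssertionError raised while building the model in the child process surfaces in the parent as "invalid initialization status (<class 'AssertionError'>)":

import multiprocessing as mp

class ProcessProxySketch:
    # Illustrative only: build a plugin in a subprocess and report its init status to the parent.
    def __init__(self, factory, **kwargs):
        self.queue = mp.Queue()
        self.process = mp.Process(target=self._run_process, args=(factory,), kwargs=kwargs)
        self.process.start()
        init_msg = self.queue.get()  # block until the child reports how initialization went
        if init_msg['status'] != 'initialized':
            # this is the RuntimeError seen in the logs above
            raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")

    def _run_process(self, factory, **kwargs):
        try:
            self.plugin = factory(**kwargs)  # e.g. ChatQuery(...) -> MLCModel(...), where the CUDA assert fails
            self.queue.put({'status': 'initialized'})
        except Exception as error:
            self.queue.put({'status': type(error)})  # the parent then sees <class 'AssertionError'>
            raise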

leon-seidel commented 4 months ago

Got the same problem on an Orin NX 16 GB with JetPack 5.1.2, and I can't upgrade to JetPack 6 so far. How would you disable the ProcessProxy in video_query.py?

dusty-nv commented 4 months ago

@leon-seidel @doruksonmez try changing this line to the following:

https://github.com/dusty-nv/jetson-containers/blob/2d6187b00eaad34a4a51bf1e088baf4a600faa09/packages/llm/local_llm/agents/video_query.py#L22

self.llm = ChatQuery(model, drop_inputs=True, **kwargs)
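
For reference, that swaps the ProcessProxy wrapper (the original line 22, visible in the tracebacks above) for a direct construction of the plugin in the main process:

# video_query.py, in VideoQuery.__init__
# before:
#   self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
# after:
self.llm = ChatQuery(model, drop_inputs=True, **kwargs)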

And when you start the container, mount your local copy of the code into the container like so:

./run.sh \
  -v /mnt/NVME/jetson-containers/packages/llm/local_llm:/opt/local_llm/local_llm \
  $(./autotag local_llm)

(then any code changes you make to local_llm package will be reflected inside the container without needing to rebuild it)
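
Putting the two together, the Live Llava invocation with the locally edited source mounted would look roughly like this (the /path/to prefix is a placeholder for wherever your jetson-containers checkout lives; add the SSL_KEY/SSL_CERT variables back if you use them):

./run.sh \
  -v /path/to/jetson-containers/packages/llm/local_llm:/opt/local_llm/local_llm \
  $(./autotag local_llm) \
    python3 -m local_llm.agents.video_query --api=mlc \
      --model liuhaotian/llava-v1.5-7b \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output display://0 \
      --prompt "How many fingers am I holding up?"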

doruksonmez commented 4 months ago

@dusty-nv I think it is working now, but there is an issue related to the X display, as far as I understand from the logs:

....
13:04:37 | INFO | loading mm_projector weights from /data/models/huggingface/models--liuhaotian--llava-v1.5-7b/snapshots/12e054b30e8e061f423c7264bc97d4248232e965/mm_projector.bin
mm_projector Sequential(
  (0): Linear(in_features=1024, out_features=4096, bias=True)
  (1): GELU(approximate='none')
  (2): Linear(in_features=4096, out_features=4096, bias=True)
)
┌─────────────┬───────────────────┐
│ name        │ llava-v1.5-7b     │
├─────────────┼───────────────────┤
│ api         │ mlc               │
├─────────────┼───────────────────┤
│ quant       │ q4f16_ft          │
├─────────────┼───────────────────┤
│ type        │ llama             │
├─────────────┼───────────────────┤
│ max_length  │ 4096              │
├─────────────┼───────────────────┤
│ vocab_size  │ 32000             │
├─────────────┼───────────────────┤
│ load_time   │ 9.100941032986157 │
├─────────────┼───────────────────┤
│ params_size │ 3232.7265625      │
└─────────────┴───────────────────┘
13:04:37 | INFO | using chat template 'llava-v1' for model llava-v1.5-7b
13:04:37 | DEBUG | connected PrintStream to on_eos on channel=0
13:04:37 | DEBUG | connected ChatQuery to PrintStream on channel=0
13:04:37 | DEBUG | processing chat entry 0  role='system' template='${MESSAGE}\n\n' open_user_prompt=False cached=false text='A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.'
13:04:37 | DEBUG | embedding text (1, 32, 4096) float16 -> ```A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n```
13:04:37 | DEBUG | processing chat entry 1  role='user' template='USER: ${MESSAGE}\n' open_user_prompt=False cached=false text='What is 2+2?'
13:04:37 | DEBUG | embedding text (1, 11, 4096) float16 -> ```USER: What is 2+2?\n```

2+2 is 4.
...

The model initializes successfully, but right after that it fails to open the camera display.

2+2 is 4.
[gstreamer] initialized gstreamer, version 1.16.3.0
[gstreamer] gstCamera -- attempting to create device v4l2:///dev/video0
[gstreamer] gstCamera -- found v4l2 device: C505e HD Webcam
[gstreamer] v4l2-proplist, device.path=(string)/dev/video0, udev-probed=(boolean)false, device.api=(string)v4l2, v4l2.device.driver=(string)uvcvideo, v4l2.device.card=(string)"C505e\ HD\ Webcam", v4l2.device.bus_info=(string)usb-3610000.xhci-4.4, v4l2.device.version=(uint)330360, v4l2.device.capabilities=(uint)2225078273, v4l2.device.device_caps=(uint)69206017;
[gstreamer] gstCamera -- found 38 caps for v4l2 device /dev/video0
[gstreamer] [0] video/x-raw, format=(string)YUY2, width=(int)1280, height=(int)960, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 15/2, 5/1 };
[gstreamer] [1] video/x-raw, format=(string)YUY2, width=(int)1280, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 15/2, 5/1 };
[gstreamer] [2] video/x-raw, format=(string)YUY2, width=(int)1184, height=(int)656, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 10/1, 5/1 };
[gstreamer] [3] video/x-raw, format=(string)YUY2, width=(int)960, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 10/1, 5/1 };
[gstreamer] [4] video/x-raw, format=(string)YUY2, width=(int)1024, height=(int)576, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 10/1, 5/1 };
[gstreamer] [5] video/x-raw, format=(string)YUY2, width=(int)960, height=(int)544, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 15/1, 10/1, 5/1 };
[gstreamer] [6] video/x-raw, format=(string)YUY2, width=(int)800, height=(int)600, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [7] video/x-raw, format=(string)YUY2, width=(int)864, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [8] video/x-raw, format=(string)YUY2, width=(int)800, height=(int)448, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [9] video/x-raw, format=(string)YUY2, width=(int)752, height=(int)416, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [10] video/x-raw, format=(string)YUY2, width=(int)640, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [11] video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [12] video/x-raw, format=(string)YUY2, width=(int)544, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [13] video/x-raw, format=(string)YUY2, width=(int)432, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [14] video/x-raw, format=(string)YUY2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [15] video/x-raw, format=(string)YUY2, width=(int)320, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [16] video/x-raw, format=(string)YUY2, width=(int)320, height=(int)176, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [17] video/x-raw, format=(string)YUY2, width=(int)176, height=(int)144, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [18] video/x-raw, format=(string)YUY2, width=(int)160, height=(int)120, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [19] image/jpeg, width=(int)1280, height=(int)960, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [20] image/jpeg, width=(int)1280, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [21] image/jpeg, width=(int)1184, height=(int)656, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [22] image/jpeg, width=(int)960, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [23] image/jpeg, width=(int)1024, height=(int)576, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [24] image/jpeg, width=(int)960, height=(int)544, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [25] image/jpeg, width=(int)800, height=(int)600, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [26] image/jpeg, width=(int)864, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [27] image/jpeg, width=(int)800, height=(int)448, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [28] image/jpeg, width=(int)752, height=(int)416, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [29] image/jpeg, width=(int)640, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [30] image/jpeg, width=(int)640, height=(int)360, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [31] image/jpeg, width=(int)544, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [32] image/jpeg, width=(int)432, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [33] image/jpeg, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [34] image/jpeg, width=(int)320, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [35] image/jpeg, width=(int)320, height=(int)176, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [36] image/jpeg, width=(int)176, height=(int)144, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] [37] image/jpeg, width=(int)160, height=(int)120, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 25/1, 20/1, 15/1, 10/1, 5/1 };
[gstreamer] gstCamera -- selected device profile:  codec=MJPEG format=unknown width=1280 height=720 framerate=30
[gstreamer] gstCamera pipeline string:
[gstreamer] v4l2src device=/dev/video0 do-timestamp=true ! image/jpeg, width=(int)1280, height=(int)720, framerate=30/1 ! jpegdec name=decoder ! video/x-raw ! appsink name=mysink sync=false
[gstreamer] gstCamera successfully created device v4l2:///dev/video0
[video]  created gstCamera from v4l2:///dev/video0
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: v4l2:///dev/video0
     - protocol:  v4l2
     - location:  /dev/video0
  -- deviceType: v4l2
  -- ioType:     input
  -- codec:      MJPEG
  -- codecType:  cpu
  -- width:      1280
  -- height:     720
  -- frameRate:  30
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
------------------------------------------------
[OpenGL] glDisplay -- X screen 0 resolution:  1920x1080
[OpenGL] glDisplay -- X window resolution:    1920x1080
[OpenGL] failed to create X11 Window.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 116, in <module>
    agent = VideoQuery(**vars(args)).run() 
  File "/opt/local_llm/local_llm/agents/video_query.py", line 39, in __init__
    self.video_output = VideoOutput(**kwargs)
  File "/opt/local_llm/local_llm/plugins/video.py", line 102, in __init__
    self.stream = videoOutput(video_output, options=options)
Exception: jetson.utils -- failed to create videoOutput device

I also tested whether I can get a display output from the container using the test script below:

import numpy as np
import cv2 as cv
cap = cv.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()
while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    # if frame is read correctly ret is True
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break
    # Display the resulting frame
    cv.imshow('frame', frame)
    if cv.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv.destroyAllWindows()

This is my command to run the container:

xhost + && sudo docker run --runtime nvidia -it --rm --network host \
  --volume /tmp/argus_socket:/tmp/argus_socket \
  --volume /etc/enctune.conf:/etc/enctune.conf \
  --volume /etc/nv_tegra_release:/etc/nv_tegra_release \
  --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model \
  --volume /mnt/orin/JetsonGenAI/jetson-containers/data:/data \
  -v /mnt/orin/JetsonGenAI/jetson-containers/packages/llm/local_llm:/opt/local_llm/local_llm \
  --device /dev/snd --device /dev/bus/usb \
  -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix \
  --device=/dev/video0 --device=/dev/video1 \
  dustynv/local_llm:r35.3.1
dusty-nv commented 4 months ago

Hi @doruksonmez, are you able to run video-viewer.py /dev/video0 display://0 inside the container?

If so, can you try running this next:

python3 -m local_llm.agents.video_stream \
        --video-input /dev/video0 \
        --video-output display://0
doruksonmez commented 4 months ago

Hi @dusty-nv, sorry for the late responses due to time zones.

I'm able to run video-viewer.py /dev/video0 display://0, but the other one gives the same result.

dusty-nv commented 4 months ago

OK thanks for letting me know @doruksonmez - you are on JetPack 5.1.2 / L4T R35.4.1 right?

doruksonmez commented 4 months ago

Yes, that is correct. I don't think it would be the cause, but I should also mention that I'm using your r35.3.1 Docker image on it.

TadayukiOkada commented 4 months ago

I'm having the same issue (cannot see the video via WebRTC or X). However, I was able to work around it with video-viewer by streaming the output over RTSP instead of the display.

In the container, set the video output to:

--video-output rtsp://@:1234/output

Then view it on the host with:

video-viewer.py rtsp://localhost:1234/output display://0