Jetson Nano 2GB - Only CPU Detection Works!

SuperJuke commented 2 years ago

I followed the guide posted here to install Watsor on my Jetson Nano 2GB.

The Jetson Nano 2GB struggles to build the TensorRT model; I therefore followed @asmirnou's advice and used his prebuilt model posted on Google Drive.

Despite having copied the gpu_fp16.buf file to the model/ directory, the detector automatically defaults to CPU. How do I get Watsor to use the TensorRT engine instead of CPU?

This is my metrics output:

{
  "cameras": [
    {
      "name": "porch",
      "fps": { "decoder": 15.1, "sieve": 7.0, "visual_effects": 0.0, "snapshot": 7.0, "mqtt": 7.0 },
      "buffer_in": 10,
      "buffer_out": 0
    }
  ],
  "detectors": [
    { "name": "CPU", "fps": 7.0, "fps_max": 7, "inference_time": 137.2 }
  ]
}
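The "detectors" entry in the output above is the quickest indicator of which backend is in use. A minimal sketch of checking it from a script, assuming the metrics are served as JSON under /metrics on port 8080 (the port and path appear in the log later in this thread; the localhost address and the helper names are hypothetical):

```python
import json
from urllib.request import urlopen


def active_detectors(metrics: dict) -> list:
    """Return the detector names listed in a Watsor metrics document."""
    return [d.get("name", "") for d in metrics.get("detectors", [])]


def gpu_in_use(metrics: dict) -> bool:
    """True when at least one detector is reported under a name other than "CPU"."""
    return any(name != "CPU" for name in active_detectors(metrics))


def fetch_metrics(url: str = "http://localhost:8080/metrics") -> dict:
    """Download and decode the metrics document (the default URL is an assumption)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)
```

Calling gpu_in_use(fetch_metrics()) on the machine running Watsor should return False for the output above, since the only detector listed is "CPU".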

This is the line I used to run Watsor:

python3 -m watsor.main_for_gpu --config config/config.yaml --model-path model/

This is the relevant section from my config.yaml

ffmpeg:
  decoder:
    - -hide_banner
    - -loglevel
    -  error
    - -nostdin
    - -fflags
    -  nobuffer
    - -flags
    -  low_delay
    - -fflags
    -  +genpts+discardcorrupt
    - -c:v
    -  h264_nvv4l2dec
    - -i
    - -f
    -  rawvideo
    - -pix_fmt
    -  rgb24

detect:
  - person:
      area: 20                    # Minimum area of the bounding box an object should have in
                                  # order to be detected. Defaults to 10% of entire video resolution.
      confidence: 60              # Confidence threshold that a detection is what it's guessed to be,
                                  # otherwise it's ruled out. 50% if not set.
  - car:
      zones: [1, 3, 5]            # Limit the zones on mask image, where detection is allowed.
                                  # If not set or empty, all zones are allowed.
                                  # Run `zones.py -m mask.png` to figure out a zone number.
  - truck:

cameras:
  - porch:                        # Camera name
      width: 1280                 #
      height: 720                 # Video feed resolution in pixels
      input: "rtsp://[redacted]:[redacted]@192.168.xx.xxx:554/cam/realmonitor?channel=1&subtype=0"
      detect:                     # The values below override
        - person:                 # detection defaults for just
        - car:                    # this camera

My Environment:

Jetson Nano 2GB
Linux jetson 4.9.201-tegra, #R32 (release), REVISION: 5.0
JetPack 4.5
TensorRT 7.1.3.0-1+cuda10.2

asmirnou commented 2 years ago

Did you rename gpu_fp16.buf to gpu.buf when copying it to the model directory? Watsor looks for that file before engaging the GPU.

SuperJuke commented 2 years ago

Yes, I did rename gpu_fp16.buf to gpu.buf. Below is what my model directory looks like. I ended up copying cpu.tflite in there because, even with gpu.buf present, Watsor was complaining about the CPU model missing. By the way, thanks for the reply. I really appreciate it.

ls model/
cpu.tflite cpu.zip gpu.buf gpu.uff watsor.log

asmirnou commented 2 years ago

Take a look at the log. If the GPU detector fails to load, there should be an error in it.

SuperJuke commented 2 years ago

This is from the log. I don't see any error.

2022-10-07 09:42:04,816 MainThread werkzeug INFO : Listening on ('0.0.0.0', 8080)
2022-10-07 09:42:04,818 MainThread root INFO : Starting Watsor on jetson with PID 8890
2022-10-07 09:42:05,404 porch FFmpegDecoder INFO : [h264 @ 0x5584bd6a30] left block unavailable for requested intra mode
2022-10-07 09:42:05,404 porch FFmpegDecoder INFO : [h264 @ 0x5584bd6a30] error while decoding MB 0 19, bytestream 69699
2022-10-07 09:42:07,317 porch FFmpegDecoder INFO : NvMMLiteOpen : Block : BlockType = 261
2022-10-07 09:42:07,318 porch FFmpegDecoder INFO : NVMEDIA: Reading vendor.tegra.display-size : status: 6
2022-10-07 09:42:07,319 porch FFmpegDecoder INFO : NvMMLiteBlockCreate : Block : BlockType = 261
2022-10-07 09:42:13,403 Thread-2 werkzeug INFO : 192.168.31.1 - - [07/Oct/2022 09:42:13] "GET /metrics HTTP/1.1" 200 -
2022-10-07 09:42:15,562 Thread-3 werkzeug INFO : 192.168.31.1 - - [07/Oct/2022 09:42:15] "GET /metrics HTTP/1.1" 200 -
2022-10-07 09:42:20,014 porch werkzeug INFO : 192.168.31.1 - - [07/Oct/2022 09:42:20] "GET /video/mjpeg/porch HTTP/1.1" 200 -
2022-10-07 09:42:43,614 porch werkzeug INFO : 192.168.31.1 - - [07/Oct/2022 09:42:43] "GET /video/mjpeg/porch HTTP/1.1" 200 -
2022-10-07 09:43:18,628 MainThread root INFO : Stopping Watsor

asmirnou commented 2 years ago

If there are no errors, open a Python console inside the Watsor virtual environment and run:

import pycuda.driver as cuda
cuda.init()
cuda.Device.count()

Does it succeed?
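The same check can be wrapped so that a missing PyCUDA installation is reported instead of raising. This is only a sketch around the three calls above; the function name cuda_device_count and the -1 return value are hypothetical conventions, not part of Watsor or PyCUDA:

```python
import importlib


def cuda_device_count() -> int:
    """Return the number of CUDA devices PyCUDA can see, or -1 if PyCUDA cannot be imported."""
    try:
        cuda = importlib.import_module("pycuda.driver")
    except ImportError as error:
        # Covers both a missing package and a build that cannot load its CUDA libraries.
        print("PyCUDA is not importable:", error)
        return -1
    cuda.init()
    return cuda.Device.count()
```

A result of -1 points at the PyCUDA installation itself, while 0 would mean PyCUDA loads but finds no device.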

SuperJuke commented 2 years ago

Ahh. So it seems like it is an issue with pycuda.

import pycuda.driver as cuda
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pycuda'

I remember that the one step of your guide I had trouble with was the installation of pycuda:

python3 -m pip install pycuda

In the end I used a script I found online, but it doesn't seem to have worked.

asmirnou commented 2 years ago

Before running python3 -m pip install pycuda, make sure the following variables are defined:

export CPATH=$CPATH:/usr/local/cuda/targets/aarch64-linux/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/targets/aarch64-linux/lib

PyCUDA will use them to build its libraries.

The paths may no longer be correct after a system upgrade. Check where the CUDA libraries are now and change the paths accordingly.
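If the toolkit has moved, its current location can be found by searching for cuda.h and libcudart. A minimal sketch, assuming CUDA is installed somewhere under /usr/local in a directory whose name starts with "cuda"; the helper names find_cuda_dirs and export_lines are hypothetical:

```python
import glob
import os


def find_cuda_dirs(root: str = "/usr/local") -> dict:
    """Search root/cuda*/ recursively for cuda.h and libcudart.so* and return their directories."""
    headers = glob.glob(os.path.join(root, "cuda*", "**", "cuda.h"), recursive=True)
    libs = glob.glob(os.path.join(root, "cuda*", "**", "libcudart.so*"), recursive=True)
    return {
        "include": sorted({os.path.dirname(path) for path in headers}),
        "lib": sorted({os.path.dirname(path) for path in libs}),
    }


def export_lines(dirs: dict) -> list:
    """Build the export statements for every directory that was found."""
    lines = ["export CPATH=$CPATH:" + path for path in dirs["include"]]
    lines += ["export LIBRARY_PATH=$LIBRARY_PATH:" + path for path in dirs["lib"]]
    return lines
```

Printing export_lines(find_cuda_dirs()) lists candidate export statements. The same directory may show up more than once when /usr/local/cuda is a symbolic link to a versioned directory, in which case either path should do.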

SuperJuke commented 2 years ago

OK. I think I finally got PyCUDA installed using your recommendation above.

cuda.Device.count() now returns 1.

But now I am getting a new error when I run:

python3 -m watsor.main_for_gpu --config config/config.yaml --model-path model/

It is the same error I got when I tried the Docker container for the Jetson. I'm beginning to think it might be an issue with my base OS. Any suggestions?

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being cleaned up. At this point in our execution, CUDA may already have been deinitialized, so there is no way we can finish cleanly. The program will be aborted now. Use Context.pop() to avoid this problem.

watchdog WatchDog WARNING : Process detector1 (ObjectDetector) is not alive, restarting...
^CMainThread root INFO : Stopping Watsor
[TensorRT] ERROR: coreReadArchive.cpp (38) - Serialization Error in verifyHeader: 0 (Version tag does not match)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
detector1 ObjectDetector ERROR : Detection failure
Traceback (most recent call last):
  File "/home/quadcore/venv/lib/python3.6/site-packages/watsor/detection/detector.py", line 92, in _run
    with detector_class(*detector_args) as object_detector:
  File "/home/quadcore/venv/lib/python3.6/site-packages/watsor/detection/tensorrt_gpu.py", line 36, in __init__
    self._allocate_buffers()
  File "/home/quadcore/venv/lib/python3.6/site-packages/watsor/detection/tensorrt_gpu.py", line 108, in _allocate_buffers
    for binding in self.__trt_engine:
TypeError: 'NoneType' object is not iterable

SuperJuke commented 2 years ago

After some research last evening, and the fact that I am getting the same error with both the native and docker installation, I'm inclined to believe that the gpu.buf is incompatible with my current Jetson Nano 2GB environment, which may be different to what was used by @asmirnou to generate it. Another possibility is that the Jetson may be running out of memory.
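One way to test the incompatibility theory without starting Watsor is to deserialize the engine file directly, since the "Version tag does not match" message comes from that step. The following is a sketch under assumptions, not how Watsor itself loads the engine: the function name check_engine and its status strings are hypothetical, and registering the TensorRT plugins first is a guess that the model needs them.

```python
import os


def check_engine(engine_path: str) -> str:
    """Report whether a serialized TensorRT engine can be loaded on this machine."""
    if not os.path.isfile(engine_path):
        return "missing-file"
    try:
        import tensorrt as trt  # imported lazily so the check degrades gracefully
    except ImportError:
        return "no-tensorrt"

    logger = trt.Logger(trt.Logger.WARNING)
    # Assumption: the engine relies on TensorRT plugins, so they are registered first.
    trt.init_libnvinfer_plugins(logger, "")
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as engine_file:
        engine = runtime.deserialize_cuda_engine(engine_file.read())
    # deserialize_cuda_engine returns None when the engine cannot be loaded.
    return "ok" if engine is not None else "failed"
```

Running check_engine("model/gpu.buf") inside the Watsor virtual environment and getting "failed" would be consistent with an engine built by a different TensorRT version.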

SuperJuke commented 2 years ago

The good news is that I was able to build the TensorRT engine on the Nano 2GB ... and it only took a couple of minutes. TensorRT engines are apparently not portable, so I had to build my own after configuring the Nano 2GB to make full use of a 4GB swapfile.

sudo swapoff -a
sudo swapon -a
sudo sysctl vm.swappiness=100
python3 -u /home/quadcore/venv/lib/python3.6/site-packages/watsor/engine.py -i model/gpu.uff -o model/gpu.buf -p 16
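Before starting the build it may be worth confirming that the swap is actually active. A minimal sketch that reads /proc/meminfo, assuming the usual "Name: value kB" layout of that file; the helper names are hypothetical:

```python
def parse_meminfo(text: str) -> dict:
    """Parse the content of /proc/meminfo into a {field: kibibytes} dictionary."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_gib(meminfo: dict) -> float:
    """Total configured swap in GiB (0.0 when no swap is active)."""
    return meminfo.get("SwapTotal", 0) / (1024 * 1024)


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the memory statistics exposed by the Linux kernel."""
    with open(path) as meminfo_file:
        return parse_meminfo(meminfo_file.read())
```

With the 4GB swapfile described above, swap_gib(read_meminfo()) should report a value close to 4.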

These are my metrics now:

{
  "cameras": [
    {
      "name": "porch",
      "fps": { "decoder": 16.1, "sieve": 9.5, "visual_effects": 0.0, "snapshot": 9.5, "mqtt": 9.5 },
      "buffer_in": 0,
      "buffer_out": 0
    }
  ],
  "detectors": [
    { "name": "NVIDIA Tegra X1", "fps": 9.5, "fps_max": 17, "inference_time": 57.2 }
  ]
}

Now I have other issues: 1) the MJPEG preview in the browser is not working, and 2) the colors in the detection snapshots are distorted, which looks like a problem with the ffmpeg decoder. I will look into it when I get some time. Thanks for the help.

asmirnou commented 2 years ago

Great! Thanks for finding a way to build the TensorRT engine.

As for the issues with preview and color, check that the camera resolution matches the config; it may not be 1280x720. ffprobe can print the actual resolution.
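A sketch of that check, assuming ffprobe is on the PATH. The helper names are hypothetical, and the frame-size reasoning is an assumption about why a mismatch would matter: with rgb24 output each frame takes width x height x 3 bytes, so a wrong width or height in the config would plausibly cut the raw stream at the wrong byte boundaries.

```python
import subprocess


def parse_resolution(ffprobe_output: str) -> tuple:
    """Parse the 'WIDTH,HEIGHT' line printed by ffprobe's csv writer."""
    first_line = ffprobe_output.strip().splitlines()[0]
    width, height = first_line.split(",")[:2]
    return int(width), int(height)


def probe_resolution(url: str) -> tuple:
    """Ask ffprobe for the width and height of the first video stream."""
    command = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height",
        "-of", "csv=p=0",
        url,
    ]
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            check=True, universal_newlines=True)
    return parse_resolution(result.stdout)


def raw_frame_bytes(width: int, height: int) -> int:
    """Size of one rgb24 frame: three bytes per pixel."""
    return width * height * 3
```

Comparing probe_resolution() for the camera's RTSP URL with the width and height in config.yaml shows whether the two agree.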

asmirnou / watsor

Jetson Nano 2GB - Only CPU Detection Works! #29
