facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Inference Speeds on Jetson TX2 #4457

Open frankvp11 opened 2 years ago

frankvp11 commented 2 years ago

I was trying to run Detectron2 through ONNX and TensorRT: I first converted Detectron2 into .onnx format, then turned that into a TensorRT engine. When I then ran inference on it, it felt slow: I was getting about 1.2 s per image at 480x640. Is that normal for a Jetson TX2 with TensorRT optimizations? And is it possible to be faster without the TensorRT optimizations?
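For context, the ONNX → engine step I mean looks roughly like this (just a minimal sketch using the TensorRT 8.2 Python API; the file names and the FP16 flag are placeholders, not exactly what the sample's build_engine.py does):

```python
import tensorrt as trt

# Minimal sketch: build a TensorRT engine from the exported ONNX graph.
# "converted.onnx" / "model.engine" are placeholder paths.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("converted.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30      # 1 GiB build workspace
config.set_flag(trt.BuilderFlag.FP16)    # FP16 usually helps a lot on Jetson

serialized = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(serialized)
```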

Instructions to Reproduce the Issue and Full Logs:

In the TensorRT repo in samples/python/detectron2 run infer.py

Expected behavior:

I expected it to be much faster, around 0.04 seconds per image, as it is on Colab.

Environment:

Provide your environment information using the following command: I get a syntax error for whatever reason from that command, so I'll do my best by hand: Detectron2 built from source, CUDA 10.2, TensorRT 8.2.1, cuDNN version unknown, running on a Jetson TX2.

github-actions[bot] commented 2 years ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs";

SixK commented 2 years ago

Well, on Jetson devices, machine learning is usually slower than on x86. Remember that a Jetson uses less than 45 W of power, while an x86 machine will easily use more than 200 W. The major bottleneck is known to be disk speed, so avoid using the Jetson's internal storage chip. Use an SSD on SATA, NVMe (not sure the TX2 has an NVMe slot), or a USB SSD (at worst), and put your data (and libs?) on such a disk. Same thing if you use Docker: put the Docker images/working directory on the fastest disk.

TensorRT can give about a 40x boost depending on the hardware used. The first inferences are usually slower, so after loading the model you have to run 3 or 4 warm-up inferences with fake images before really using (and timing) your model.
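Something like this, measuring only after the warm-up runs (just a sketch; `infer` here stands for whatever call runs your engine, e.g. the one in the sample's infer.py):

```python
import time
import numpy as np

def benchmark(infer, image, warmup=5, runs=50):
    # The first inferences after loading the engine pay for allocations,
    # CUDA context setup, etc., so exclude them from the measurement.
    for _ in range(warmup):
        infer(image)

    times = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(image)  # must block until the GPU work is finished
        times.append(time.perf_counter() - start)

    print(f"mean {np.mean(times) * 1000:.1f} ms, "
          f"median {np.median(times) * 1000:.1f} ms per image")

# e.g. benchmark(my_trt_infer, np.zeros((480, 640, 3), dtype=np.uint8))
```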

Note also that compiling Python code can give a boost. You could try cx_Freeze, but it may not work with the detectron2 code.

frankvp11 commented 2 years ago

Ok I understand. Thanks

frankvp11 commented 2 years ago

@SixK I'm going to guess that when I move my files (I have a 256 GB SSD), like the Python files and whatnot, I'd have to make symlinks back to the original spot?
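Something like this is what I had in mind (just a sketch; the paths are made up):

```python
import os
import shutil

# Sketch: move a directory onto the SSD mount, then leave a symlink at the
# old location so existing paths keep working. Paths are made up.
src = "/home/nvidia/detectron2_data"
dst = "/mnt/ssd/detectron2_data"

shutil.move(src, dst)   # relocate the data onto the SSD
os.symlink(dst, src)    # old path now points at the SSD copy
```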