lpkoh opened this issue 3 years ago

Hi,
I am using an AGX Xavier. I followed the instructions to run the demo for 2D object detection. I built a yolo4_fp16.rt model, which is a 416x416 model, then ran ./demo yolo4_fp16.rt with batch = 1 and got an FPS of ~9. This is significantly less than the ~41 FPS reported. Images are shown below:
I have no other background processes running. I do not have CUDA_VISIBLE_DEVICES set to anything. My nvpmodel is set to 1 (settings below). I have run sudo jetson_clocks.
I am aware from looking at some of the other issues that this reported FPS corresponds to inference only, so I am unsure why it is so slow (significantly slower than just testing with TensorRT's ./trtexec).

Result on csp. I don't think it is a thermal throttling issue, as the Jetson AGX Xavier is cool to the touch and I have a fan blowing directly at it.
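For context, the engine export and demo invocation described above were roughly as follows. This is a sketch of the standard tkDNN workflow; the TKDNN_MODE variable, the test_yolo4 exporter binary, and the demo arguments are assumptions and may differ across versions:

```bash
# Export the network to a TensorRT engine in FP16
# (TKDNN_MODE and ./test_yolo4 are assumptions from the usual tkDNN workflow)
export TKDNN_MODE=FP16
./test_yolo4                                  # writes yolo4_fp16.rt

# Run the 2D object detection demo on the exported engine with batch = 1
./demo yolo4_fp16.rt ../demo/yolo_test.mp4 y
```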
Hi,
I have repeated the experiment; the original run was with a low-power setting.
This is my environment:
Other details:
Results:
Another result, this time with Yolov4-csp-512x512 fp16:
I have two questions:
I have re-run the test with adac857, thinking it might be due to this issue: https://github.com/ceccocats/tkDNN/issues/226
However, the results have actually gotten slightly worse, with ~18 FPS on yolo4-csp. Can anyone advise?
Hi @lpkoh,
Three considerations:
- Remember to set the board to maximum performance (MAXN) and run `sudo jetson_clocks` to get the best performance; see the sketch after this list.
- Finally, yolo4-csp is not Yolov4: it is Scaled-YOLOv4, which is slower but more precise.
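A minimal sketch of pinning a Jetson to maximum performance (power-mode numbers vary across Jetson models, so query the current mode first):

```bash
# Show the current power mode (on the AGX Xavier, mode 0 is MAXN)
sudo nvpmodel -q

# Switch to the MAXN power mode
sudo nvpmodel -m 0

# Lock CPU/GPU/EMC clocks at their maximum for the active mode
sudo jetson_clocks
```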
Let me know if you have further questions.
Actually, I get very similar results for Yolov4 and Yolov4-csp. These results were obtained on a Xavier AGX with JetPack 4.5 at full precision (FP32), selecting only those models in this script.
test | avg (ms) | min (ms) | max (ms) | avg FPS |
---|---|---|---|---|
yolo4_fp32_2 | 47.3199 | 46.5271 | 63.1509 | 21.1328 |
yolo4-csp_fp32_2 | 51.1207 | 50.8716 | 51.8859 | 19.5615 |
Hi, thank you for replying on this.
I am confused. You said here, and in https://github.com/ceccocats/tkDNN/issues/186 and https://github.com/ceccocats/tkDNN/issues/173, that what the demo prints on screen is preprocessing + inference + postprocessing. I took the on-screen numbers to be the demo output, and therefore concluded that tkDNN's inference-only FPS was slower than the inference-only FPS from ./trtexec. Where do I find the demo output that corresponds to inference alone, with no pre/post-processing? I don't find that information here.
Also, as I understand it, tkDNN is a wrapper around TensorRT and cuDNN. Does this mean it is actually meant to be faster than just running ./trtexec on a Jetson board, at least in theory?
Yeah, you are actually right. In the past the demo also printed pre/post-processing time, but currently it prints the inference time only, so what you see is the inference time. The same applies to ./test_rtinference and the script scripts/test_all_tests.sh; see the sketch below.
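For reference, an inference-only benchmark run looks roughly like this; the second argument to test_rtinference is assumed to be the batch size, so check the repo's README for the exact invocation:

```bash
# Benchmark inference only on an already-exported engine
# (second argument assumed to be the number of batches)
./test_rtinference yolo4_fp32.rt 1

# Or run the full benchmark suite used for the published tables
bash scripts/test_all_tests.sh
```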
Yes, tkDNN is just a wrapper around TensorRT and cuDNN. It is simply a framework that we use to optimize networks for our projects. We did not develop it to be faster, but to easily port unsupported models.
Ah, gotcha. So I guess the difference between ~27 FPS on yolo4-416x416 vs. ~44 in your repo is probably down to MAXN? Could the TensorRT version difference be an issue? I am using 7; your repo mentions 8. I heard 8 is faster, but for things like transformers, not YOLO.
Maybe it's due to MAXN and jetson_clocks. JetPack 4.5 uses TensorRT 7. TensorRT 8, which will be supported by tkDNN very soon, is actually slower on the Jetson platform for now. We hope NVIDIA will solve the issue in the next minor release.
TensorRT 8 is now supported on the tensorrt8 branch (checkout sketch below).
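A minimal sketch of trying it, assuming the standard CMake build from the tkDNN README:

```bash
# Switch to the TensorRT 8 branch and rebuild
cd tkDNN
git checkout tensorrt8
mkdir -p build && cd build
cmake ..
make -j"$(nproc)"
```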
Is TensorRT 8 still slower?