Closed madsciencetist closed 1 year ago
"First, as a control, to rule out CPU speed differences"
Be careful when measuring CPU usage with top, htop, and similar tools, as the reported value is relative to the current CPU clock.
To compare different loads you always have to fix the CPU clock speed.
The easiest way on Jetson is jetson_clocks.
This is especially important when comparing low loads, where the CPU governor may lower the clock speed, which results in a higher reported load.
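To make the point about clock-relative readings concrete, here is a minimal sketch that rescales a reported CPU percentage to what it would roughly be at maximum clock. The helper names are made up for illustration. The sysfs directory is the usual Linux cpufreq location and may differ on a given board. The linear scaling is an assumption that holds only approximately, and memory-bound code scales worse.

```python
from pathlib import Path

# Usual Linux cpufreq sysfs directory for core 0 (assumed; check your board).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_khz(name, base=CPUFREQ_DIR):
    """Read a cpufreq value in kHz, e.g. 'scaling_cur_freq' or 'cpuinfo_max_freq'."""
    return int((Path(base) / name).read_text().strip())


def usage_at_max_clock(reported_percent, cur_khz, max_khz):
    """Rescale a reported CPU percentage to an equivalent at max clock.

    Assumes the work done per second scales linearly with clock frequency,
    which is only a rough approximation.
    """
    return reported_percent * cur_khz / max_khz


# A core reported as 40% busy while running at half its maximum clock
# corresponds to roughly 20% at full clock under this simple model.
print(usage_at_max_clock(40.0, 600_000, 1_200_000))
```

On a real board the two frequencies could be read with `read_khz("scaling_cur_freq")` and `read_khz("cpuinfo_max_freq")`. Fixing the clocks, as suggested above, avoids the need for any such correction.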
@bmegli yes, I tweaked the nvpmodels to fix the clock speeds equally
Turns out my NVDEC and EMC clocks were still running at different speeds, and my NX had a background process using the memory bus somewhat heavily. After fixing all that, the 1.7x and 5x differences reduced to 1.2x and 2x differences.
Profilers and timers insist that while the differences are still substantial, they come more from non-nvidia functions like memcpy than they do from nvidia functions, and if there's something that's slowing down the memory bus, it's not surprising that it would affect the nvidia functions too.
So I'm ready to say that this is more likely a HW or OS issue than anything in nvmpi.
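One rough way to check the memory-bus hypothesis independently of ffmpeg and nvmpi is to time a plain buffer copy on both boards. This is only a sketch with a hypothetical helper name, not what was used in this thread. It relies on CPython performing the slice assignment as a bulk memory copy, and it measures a single-threaded copy rather than peak bus bandwidth.

```python
import time


def copy_bandwidth_gib_s(size_mib=64, repeats=5):
    """Return the best observed buffer-copy throughput in GiB/s."""
    src = bytearray(size_mib * 1024 * 1024)
    dst = bytearray(len(src))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    return (len(src) / 2**30) / best
```

Running `copy_bandwidth_gib_s()` on both devices with clocks fixed and background processes stopped would give a crude comparison. A large gap would point toward hardware or OS rather than the decoder wrapper.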
@madsciencetist
"@bmegli yes, I tweaked the nvpmodels to fix the clock speeds equally"
That's even more than I meant but probably even better.
In general, measuring CPU/GPU usage requires fixing the clock speed (typically to max) so that the scaling governor doesn't change the CPU/GPU frequency, because reported CPU/GPU usage is relative to the running frequency.
Without that:
The same is true for Windows and Task Manager.
I have two test setups, both with the same nvpmodel clock speeds:
(WITH_NVUTILS not defined)
(WITH_NVUTILS defined)
Unfortunately, I can't fill out the test matrix further because it is not easy for me to reflash either one, and ffmpeg crashes on the NX when I manually disable WITH_NVUTILS, which is its own bug. So I'm not really sure whether the following differences are due to hardware, OS, or use of nvutils, and I'm hoping someone else can confirm or deny some of these numbers.
First, as a control, to rule out CPU speed differences, I use the software decoder, and show that the NX does not use more CPU:
ffmpeg -rtsp_transport tcp -i rtsp://my_2592x1944_hevc_stream -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
Switching to the NVMPI decoder, we see that the NX is using substantially more CPU:
ffmpeg -c:v hevc_nvmpi -rtsp_transport tcp -i rtsp://my_2592x1944_hevc_stream -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
This 1.7x difference grows to 5x as we speed up other steps:
ffmpeg -c:v hevc_nvmpi -resize 320x320 -rtsp_transport tcp -i rtsp://my_2592x1944_hevc_stream -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
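For anyone trying to reproduce these comparisons, a sketch like the following measures the CPU time a command actually consumed, instead of reading a percentage off top. The function name is hypothetical, and this is not how the numbers in this post were obtained. It uses Python's standard `resource` module (Unix only) and counts user plus system CPU time of the child process, so a multi-threaded process can exceed 1.0.

```python
import resource
import subprocess
import time
from typing import List


def cpu_seconds_per_wall_second(argv: List[str]) -> float:
    """Run argv to completion; return (user + system CPU time) / wall time."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    # stdout is discarded, mirroring the "> /dev/null" in the commands above.
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / wall
```

The ffmpeg commands above read a live RTSP stream and do not exit on their own, so a bounded run would need a duration limit such as ffmpeg's `-t` option, which is an addition and not part of the original commands. CPU time still depends on the clock speed, so the clocks need to be fixed on both devices before comparing results.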
Does NvUtils use 5x the CPU of nvbuf_utils? Does Jetpack 5.0 use 5x the CPU of Jetpack 4.6? Profiling both did not give me any clear answers. Is anyone able to test Jetpack 5.0 with nvbuf_utils? Trying to figure out if this is a me problem, nvmpi problem or nvidia problem.