Freescale / gstreamer-imx

GStreamer 1.0 plugins for i.MX platforms

imxvpudec_jpeg consumes more CPU than jpegdec on i.MX8 #313

Open NIKovachev opened 1 year ago

NIKovachev commented 1 year ago

Hello, imxvpudec_jpeg (83% CPU) consumes more CPU than jpegdec (67% CPU). What could be the reason? I was expecting the opposite.

imxvpudec_jpeg:

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
413 root 20 0 339M 28652 10300 S 83.2 0.7 0:08.56 gst-launch-1.0 v4l2src device=/dev/video0 ! image/jpeg, width=2560, height=1440, framerate=30/1 ! imxvpud

jpegdec:

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
419 root 20 0 318M 22540 6420 S 67.0 0.6 0:06.09 gst-launch-1.0 v4l2src device=/dev/video0 ! image/jpeg, width=2560, height=1440, framerate=30/1 ! jpegdec

NIKovachev commented 1 year ago

@dv1 Any suggestions? Is anything wrong with my setup, or is this expected behaviour? Do you have any benchmarks?

dv1 commented 1 year ago

This could be because of cache issues. It is worth investigating though. What machine is this exactly? imx8m mini? imx8mq? imx8m plus?

NIKovachev commented 1 year ago

It's an i.MX8M. Data sheet: https://coral.ai/docs/dev-board/datasheet/#system-components

NIKovachev commented 1 year ago

Hi @dv1, is there any progress? Did you manage to reproduce it?

dv1 commented 1 year ago

I tried to replicate this, no luck so far. I attempted this with 2 USB webcams. The Logitech C920 showed slightly higher CPU% with jpegdec compared to imxvpudec_jpeg. The command line:

gst-launch-1.0 v4l2src device=/dev/video4 ! image/jpeg ! queue ! jpegdec ! fakesink sync=true

To test the VPU decoder, replace jpegdec with imxvpudec_jpeg.

I ran this on an imx8mq EVK.

What camera did you use? And what versions of libimxvpuapi and gstreamer-imx are you using?

NIKovachev commented 1 year ago

The key is the resolution: the higher the resolution, the bigger the performance degradation. The reported test case uses image/jpeg, width=2560, height=1440, framerate=30/1, but image/jpeg, width=3840, height=2160, framerate=30/1 is even worse.
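
For a sense of scale, here is a rough sketch of the decoded data rate at the two resolutions. It assumes the decoder outputs 12 bits per pixel (I420 or NV12); the actual output format is not stated in this thread, and raw_rate is just an illustrative helper name.

```shell
# Bytes per second of decoded video, ASSUMING 12 bits per pixel (I420/NV12).
# The real decoder output format is not stated in this thread.
# $1 = width, $2 = height, $3 = frames per second
raw_rate() {
    echo $(( $1 * $2 * 3 / 2 * $3 ))
}

raw_rate 2560 1440 30   # 1440p case from the report -> 165888000
raw_rate 3840 2160 30   # 4K case                     -> 373248000
```

Under that assumption the 4K case moves about 2.25 times as many decoded bytes per second as the 1440p case, so any per-byte CPU cost would grow by roughly the same factor.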

NIKovachev commented 1 year ago

Sorry, I forgot to mention: libimxvpuapi is 2.2.2, and gstreamer-imx is the latest version (commit ebbc5d3 from Dec 10, 2022).

dv1 commented 1 year ago

And what camera is this? Is it a USB camera? If so, what model? Or is it a camera that is connected through some other means?

NIKovachev commented 1 year ago

It's a 4K USB camera, the Hama C-900 Pro: https://pl.hama.com/001399950000/hama-kamera-internetowa-c-900-pro-uhd-4k-usb-c

I'm looking for a way to convert 30 fps 4K JPEG into RGB and then apply ML to the images.

NIKovachev commented 1 year ago

Hello @dv1, did you manage to reproduce the issue?

dv1 commented 1 year ago

@NIKovachev I finally got to check this out again.

Since I do not have that webcam, I did this instead:

I created a test 4K MJPEG file with this pipeline:

GST_DEBUG=2 gst-launch-1.0 videotestsrc num-buffers=600 ! videoconvert dither=0 ! "video/x-raw,width=3840,height=2160,format=I420,framerate=30/1" ! queue ! jpegenc quality=70 ! matroskamux ! filesink location=mjpeg-4k-test.mkv

Then I played this on the imx8mq EVK:

GST_DEBUG=2 gst-launch-1.0 filesrc location=mjpeg-4k-test.mkv ! matroskademux ! jpegparse ! imxvpudec_jpeg ! fakesink sync=true

I see CPU usage of about 10% in htop.

Then, with jpegdec:

GST_DEBUG=2 gst-launch-1.0 filesrc location=mjpeg-4k-test.mkv ! matroskademux ! jpegparse ! jpegdec ! fakesink sync=true

This saturates the thread - 100% CPU.
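
The two playback runs above could be wrapped in a small helper so both decoders are measured the same way. This is only a sketch: playback_cmd and compare_decoders are hypothetical names, and the use of `time` assumes the shell or system provides it (a bash keyword, or /usr/bin/time).

```shell
# Print the playback command line for one decoder element.
# $1 = decoder element (imxvpudec_jpeg or jpegdec), $2 = MJPEG Matroska file
playback_cmd() {
    printf 'gst-launch-1.0 filesrc location=%s ! matroskademux ! jpegparse ! %s ! fakesink sync=true\n' "$2" "$1"
}

# Run both decoders on the same file, one after the other.
# `time` reports user and sys CPU time, which is less noisy than
# reading CPU% off htop; its availability is an assumption.
# $1 = MJPEG Matroska file
compare_decoders() {
    for dec in imxvpudec_jpeg jpegdec; do
        echo "== $dec =="
        time sh -c "$(playback_cmd "$dec" "$1")"
    done
}
```

With the test file created earlier, this would be called as `compare_decoders mjpeg-4k-test.mkv`.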

So, in your case, I suspect that it might be USB related, actually. I do not know if USB 3.0 suffers from the same CPU usage problem as USB 2.0 does (that is, the CPU has to parse the USB packets, which is costly when large 4K frames are sent through those packets).

If you can, produce the following:

  1. Create dot dumps by setting the GST_DEBUG_DUMP_DOT_DIR environment variable to /tmp/. Then, collect the .dot files in /tmp/, and attach them here.
  2. Run your pipeline with the GST_DEBUG environment variable set to 2,*imx*:5.
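
Both steps can be combined in one run. A minimal sketch, assuming a POSIX shell; run_with_gst_debug is a hypothetical wrapper name, not part of GStreamer.

```shell
# Run any command with the two debug variables requested above.
# GST_DEBUG is quoted so the shell does not try to expand the asterisks.
run_with_gst_debug() {
    env GST_DEBUG_DUMP_DOT_DIR=/tmp/ GST_DEBUG='2,*imx*:5' "$@"
}
```

Prefix the usual gst-launch-1.0 command line with `run_with_gst_debug` and redirect stderr to a file of your choice (for example `2> gst-debug.log`), since GStreamer writes its debug log to stderr by default. The .dot files then appear in /tmp/.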