Freescale / gstreamer-imx

GStreamer 1.0 plugins for i.MX platforms

imxvpudec_h264 incompatible with glupload? #316

Closed Talkless closed 1 year ago

Talkless commented 1 year ago

Hi,

I'm trying to port our Qt application, which displays an RTSP stream using qmlglsink, to an imx8mm device.

So far I've managed to cross-compile GStreamer 1.22.3 with gstreamer-imx 2.0.0, libimxdmabuffer 1.0.1, and libimxvpuapi2 2.1.2 (I can't use more recent libimx* versions due to https://github.com/Freescale/libimxdmabuffer/issues/7), but either I get caps not accepted:

0:00:08.390234700 10262 0xffff58000f40 WARN                GST_CAPS gstpad.c:5787:pre_eventfunc_check:<queue1:sink> caps video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, width=(int)704, height=(int)576, framerate=(fraction)0/1, coded-picture-structure=(string)frame, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, colorimetry=(string)1:3:5:1, parsed=(boolean)true, profile=(string)high, level=(string)3 not accepted

for pipeline:

rtspsrc location=rtsp://... protocols=udp latency=100 ! queue name=q1 ! rtph264depay ! h264parse ! queue ! imxvpudec_h264 ! glupload ! glcolorconvert ! qmlglsink

Or I do get video displayed, but at barely 4 fps, if I link imxvpudec_h264 to glupload through imxg2dvideotransform like this:

rtspsrc location=... protocols=udp latency=100 ! queue name=q1 ! rtph264depay ! h264parse ! queue ! imxvpudec_h264 ! imxg2dvideotransform ! queue ! glupload ! glcolorconvert ! qmlglsink

Does this experiment prove that imxvpudec_h264 is incompatible with glupload? Has anyone thought about how glupload could be used here with minimal overhead?

Talkless commented 1 year ago

Strangely, the fps DOES manage to rise to the target 25 fps, but only after 10 or so seconds!? The FPS is not stable though, and CPU usage is ~70% instead of the ~30% I get using vpudec with the older GStreamer 1.18 available in the system/toolchain. Sometimes it stays stuck at 9 or 14 fps.

Any ideas why the FPS stays at 6-10 fps for some seconds, and then rises?

All I see is:

0:00:23.252891573  1602 0xaaaae554d700 WARN            videodecoder gstvideodecoder.c:3668:gst_video_decoder_clip_and_push_buf:<imxvpudech264-1> Dropping frame due to QoS. start:0:00:15.028354956 deadline:0:00:15.028354956 earliest_time:0:00:15.047418051
0:00:23.336191154  1602 0xaaaae554d700 WARN            videodecoder gstvideodecoder.c:3668:gst_video_decoder_clip_and_push_buf:<imxvpudech264-1> Dropping frame due to QoS. start:0:00:15.108372174 deadline:0:00:15.108372174 earliest_time:0:00:15.134527086
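
To put numbers on warnings like the two above, the gap between `deadline` and `earliest_time` can be extracted from each line. The following is only a minimal sketch: `qos_lateness_ns` is a hypothetical helper (not part of GStreamer), and it assumes the log format is exactly the one shown in this comment.

```python
import re

# Matches GStreamer clock times such as 0:00:15.028354956 (h:mm:ss.nanoseconds).
_TIME = r"(\d+):(\d{2}):(\d{2})\.(\d{9})"
_QOS = re.compile(
    r"Dropping frame due to QoS\. start:" + _TIME
    + r" deadline:" + _TIME
    + r" earliest_time:" + _TIME
)


def _to_ns(hours, minutes, seconds, nanos):
    """Convert h:mm:ss.nnnnnnnnn fields to integer nanoseconds."""
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1_000_000_000 + int(nanos)


def qos_lateness_ns(line):
    """Return earliest_time - deadline in ns, or None if this is not a QoS drop line."""
    match = _QOS.search(line)
    if match is None:
        return None
    groups = match.groups()
    deadline = _to_ns(*groups[4:8])   # groups 0-3 belong to "start"
    earliest = _to_ns(*groups[8:12])
    return earliest - deadline
```

Applied to the two lines above, this gives roughly 19 ms and 26 ms, so the dropped frames miss their deadline by about half a frame interval at 25 fps.
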
Talkless commented 1 year ago

@dv1 Do you believe it would be "doable" to make imxvpudec_h264 work with glupload without expensive CPU copies? Could you quote a price for that work if the company where I work offered to sponsor it?

Talkless commented 1 year ago

I've discovered viv-fb option for gstreamer-plugins-base!

See: https://github.com/GStreamer/gst-plugins-base/blob/ce937bcb21412d7b3539a2da0509cc96260562f8/gst-libs/gst/gl/meson.build#L277

After enabling viv-fb while cross-building gstreamer-plugins-base I no longer need imxg2dvideotransform. But performance is still poor: the high CPU usage and the ~10 fps cap for the first ~10 seconds are still there.

Talkless commented 1 year ago

Setting qmlglsink sync=0 helped a bit, now it shows 25fps from the start.

CPU usage is still near 70%, but maybe that's unavoidable overhead from using qmlglsink?

The only issue now is that there's frame stuttering every ~2s.

Currently the pipeline looks like this:

rtspsrc location=rtsp://... protocols=udp latency=100 buffer-mode=slave ! queue max-size-buffers=0 ! rtph264depay ! queue max-size-buffers=0 ! h264parse ! queue max-size-buffers=0 ! imxvpudec_h264 ! queue max-size-buffers=0  ! glupload ! glcolorconvert ! qmlglsink sync=0
Talkless commented 1 year ago

I had to increase the rtspsrc latency to 200ms (I can do 100 on a PC with VA-API hw decoding). Now I get a stable 25fps from the start of the stream. The only "issue" is higher CPU usage compared to launching gst-launch-1.0 with glimagesink in a terminal, but I guess it's just qmlglsink overhead.

So, to wrap up:

  1. Build gst-plugins-base with viv-fb enabled.
  2. Fiddle with pipeline (play with sync=0/1, latencies, queues, etc.) to find "best" solution.

My current pipeline is:

rtspsrc location=rtsp://... protocols=udp latency=200 buffer-mode=slave ! queue max-size-buffers=0 ! rtph264depay ! queue max-size-buffers=0 ! h264parse ! queue max-size-buffers=0 ! imxvpudec_h264 ! queue max-size-buffers=0 ! glupload ! qmlglsink name=qmlglsink sync=1
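
Since step 2 means trying several latency and sync combinations, it can help to generate the pipeline description instead of editing it by hand. This is a sketch only: `build_pipeline` and its parameters are hypothetical names, and the element chain is copied from the pipeline above.

```python
def build_pipeline(location, latency_ms=200, sync=True):
    """Assemble the gst-launch-style description of the pipeline above.

    location, latency_ms and sync are the knobs varied in this thread;
    everything else is kept fixed.
    """
    queue = "queue max-size-buffers=0"
    elements = [
        f"rtspsrc location={location} protocols=udp "
        f"latency={latency_ms} buffer-mode=slave",
        queue,
        "rtph264depay",
        queue,
        "h264parse",
        queue,
        "imxvpudec_h264",
        queue,
        "glupload",
        f"qmlglsink name=qmlglsink sync={1 if sync else 0}",
    ]
    return " ! ".join(elements)
```

With the defaults this reproduces the pipeline above. The resulting string is meant to be handed to whatever the application already uses to parse a pipeline description; in a Qt application qmlglsink still has to be connected to its QML item separately.
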
dv1 commented 1 year ago

Sorry for the silence. I am unfortunately still kept busy by other topics. viv-fb is indeed necessary, although the direct dmabuf uploader should work too. I hope I can look at this deeper, since the reported issues are still odd. qmlglsink does have overhead though, that is true. (To be more specific, it is Qt overhead.)

Talkless commented 1 year ago

(To be more specific, it is Qt overhead.)

Yeah, I've noticed that even without video stream playing, my QML application consumes ~70% CPU JUST BY MOVING MOUSE AROUND :| . It's in EGLFS mode. Maybe I have to use Vivante EGL platform plugin from Qt for better performance, I believe some default EGL is used.

dv1 commented 1 year ago

Reopening since I will investigate further if there are ways to improve performance.

@Talkless What do you use as build environment? Yocto? If so, what version?

Talkless commented 1 year ago

@dv1 I was provided a toolchain called fsl-imx-wayland-glibc-x86_64-core-image-base-cortexa53-crypto-imx8mmevk-toolchain-5.10-hardknott, so I guess it's a Yocto Hardknott-based toolchain. The kernel is 5.10.72-lts-5.10.y+g2a23c0cdbb9b.

dv1 commented 1 year ago

@Talkless I pushed a commit that changes the way G2D is used. On the imx8m plus, serious reduction in imxg2dvideotransform CPU usage can be seen. Can you check how the CPU usage is on your end now? I am not sure if you were using this element in your code.

Also, I just ran a test on an imx8m mini EVK here, and CPU usage is much lower than 70%. However, this is Yocto Kirkstone, with upstream GStreamer (that is, not the NXP fork), and the latest versions of gstreamer-imx and libimxvpuapi. In the logs (by setting GST_DEBUG to *gl*upload*:9) I can see that the GL uploader is using the DirectDmabuf method, which avoids CPU based frame copies. I recommend running your test with that log level set and checking out what upload method is being used.

Talkless commented 1 year ago

I've tried to build gstreamer-imx master, but it fails for me:

In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2object.c:28:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
   21 | struct itimerspec {
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
                 from ../src/sys/v4l2video/gstimxv4l2object.c:25:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
    8 | struct itimerspec
      |        ^~~~~~~~~~
dv1 commented 1 year ago

@Talkless In sys/v4l2video/gstimxv4l2object.c , line 25, try replacing #include <time.h> with #include <sys/time.h>. Then tell me please if it fixes or doesn't fix the issue for you.

Talkless commented 1 year ago

That change also fixed this error in gstimxv4l2videoformat.c:22:

In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2videoformat.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
   21 | struct itimerspec {
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
                 from ../src/sys/v4l2video/gstimxv4l2videoformat.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
    8 | struct itimerspec
      |        ^~~~~~~~~~

But after "fixing" gstimxv4l2amphiondec.c the same way, the errors get even worse:

In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:16:8: error: redefinition of ‘struct timeval’
   16 | struct timeval {
      |        ^~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h:25,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_timeval.h:8:8: note: originally defined here
    8 | struct timeval
      |        ^~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
   21 | struct itimerspec {
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
                 from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:1,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
    8 | struct itimerspec
      |        ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
                 from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:26:8: error: redefinition of ‘struct itimerval’
   26 | struct itimerval {
      |        ^~~~~~~~~
In file included from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h:105:8: note: originally defined here
  105 | struct itimerval
      |        ^~~~~~~~~
dv1 commented 1 year ago

Attach /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h, /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h, and /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h please. Also, double check that the includes are correct - if you see that error in gstimxv4l2amphiondec.c, then gstimxv4l2videoformat.c should not build, since the #include directives in both of these files start off identically.

Talkless commented 1 year ago

Now that you have mentioned videodev2.h, I remember I had to patch the toolchain so that ffmpeg (or was it something from gstreamer) would build:

readonly VIDEODEV_FILE="${SYSROOT}/usr/include/linux/videodev2.h"
sed -i -e "s|<sys/time.h>|<linux/time.h>|g" "${VIDEODEV_FILE}"

videodev2.h: https://paste.debian.net/1284436/
linux/time.h: https://paste.debian.net/1284437/
sys/time.h: https://paste.debian.net/1284438/

dv1 commented 1 year ago

Hm then it is a toolchain bug. Patch that, then retry. I will switch the includes to sys/time.h regardless though, to match videodev2.h.

Talkless commented 1 year ago

@dv1 videodev2.h is already patched, or do you mean I need to patch something more?

dv1 commented 1 year ago

@Talkless Wait - when you had the build errors, did you try this with the patched or unpatched videodev2.h?

Talkless commented 1 year ago

It was already patched. I needed this patch for some other package quite some time ago.

dv1 commented 1 year ago

This patch seems wrong. linux/time.h is not supposed to be there. Read through this kernel mailing list thread for details.
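
A quick way to tell whether a sysroot's videodev2.h still carries the sed patch shown earlier is to look at which time header it includes. The sketch below is an illustration only: `time_includes` and `looks_patched` are hypothetical helper names, and the check assumes an unpatched header includes `<sys/time.h>`, which is the string the sed command replaced.

```python
import re

# Matches '#include <sys/time.h>' and '#include <linux/time.h>' directives.
_TIME_INCLUDE = re.compile(
    r"^\s*#\s*include\s*<(sys|linux)/time\.h>", re.MULTILINE
)


def time_includes(header_text):
    """List the time headers included by the given header source text."""
    return [f"{m.group(1)}/time.h" for m in _TIME_INCLUDE.finditer(header_text)]


def looks_patched(header_text):
    """True if the header pulls in linux/time.h, as the sed patch makes it do."""
    return "linux/time.h" in time_includes(header_text)
```

To use it, read the toolchain's `usr/include/linux/videodev2.h` into a string and pass it to `looks_patched`; a `True` result suggests the patch should be reverted before building gstreamer-imx.
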

Talkless commented 1 year ago

I've reverted videodev2.h "fix", now master builds, but 2.1.0 does not:

In file included from ../src/sys/v4l2video/gstimxv4l2object.c:26:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:2357:20: error: field ‘timestamp’ has incomplete type
 2357 |  struct timespec   timestamp;
      |                    ^~~~~~~~~
[39/52] Compiling C object sys/v4l2video/libgstimxv4l2video.so.p/gstimxv4l2videoformat.c.o

I believe this timespec error also showed up with some other software, which is why I had to "fix" videodev2.h.

Master shows this:

glupload gstglupload.c:2594:gst_gl_upload_perform_with_buffer:<glupload1> uploader DirectDmabuf returned 1, buffer: 0xffff2c05db40

If I patch videodev2.h again, to make 2.1.0 build, I see the same:

glupload gstglupload.c:2594:gst_gl_upload_perform_with_buffer:<glupload1> uploader DirectDmabuf returned 1, buffer: 0xffff380777e0
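
When the log is long, the upload method can be tallied rather than read off line by line. This is a minimal sketch that assumes the `uploader <name> returned <code>` wording seen in the two lines above; `count_upload_methods` is a hypothetical helper, not a GStreamer API.

```python
import re
from collections import Counter

# Captures the uploader name and its return code from a glupload debug line.
_UPLOADER = re.compile(r"uploader (\S+) returned (-?\d+)")


def count_upload_methods(log_lines):
    """Count how often each (uploader name, return code) pair appears."""
    counts = Counter()
    for line in log_lines:
        match = _UPLOADER.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

If anything other than DirectDmabuf shows up in the result, some frames are going through a different upload path, which would be worth looking at before blaming Qt.
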
Talkless commented 1 year ago

I get the same performance with 2.1.0 and master. I believe the CPU usage is just some Qt overhead, as I discovered that simply moving the mouse around in the application without video playback gives me ~70% CPU usage... And the extra imxg2dvideotransform element is not needed if I build gstreamer-plugins-base with viv_fb support enabled.

Maybe I could use one of the gstreamer profiling/tracing utilities to measure actual cost of elements?

Talkless commented 1 year ago

The thread with imxvpudec consumes ~21% CPU, maybe that is the number you were referring to?

Thread 0xaaaad670ac00 Statistics:
  Time: 0:00:01.653421500
  Avg CPU load: 21.2 %
  Pad Statistics:
    > queue4.src                    : buffers    4609 (live     0,dec     0,dis     1,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max)      17/   1083/   1452, time 0:00:09.785557548, bytes/sec 510093.265051
    > rtph264depay1.src             : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr     0,gap     0,drop     0,dlt   238), size (min/avg/max)   15306/  25808/ 143436, time 0:00:09.656997050, bytes/sec 662771.663578
    > queue5.src                    : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr     0,gap     0,drop     0,dlt   238), size (min/avg/max)   15306/  25808/ 143436, time 0:00:09.656350571, bytes/sec 662816.035203
    > h264parse1.src                : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr    10,gap     0,drop     0,dlt   238), size (min/avg/max)   15312/  25814/ 143439, time 0:00:09.642270148, bytes/sec 663938.253309
    > queue6.src                    : buffers     248 (live     0,dec     0,dis     1,res     0,cor     0,mar   248,hdr    10,gap     0,drop     0,dlt   238), size (min/avg/max)   15312/  25814/ 143439, time 0:00:09.632929197, bytes/sec 664582.067311
    > imxvpudech264-1.src           : buffers     177 (live     0,dec     0,dis    28,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max) ......./3655712/......., time 0:00:09.500220582, bytes/sec 68110105.277553
    > queue7.src                    : buffers     177 (live     0,dec     0,dis    28,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max) ......./3655712/......., time 0:00:09.499402858, bytes/sec 68115968.305847
    > gluploadelement1.src          : buffers     154 (live     0,dec     0,dis    28,res     0,cor     0,mar     0,hdr     0,gap     0,drop     0,dlt     0), size (min/avg/max) ......./8294400/......., time 0:00:09.497045559, bytes/sec 134498417.646266

So again, I guess we can close this bug...

Bigger issue is how to reduce decoding latency, because I get total ~400ms delay from real world: https://community.nxp.com/t5/i-MX-Processors/How-to-achieve-lowest-latency-while-decoding-RTSP-h264-stream/m-p/1677683#M208271

dv1 commented 1 year ago

I have trouble keeping track of this because your sysroot seems to be really weird. So, I can only give vague recommendations. Your videodev2.h appears to be broken. I strongly recommend you redo this Yocto setup from scratch, ideally based on Kirkstone.

That said, it does not seem to me that this is really a GStreamer issue anymore - not if a test pipeline with glimagesink instead of qmlglsink uses far less CPU%. I'd suspect a Qt or GPU driver issue there.

dv1 commented 1 year ago

Yeah, I guess, though even 21% is higher than what I saw I think (I can't check right now).

Agreed, let's close this. The issue does not seem to be gstreamer-imx specific, but instead seems to originate somewhere else.

Talkless commented 1 year ago

I strongly recommend you redo this Yocto setup from scratch, ideally based on Kirkstone.

Toolchain is provided by some Chinese panel pc manufacturers :) .

Yes, I'd say close this, because:

Talkless commented 1 year ago

@dv1 should I create an issue about decoding latency? Or is it just what ARM CPUs can be expected to provide?

GStreamer tracing shows:

0xaaab043ff2a0.imxvpudech264-1.src: mean=0:00:00.166048537 min=0:00:00.067169229 max=0:00:00.254554744

That's ~166 ms mean latency, peaking at ~255 ms..?

On desktop with vah264dec we can get TOTAL of ~220-250ms latency, with this NXP IMX8MM machine 400ms or 300ms with sync=false but with jittering...
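
For comparing runs, the tracer line above can be turned into plain numbers. This is just a sketch under the assumption that the summary always has the `mean=... min=... max=...` shape shown here; `parse_latency_summary` is a hypothetical helper name.

```python
import re

_TIME = r"(\d+):(\d{2}):(\d{2})\.(\d{9})"
_SUMMARY = re.compile(
    r"(?P<pad>\S+): mean=" + _TIME + r" min=" + _TIME + r" max=" + _TIME
)


def _to_ns(hours, minutes, seconds, nanos):
    """Convert h:mm:ss.nnnnnnnnn fields to integer nanoseconds."""
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1_000_000_000 + int(nanos)


def parse_latency_summary(line):
    """Return pad name plus mean/min/max latency in ns, or None if no match."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    fields = match.groups()[1:]  # drop the pad name, keep 3 x 4 time fields
    return {
        "pad": match.group("pad"),
        "mean_ns": _to_ns(*fields[0:4]),
        "min_ns": _to_ns(*fields[4:8]),
        "max_ns": _to_ns(*fields[8:12]),
    }
```

Dividing the values by 1,000,000 gives milliseconds: about 166 ms mean, 67 ms minimum and 255 ms maximum for the line above.
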

dv1 commented 1 year ago

@Talkless Open a new issue, and provide a gst-launch-1.0 command line that reproduces the problem

Talkless commented 1 year ago

Thanks for all your time @dv1 , it's really great to have a FOSS option, to be able to build from source for the latest GStreamer. The system image/toolkit provided by manufacturers only has GStreamer & vpudec 1.18...