Talkless closed this issue 1 year ago.
Strangely, the FPS DOES manage to rise to the target 25 fps, but only after 10 or so seconds!? The FPS is not stable though, and CPU usage is ~70% instead of the ~30% I get using vpudec with the older GStreamer 1.18 available in the system/toolchain. Sometimes it stays stuck at 9 or 14 fps.
Any ideas why the FPS stays at 6-10 fps for some seconds and then rises?
All I see is:
0:00:23.252891573 1602 0xaaaae554d700 WARN videodecoder gstvideodecoder.c:3668:gst_video_decoder_clip_and_push_buf:<imxvpudech264-1> Dropping frame due to QoS. start:0:00:15.028354956 deadline:0:00:15.028354956 earliest_time:0:00:15.047418051
0:00:23.336191154 1602 0xaaaae554d700 WARN videodecoder gstvideodecoder.c:3668:gst_video_decoder_clip_and_push_buf:<imxvpudech264-1> Dropping frame due to QoS. start:0:00:15.108372174 deadline:0:00:15.108372174 earliest_time:0:00:15.134527086
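The two QoS warnings can be read numerically. As the printed values suggest, a frame is dropped because its deadline already lies before the QoS earliest_time, and the gap between the two shows how far behind it was. Below is a minimal C sketch (the helper names are hypothetical, not GStreamer API) that parses the H:MM:SS.NNNNNNNNN timestamps from these warnings, assuming they always carry nine fractional digits as GStreamer's time format prints them:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a clock time string like "0:00:15.028354956" into nanoseconds.
 * Assumes nine fractional digits (as printed by GST_TIME_FORMAT).
 * Returns -1 if the string does not match that layout. */
static int64_t parse_clock_time_ns(const char *s)
{
    unsigned int h, m, sec, ns;

    assert(s != NULL);
    if (sscanf(s, "%u:%u:%u.%9u", &h, &m, &sec, &ns) != 4)
        return -1;
    return ((int64_t)h * 3600 + (int64_t)m * 60 + (int64_t)sec) * INT64_C(1000000000)
           + (int64_t)ns;
}

/* How far a frame's deadline lies behind the QoS earliest_time, in ns.
 * A positive value means the deadline had already passed when the decoder
 * checked it. Returns INT64_MIN if either string fails to parse. */
static int64_t qos_lateness_ns(const char *deadline, const char *earliest_time)
{
    int64_t d = parse_clock_time_ns(deadline);
    int64_t e = parse_clock_time_ns(earliest_time);

    if (d < 0 || e < 0)
        return INT64_MIN;
    return e - d;
}
```

For the two warnings above, this gives about 19 ms and 26 ms respectively.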
@dv1 Do you believe it would be "doable" to make imxvpudec_h264 work with glupload without expensive CPU copies? Could you put a price on that work, in case the company I work for would consider sponsoring it?
I've discovered the viv-fb option for gstreamer-plugins-base!
After enabling viv-fb while cross-building gstreamer-plugins-base, I no longer need imxg2dvideotransform. But performance is still poor: high CPU usage, and the ~10 fps cap during the first ~10 seconds is still there.
Setting qmlglsink sync=0 helped a bit; now it shows 25 fps from the start.
CPU usage is still near 70%, but maybe that's unavoidable overhead of using qmlglsink?
The only remaining issue is frame stuttering every ~2 s.
Currently the pipeline looks like this:
rtspsrc location=rtsp://... protocols=udp latency=100 buffer-mode=slave ! queue max-size-buffers=0 ! rtph264depay ! queue max-size-buffers=0 ! h264parse ! queue max-size-buffers=0 ! imxvpudec_h264 ! queue max-size-buffers=0 ! glupload ! glcolorconvert ! qmlglsink sync=0
I had to increase the rtspsrc latency to 200 ms (100 ms works on a PC with VA-API hardware decoding). Now I get a stable 25 fps from the start of the stream. The only "issue" is higher CPU usage compared to running gst-launch-1.0 with glimagesink from a terminal, but I guess that's just qmlglsink overhead.
So, to wrap up: build gst-plugins-base with viv-fb enabled.
My current pipeline is:
rtspsrc location=rtsp://... protocols=udp latency=200 buffer-mode=slave ! queue max-size-buffers=0 ! rtph264depay ! queue max-size-buffers=0 ! h264parse ! queue max-size-buffers=0 ! imxvpudec_h264 ! queue max-size-buffers=0 ! glupload ! qmlglsink name=qmlglsink sync=1
Sorry for the silence. Unfortunately, I am still kept busy by other topics. viv-fb is indeed necessary, although the direct dmabuf uploader should work too. I hope I can look at this more deeply, since the reported issues are still odd. qmlglsink does have overhead, though, that is true. (To be more specific, it is Qt overhead.)
Regarding the Qt overhead: yes, I've noticed that even without a video stream playing, my QML application consumes ~70% CPU JUST BY MOVING THE MOUSE AROUND :| . It's running in EGLFS mode. Maybe I have to use the Vivante EGL platform plugin from Qt for better performance; I believe some default EGL is being used.
Reopening since I will investigate further if there are ways to improve performance.
@Talkless What do you use as build environment? Yocto? If so, what version?
@dv1 I was provided a toolchain called fsl-imx-wayland-glibc-x86_64-core-image-base-cortexa53-crypto-imx8mmevk-toolchain-5.10-hardknott, so I guess it's a Yocto Hardknott-based toolchain. The kernel is 5.10.72-lts-5.10.y+g2a23c0cdbb9b.
@Talkless I pushed a commit that changes the way G2D is used. On the i.MX8M Plus, a serious reduction in imxg2dvideotransform CPU usage can be seen. Can you check what the CPU usage is on your end now? I am not sure if you were using this element in your code.
Also, I just ran a test on an i.MX8M Mini EVK here, and CPU usage is much lower than 70%. However, this is Yocto Kirkstone with upstream GStreamer (that is, not the NXP fork) and the latest versions of gstreamer-imx and libimxvpuapi. In the logs (obtained by setting GST_DEBUG to *gl*upload*:9), I can see that the GL uploader is using the DirectDmabuf method, which avoids CPU-based frame copies. I recommend running your test with that log level set and checking which upload method is being used.
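When going through such a log, the uploader lines can be checked mechanically. A hedged C sketch, assuming the line format that appears later in this thread ("uploader DirectDmabuf returned 1, buffer: 0x...") and assuming that the returned value 1 corresponds to GST_GL_UPLOAD_DONE, i.e. a successful upload; parse_uploader_line is a hypothetical helper, not part of GStreamer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the uploader method name from a glupload debug line.
 * Copies the name into `method` and returns:
 *   1  if the uploader reported 1 (assumed GST_GL_UPLOAD_DONE, success),
 *   0  if it reported any other value,
 *  -1  if the line does not contain an "uploader <name> returned <n>" part. */
static int parse_uploader_line(const char *line, char *method, size_t method_len)
{
    const char *p;
    char name[64];
    long ret;

    assert(line != NULL && method != NULL);
    p = strstr(line, "uploader ");
    if (p == NULL || method_len == 0)
        return -1;
    if (sscanf(p, "uploader %63s returned %ld", name, &ret) != 2)
        return -1;
    snprintf(method, method_len, "%s", name);
    return ret == 1 ? 1 : 0;
}
```

A successful DirectDmabuf match indicates the dmabuf import path was taken, which per the comment above avoids CPU-based frame copies.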
I was using imxg2dvideotransform at first, but removed it after I built GStreamer with viv-fb support enabled, as it no longer seemed to be needed after doing that.
I'm using gstreamer-imx and the latest libimx* dependencies. I've tried to build gstreamer-imx master, but it fails for me:
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
from ../src/sys/v4l2video/gstimxv4l2object.c:28:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
21 | struct itimerspec {
| ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
from ../src/sys/v4l2video/gstimxv4l2object.c:25:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
8 | struct itimerspec
| ^~~~~~~~~~
@Talkless In sys/v4l2video/gstimxv4l2object.c, line 25, try replacing #include <time.h> with #include <sys/time.h>. Then please tell me whether or not it fixes the issue for you.
I also fixed the same error in gstimxv4l2videoformat.c (line 22):
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
from ../src/sys/v4l2video/gstimxv4l2videoformat.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
21 | struct itimerspec {
| ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
from ../src/sys/v4l2video/gstimxv4l2videoformat.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
8 | struct itimerspec
| ^~~~~~~~~~
But after applying the same "fix" to gstimxv4l2amphiondec.c, the errors get even worse:
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:16:8: error: redefinition of ‘struct timeval’
16 | struct timeval {
| ^~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h:25,
from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_timeval.h:8:8: note: originally defined here
8 | struct timeval
| ^~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:21:8: error: redefinition of ‘struct itimerspec’
21 | struct itimerspec {
| ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/time.h:48,
from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:1,
from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/bits/types/struct_itimerspec.h:8:8: note: originally defined here
8 | struct itimerspec
| ^~~~~~~~~~
In file included from /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:62,
from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:22:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h:26:8: error: redefinition of ‘struct itimerval’
26 | struct itimerval {
| ^~~~~~~~~
In file included from ../src/sys/v4l2video/gstimxv4l2amphiondec.c:21:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h:105:8: note: originally defined here
105 | struct itimerval
| ^~~~~~~~~
Please attach /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h, /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/time.h, and /opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/sys/time.h. Also, double-check that the includes are correct: if you see that error in gstimxv4l2amphiondec.c, then gstimxv4l2videoformat.c should not build either, since the #include directives in both of these files start off identically.
Now that you have mentioned videodev2.h, I remember I had to patch the toolchain so that ffmpeg (or was it something from GStreamer?) would build:
readonly VIDEODEV_FILE="${SYSROOT}/usr/include/linux/videodev2.h"
sed -i -e "s|<sys/time.h>|<linux/time.h>|g" "${VIDEODEV_FILE}"
videodev2.h: https://paste.debian.net/1284436/ linux/time.h: https://paste.debian.net/1284437/ sys/time.h: https://paste.debian.net/1284438/
Hm, then it is a toolchain bug. Patch that, then retry. I will switch the includes to sys/time.h regardless, though, to match videodev2.h.
@dv1 videodev2.h is already patched, or do you mean I need to patch something more?
@Talkless Wait, when you had the build errors, did you try this with the patched or the unpatched videodev2.h?
It was already patched. I needed this patch for some other package quite some time ago.
This patch seems wrong. linux/time.h is not supposed to be there. Read through this kernel mailing list thread for details.
I've reverted the videodev2.h "fix"; now master builds, but 2.1.0 does not:
In file included from ../src/sys/v4l2video/gstimxv4l2object.c:26:
/opt/fsl-imx-wayland/5.10-hardknott/sysroots/cortexa53-crypto-poky-linux/usr/include/linux/videodev2.h:2357:20: error: field ‘timestamp’ has incomplete type
2357 | struct timespec timestamp;
| ^~~~~~~~~
[39/52] Compiling C object sys/v4l2video/libgstimxv4l2video.so.p/gstimxv4l2videoformat.c.o
I believe timespec was also the issue with some other software, for which I had to "fix" videodev2.h.
Master shows this:
glupload gstglupload.c:2594:gst_gl_upload_perform_with_buffer:<glupload1> uploader DirectDmabuf returned 1, buffer: 0xffff2c05db40
If I patch videodev2.h again to make 2.1.0 build, I see the same:
glupload gstglupload.c:2594:gst_gl_upload_perform_with_buffer:<glupload1> uploader DirectDmabuf returned 1, buffer: 0xffff380777e0
I get the same performance with 2.1.0 and master. I believe the CPU usage is just Qt overhead, as I discovered that simply moving the mouse around in the application, without video playback, gives ~70% CPU usage... And the extra imxg2dvideotransform element is not needed if I build gstreamer-plugins-base with viv-fb support enabled.
Maybe I could use one of the GStreamer profiling/tracing utilities to measure the actual cost of each element?
The thread with imxvpudec consumes ~21% CPU; maybe that's the number you were referring to?
Thread 0xaaaad670ac00 Statistics:
Time: 0:00:01.653421500
Avg CPU load: 21.2 %
Pad Statistics:
> queue4.src : buffers 4609 (live 0,dec 0,dis 1,res 0,cor 0,mar 0,hdr 0,gap 0,drop 0,dlt 0), size (min/avg/max) 17/ 1083/ 1452, time 0:00:09.785557548, bytes/sec 510093.265051
> rtph264depay1.src : buffers 248 (live 0,dec 0,dis 1,res 0,cor 0,mar 248,hdr 0,gap 0,drop 0,dlt 238), size (min/avg/max) 15306/ 25808/ 143436, time 0:00:09.656997050, bytes/sec 662771.663578
> queue5.src : buffers 248 (live 0,dec 0,dis 1,res 0,cor 0,mar 248,hdr 0,gap 0,drop 0,dlt 238), size (min/avg/max) 15306/ 25808/ 143436, time 0:00:09.656350571, bytes/sec 662816.035203
> h264parse1.src : buffers 248 (live 0,dec 0,dis 1,res 0,cor 0,mar 248,hdr 10,gap 0,drop 0,dlt 238), size (min/avg/max) 15312/ 25814/ 143439, time 0:00:09.642270148, bytes/sec 663938.253309
> queue6.src : buffers 248 (live 0,dec 0,dis 1,res 0,cor 0,mar 248,hdr 10,gap 0,drop 0,dlt 238), size (min/avg/max) 15312/ 25814/ 143439, time 0:00:09.632929197, bytes/sec 664582.067311
> imxvpudech264-1.src : buffers 177 (live 0,dec 0,dis 28,res 0,cor 0,mar 0,hdr 0,gap 0,drop 0,dlt 0), size (min/avg/max) ......./3655712/......., time 0:00:09.500220582, bytes/sec 68110105.277553
> queue7.src : buffers 177 (live 0,dec 0,dis 28,res 0,cor 0,mar 0,hdr 0,gap 0,drop 0,dlt 0), size (min/avg/max) ......./3655712/......., time 0:00:09.499402858, bytes/sec 68115968.305847
> gluploadelement1.src : buffers 154 (live 0,dec 0,dis 28,res 0,cor 0,mar 0,hdr 0,gap 0,drop 0,dlt 0), size (min/avg/max) ......./8294400/......., time 0:00:09.497045559, bytes/sec 134498417.646266
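The pad statistics also allow a rough frame-rate check: for raw video pads carrying one frame per buffer, throughput divided by average buffer size, or buffer count divided by the measured time, gives the average frame rate over that window. A minimal C sketch (the helper names are hypothetical):

```c
#include <assert.h>

/* Average buffers per second implied by a pad's throughput and its
 * average buffer size. For raw video pads (one frame per buffer) this is
 * the average frame rate over the measured interval. */
static double effective_rate(double bytes_per_sec, double avg_buffer_size)
{
    assert(avg_buffer_size > 0.0);
    return bytes_per_sec / avg_buffer_size;
}

/* The same rate computed from the buffer count and the measured time span. */
static double rate_from_count(unsigned long buffers, double seconds)
{
    assert(seconds > 0.0);
    return (double)buffers / seconds;
}
```

With the figures above, both methods give roughly 18.6 buffers/s at imxvpudech264-1.src and roughly 16.2 at gluploadelement1.src, below the 25 fps target, although the ~9.5 s window may include stream startup. The 8294400-byte buffers equal 1920 x 1080 x 4, which would be consistent with a 1080p frame at 4 bytes per pixel; the resolution and format are assumptions here.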
So again, I guess we can close this bug...
The bigger issue is how to reduce decoding latency, because I get a total delay of ~400 ms relative to the real world: https://community.nxp.com/t5/i-MX-Processors/How-to-achieve-lowest-latency-while-decoding-RTSP-h264-stream/m-p/1677683#M208271
I have trouble keeping track of this because your sysroot seems to be really weird, so I can only give vague recommendations. Your videodev2.h appears to be broken. I strongly recommend you redo this Yocto setup from scratch, ideally based on Kirkstone.
That said, it does not seem to me that this is really a GStreamer issue anymore, not if a test pipeline with glimagesink instead of qmlglsink uses far less CPU. I'd suspect a Qt or GPU driver issue there.
Yeah, I guess, though even 21% is higher than what I saw I think (I can't check right now).
Agreed, let's close this. The issue does not seem to be gstreamer-imx specific; instead, it originates somewhere else.
Regarding redoing the Yocto setup from scratch: the toolchain is provided by some Chinese panel PC manufacturers :) .
Yes, I'd say close this, because:
imxvpudec
gstreamer-imx
will be tagged, I will be able to use the toolchain without workarounds.
@dv1 Should I create an issue about decoding latency? Or is this just what ARM CPUs can be expected to provide?
GStreamer tracing shows:
0xaaab043ff2a0.imxvpudech264-1.src: mean=0:00:00.166048537 min=0:00:00.067169229 max=0:00:00.254554744
That's ~166 ms on average and up to ~255 ms of latency at the decoder..?
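To compare such tracer figures with the desktop numbers, the mean/min/max values can be converted to milliseconds. A hedged C sketch that parses a stats line in the format quoted above (the struct and function names are hypothetical, and nine fractional digits per value are assumed):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Latency figures for one pad, converted to milliseconds. */
struct latency_stats {
    double mean_ms, min_ms, max_ms;
};

/* Convert one "H:MM:SS.NNNNNNNNN" value to milliseconds; returns 0 on failure. */
static int clock_time_to_ms(const char *s, double *out_ms)
{
    unsigned int h, m, sec, ns;

    if (sscanf(s, "%u:%u:%u.%9u", &h, &m, &sec, &ns) != 4)
        return 0;
    *out_ms = (h * 3600.0 + m * 60.0 + sec) * 1000.0 + ns / 1e6;
    return 1;
}

/* Locate "key=" in the line and convert the value right after it. */
static int field_ms(const char *line, const char *key, double *out_ms)
{
    const char *p = strstr(line, key);

    return p != NULL && clock_time_to_ms(p + strlen(key), out_ms);
}

/* Parse "... mean=<t> min=<t> max=<t>"; returns 1 if all three were found. */
static int parse_latency_stats(const char *line, struct latency_stats *st)
{
    assert(line != NULL && st != NULL);
    return field_ms(line, "mean=", &st->mean_ms) &&
           field_ms(line, "min=", &st->min_ms) &&
           field_ms(line, "max=", &st->max_ms);
}
```

For the imxvpudech264-1.src line it yields a mean of about 166 ms, a minimum of about 67 ms and a maximum of about 255 ms.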
On a desktop with vah264dec we get a TOTAL latency of ~220-250 ms; with this NXP i.MX8MM machine it's 400 ms, or 300 ms with sync=false but with jitter...
@Talkless Open a new issue, and provide a gst-launch-1.0 command line that reproduces the problem.
Thanks for all your time, @dv1. It's really great to have a FOSS option that lets me build the latest GStreamer from source. The system image/toolkit provided by manufacturers only has GStreamer & vpudec 1.18...
Hi,
I'm trying to port our Qt application, which displays an RTSP stream using qmlglsink, to an i.MX8MM device.
So far I've managed to cross-compile GStreamer 1.22.3 with gstreamer-imx 2.0.0, libimxdmabuffer 1.0.1 and libimxvpuapi2 2.1.2 (I can't use more recent libimx* versions due to https://github.com/Freescale/libimxdmabuffer/issues/7), but either I get "caps not accepted" for this pipeline:
Or I do get video displayed, but at barely 4 fps, if I link the imx decoder and glupload using imxg2dvideotransform like this:
Does this experiment prove that imxvpudec_h264 is incompatible with glupload? Has anyone thought about how it could be made possible to use glupload with minimal overhead?