Closed hbiyik closed 1 year ago
maybe try to disable other codecs to make rkmpp the only option for them
Thanks for your input, i will try it, but i have at least one concern: i think both chromium and firefox use libvpx for vp9 (probably for hevc as well), and opera's media decoders are based on chromium. Is forcing the ffmpeg decoder the right choice, or should the browsers' media decoder interface code be changed instead?
if you are using chromium based browser, maybe you can try libv4l-rkmpp + chromium v4l2vda(zero-copy, h264/vp8/vp9 (hevc for r105+))
for libvpx, i think if the browser supports vp9 hw decoding officially for desktop version, they should use ffmpeg or vaapi
Thanks, i saw that in the recipes but i found it too invasive to maintain in the long term. Just thought the ffmpeg interface would be easier to maintain, but it seems i was wrong.
Ok, thanks for your help. I'll do some investigation; it seems there is no easy way around it. I'm closing the issue if you don't mind.
Right, libv4l-rkmpp solution depends on chromium's v4l2vda patches, which is hard to maintain.
I'm suggesting it because the ffmpeg way has an extra buffer format conversion(nv12 to i420) and texture importing(basically a memcpy() in mali library)
memcpy(dst_y + i * y_pitch, src + i * hstride, frame->width);
I think you are referring to the line above. I have been using it with librga on an rk3588 based rock5b, and performance is not amazing but pretty OK. You mentioned somewhere in the issues that librga was broken for rk3588, but it seems to be working fine, i guess. As far as i remember, i did not see "Doing slow software conversion" in the debug log.
But again the performance is not great but OK.
right, for rga, the 3588 requires width and height to be 16 aligned, and may only support ddr phy address lower than 4g. so it might still work for some cases. and for broken cases, the librga doesn't even return error code, so only kernel error logs and no software conversion triggered
and for the other memcpy, it's internal GPU texture importing, there's no warning log for it
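For reference, the nv12 to i420 software path mentioned above can be sketched roughly as below. This is only a minimal illustration, not the code from rkmppdec.c: the function name nv12_to_i420 and its parameters are made up for the example, and it assumes the UV plane starts right after src_vstride lines of the Y plane and shares the same line stride.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert NV12 (Y plane + interleaved UV plane)
 * into I420 (separate Y, U and V planes).
 * src_stride  : bytes per source line (hstride in the line quoted above)
 * src_vstride : allocated lines of the source Y plane (assumption) */
void nv12_to_i420(const uint8_t *src, int src_stride, int src_vstride,
                  int width, int height,
                  uint8_t *dst_y, int y_pitch,
                  uint8_t *dst_u, int u_pitch,
                  uint8_t *dst_v, int v_pitch)
{
    assert(width > 0 && height > 0 && src_stride >= width);

    /* The interleaved UV plane follows the (padded) Y plane. */
    const uint8_t *src_uv = src + (size_t)src_stride * src_vstride;

    /* Y plane: one memcpy per line, dropping the stride padding. */
    for (int i = 0; i < height; i++)
        memcpy(dst_y + (size_t)i * y_pitch,
               src + (size_t)i * src_stride, (size_t)width);

    /* Chroma: split U0 V0 U1 V1 ... into two half-size planes. */
    for (int i = 0; i < (height + 1) / 2; i++) {
        const uint8_t *row = src_uv + (size_t)i * src_stride;
        for (int j = 0; j < (width + 1) / 2; j++) {
            dst_u[(size_t)i * u_pitch + j] = row[2 * j];
            dst_v[(size_t)i * v_pitch + j] = row[2 * j + 1];
        }
    }
}
```

The per-byte chroma loop is the expensive part on the CPU, which is why offloading it (rga or gpu) or avoiding the conversion entirely matters at 4k and above.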
Thanks, now i have a strong feeling that i don't know exactly what i'm doing :) Most likely i have not tested well enough yet.
A small update: Firefox uses an internal FFmpeg fork, so it is not possible to provide rkmpp without FUBARing the ffmpeg bundled inside Firefox with patches. Chromium, on the other hand, can use the system ffmpeg, but most distros don't do that, to avoid legal trouble around FFmpeg. With the right build flag set it should be possible to use ffmpeg rkmpp directly with chromium. I think i have to recompile chromium and find the right flags.
i seriously think that there is a synchronization issue in rkmppdec.c or mpp. To validate, i interfaced directly with /dev/rga instead of using librga, and it works. So there is no cpu dependency anymore. See commit: https://github.com/hbiyik/FFmpeg/commit/313d65b370084d632c1370ac101152e1032dd082
I use a 60fps HEVC 10bit test file to challenge the decoder. File below: https://drive.google.com/file/d/0BwxFVkl63-lEdVBuZkltckdZZ0k/view?usp=sharing&resourcekey=0-k91iv2m3Plumc5jdKCbxdQ
I am using git version of mpp from develop branch: https://github.com/rockchip-linux/mpp
when i run ffplay -loglevel repeat+level+debug ~/testfiles/sample_3840x2160.hevc
it utilizes 8 threads on the rock pi 5 with the 3588, and the video stutters and does not play. But when i force a single thread, as below:
ffplay -threads 1 -loglevel repeat+level+debug ~/testfiles/sample_3840x2160.hevc
the video plays fine. I think somewhere in the library calls of mpp, it is unlocking some synchronization primitive even though the buffer is full, which in the end causes races between decoder threads.
Normally all players use a multithreaded decoder, and i think it makes absolutely no sense for rkmppdec.c to run in multiple threads, since the actual decoder resource is serial. Without any modification to rkmppdec.c, is it possible to declare that this decoder is single threaded? Something like the .capabilities of FFCodec?
for thread issue, please report to mpp maintainers :)
for ffmpeg, there's a AV_CODEC_CAP_FRAME_THREADS, but i don't think we enabled that
i haven't seen this kind of issue before
Managed to make it work with Firefox. It always uses FFmpeg, which is good. I also applied the /dev/rga patches from icecream95 (the author of panfork), so it is moderately fast. Chromium by default uses VPXdecoder, which is not FFmpeg based, so i am skipping chromium because i do not want to recompile it from scratch. Chromium also does not like the /dev/rga patch; it crashes at runtime. With only your patches it works, though terribly slowly due to the manual memcpys.
So firefox is a viable option. VP8, VP9 and H264 work up to 4k 60fps with moderate performance. See the task manager screenshot below. Hevc is not supported by Firefox, and AV1 is not yet supported by rkmppdec.c.
However, there is one single problem with firefox: it can not detect the colorspace of VP8 & VP9 with this decoder. I think the decoder needs some profiles. For now i patched it in a very ugly way, as in the commit below. Now the colors are no longer weird as in the pictures. https://github.com/hbiyik/FFmpeg/commit/308f59115409ff79b9f8a3e013c052be36f2779a
if you are using 3588, maybe i can try to add av1(something like the other recently added formats)
Yes, i am on 3588, and av1 would be great so that i can abuse the chip with 8k youtube videos :)
1/ for color space, maybe try mpp_frame_get_colorspace() 2/ for av1, please try https://github.com/JeffyCN/FFmpeg/tree/wip/av1
The code already probes for it, but it wrongly returns 0. I also thought there was an optimization point here: there is no need to probe for it on each frame, it would be better to do it once at init, since the frame format will not change during decoding. (EDIT: never mind this optimisation comment. I checked the mpp code; get_colorspace and its variants are only getters for class attributes and are not cpu intensive, so there is nothing to gain there.)
I heard that AV1 is not supported by rockchip kernel yet, but anyways i will try it when i get the chance.
Thanks for your help again.
Yeah kernel crashes with AV1:
is it curable?
it works well on my side, could be due to an old kernel version.
but i've been warned not to provide newest 5.10 kernel before, so...
maybe you can ask mpp's maintainer(herman) for it
thanks, can confirm it works when the av1 decoder is enabled in DTS.
However, in practice AV1 is used for 8k videos on youtube, and the software conversion is slow; it creates lots of latency and causes all frames to be dropped. The RGA also complains with 8K videos. So in practice it is not useful. I might need to attach a debugger and see what's going on with the RGA. I have no experience with RGA, but maybe i can understand what is going on.
RGA Errors in dmesg:
[ 5701.251551] rga: Blit mode: request id = 1028
[ 5701.251561] rga_debugger: render_mode = 0, bitblit_mode=0, rotate_mode = 0
[ 5701.251582] rga_debugger: src: y = 1e uv = 0 v = 1fa4000 aw = 7680 ah = 4320 vw = 7680 vh = 4320
[ 5701.251588] rga_debugger: src: xoff = 0, yoff = 0, format = 0xa, rd_mode = 1
[ 5701.251595] rga_debugger: dst: y=0 uv=7f68749000 v=7f6a729000 aw=7680 ah=4320 vw=7680 vh=4352
[ 5701.251601] rga_debugger: dst: xoff = 0, yoff = 0, format = 0xb, rd_mode = 1
[ 5701.251603] rga_debugger: mmu: mmu_flag=80000521 en=1
[ 5701.251608] rga_debugger: alpha: rop_mode = 0
[ 5701.251613] rga_debugger: yuv2rgb mode is 0
[ 5701.251616] rga_debugger: set core = 0, priority = 0, in_fence_fd = 0
[ 5701.251626] rga_policy: start policy on core = 1
[ 5701.251632] rga_policy: core = 1, break on rga_check_dst
[ 5701.251636] rga_policy: start policy on core = 2
[ 5701.251640] rga_policy: core = 2, break on rga_check_dst
[ 5701.251644] rga_policy: start policy on core = 4
[ 5701.251649] rga_policy: core = 4, break on rga_check_dst
[ 5701.251652] rga_policy: optional_cores = 0
[ 5701.251659] rga_policy: invalid function policy
[ 5701.251662] rga_policy: assign core: -1
[ 5701.251667] rga_job: job assign failed
[ 5701.251669] rga_job: failed to get scheduler, rga_job_commit(445)
[ 5701.251683] rga_job: request[1028] finished 0 failed 1
[ 5701.251687] rga_job: request[1028] task[0] job_commit failed.
[ 5701.251693] rga_job: rga request commit failed!
[ 5701.251696] rga: request[1028] submit failed!
I think rga does not support 8k conversion? I guess not.
num of scheduler = 3
===================================
scheduler[0]: rga3_core0
-----------------------------------
pd_ref = 0
scheduler[1]: rga3_core1
-----------------------------------
pd_ref = 0
scheduler[2]: rga2
-----------------------------------
pd_ref = 0
[root@alarm rkrga]# cat driver_version
RGA multicore Device Driver: v1.2.20
[root@alarm rkrga]# cat hardware
===================================
rga3_core0, core 1: version: 3.0.76831
input range: 68x2 ~ 8176x8176
output range: 68x2 ~ 8128x8128
scale limit: 1/8 ~ 8
byte_stride_align: 16
max_byte_stride: 32768
csc: RGB2YUV 0xf YUV2RGB 0xf
feature: 0x4
mmu: RK_IOMMU
-----------------------------------
rga3_core1, core 2: version: 3.0.76831
input range: 68x2 ~ 8176x8176
output range: 68x2 ~ 8128x8128
scale limit: 1/8 ~ 8
byte_stride_align: 16
max_byte_stride: 32768
csc: RGB2YUV 0xf YUV2RGB 0xf
feature: 0x4
mmu: RK_IOMMU
-----------------------------------
rga2, core 4: version: 3.2.63318
input range: 2x2 ~ 8192x8192
output range: 2x2 ~ 4096x4096
scale limit: 1/16 ~ 16
byte_stride_align: 4
max_byte_stride: 32768
csc: RGB2YUV 0x7 YUV2RGB 0x7
feature: 0x5f
mmu: RGA_MMU
-----------------------------------
ah i understand now, the rga conversion outputs yuv420p, and only rga2 supports yuv420p. And rga2 output is limited to 4k. Is there a way to use rga3 output compatible with FFMpeg? I am no expert on those formats so this might be a silly question.
can one of those fun formats be used?
const uint32_t rga3_output_raster_format[] = {
    RGA_FORMAT_RGBA_8888,
    RGA_FORMAT_BGRA_8888,
    RGA_FORMAT_RGB_888,
    RGA_FORMAT_BGR_888,
    RGA_FORMAT_RGB_565,
    RGA_FORMAT_BGR_565,
    RGA_FORMAT_YCbCr_422_SP,
    RGA_FORMAT_YCbCr_420_SP,
    RGA_FORMAT_YCrCb_422_SP,
    RGA_FORMAT_YCrCb_420_SP,
    RGA_FORMAT_YVYU_422,
    RGA_FORMAT_VYUY_422,
    RGA_FORMAT_YUYV_422,
    RGA_FORMAT_UYVY_422,
    RGA_FORMAT_YCbCr_420_SP_10B,
    RGA_FORMAT_YCrCb_420_SP_10B,
    RGA_FORMAT_YCbCr_422_SP_10B,
    RGA_FORMAT_YCrCb_422_SP_10B,
};
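To make the reasoning above concrete, here is a small sketch of why the 8k job gets "assign core: -1". It is not the kernel driver's policy code; the struct, the table and pick_rga_core are hypothetical names, the output limits are copied from the hardware dump above, and the "planar yuv420 output only on rga2" flag is just my reading of the format list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the per-core output limits shown in the
 * `hardware` dump above. */
typedef struct {
    const char *name;
    int out_max_w;
    int out_max_h;
    bool planar_yuv420_out; /* assumption: only rga2 can write yuv420p */
} rga_core_limits;

static const rga_core_limits rga_cores[] = {
    { "rga3_core0", 8128, 8128, false },
    { "rga3_core1", 8128, 8128, false },
    { "rga2",       4096, 4096, true  },
};

/* Return the index of the first core that can produce the requested
 * output, or -1 when no core qualifies. */
int pick_rga_core(int out_w, int out_h, bool want_planar_yuv420)
{
    assert(out_w > 0 && out_h > 0);
    for (size_t i = 0; i < sizeof(rga_cores) / sizeof(rga_cores[0]); i++) {
        const rga_core_limits *c = &rga_cores[i];
        if (out_w > c->out_max_w || out_h > c->out_max_h)
            continue; /* output too large for this core */
        if (want_planar_yuv420 && !c->planar_yuv420_out)
            continue; /* format not in this core's output list */
        return (int)i;
    }
    return -1;
}
```

With the 7680x4352 destination from the dmesg log, a planar yuv420 request matches no core, while a semi-planar (SP) request would land on rga3_core0 and a 4k planar request on rga2.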
UPDATE:
Sorry for the spam: I hacked the rga output format to RGA_FORMAT_YCbCr_420_SP just for testing. Now rga3 is initialized and decoding performance is much better; however, as expected, the colors are quite messed up, since the consumer expects RGA_FORMAT_YCbCr_420_P, not SP.
8k decoding needs A LOT of CMA. I manually set the CMA size to 1G on the kernel command line, which looked barely enough; however, even in this case the decoding speed is not sufficient.
I.e. the expected decode time for an 8k 30fps youtube av1 video is 33ms per frame, but even with RGA3 the decode time is around 128ms. So an optimization of rkmppdec.c is a must for 8K.
[RDD 2852: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: Frame decode finished, time 111.30 ms averange decode time 88.41 ms decoded 137 frames
[RDD 2852: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: slow decode: failed to decode in time, frame duration 33.33 ms, decode time 111.30
[RDD 2852: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: frames: all decoded 137 late decoded 128 over averange 128
[RDD 2852: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: Got one frame output with pts=15200000 dts=-9223372036854775808 duration=33333 opaque=-9223372036854775808
.dst = {
.uv_addr = (uintptr_t) dst_y,
.v_addr = (uintptr_t) dst_u,
.format = RGA_FORMAT_YCbCr_420_SP,
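The numbers above (33ms budget, lots of CMA) can be sanity-checked with some quick arithmetic. This is only a back-of-the-envelope sketch: the helper names are made up, and it assumes 8-bit 4:2:0 frames with no stride padding, so real MPP buffers will be somewhat larger.

```c
#include <assert.h>
#include <stdint.h>

/* Time available per frame, in milliseconds, at a given frame rate. */
double frame_budget_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Size of one 8-bit 4:2:0 frame: a full-size luma plane plus two
 * quarter-size chroma planes, i.e. w * h * 3 / 2 bytes. */
uint64_t yuv420_frame_bytes(uint32_t w, uint32_t h)
{
    return (uint64_t)w * h * 3 / 2;
}

/* How many whole frames fit into a memory pool of pool_bytes. */
uint64_t frames_that_fit(uint64_t pool_bytes, uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return pool_bytes / frame_bytes;
}
```

Under these assumptions one 7680x4320 frame is 49,766,400 bytes, so a 1G CMA pool holds about 21 of them, shared between decoder output buffers and conversion targets. At 30fps the budget is 33.33ms per frame, so a 128ms decode time is close to four times too slow.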
Just for your reference: if i directly return the frame without any conversion, neither with rga nor in software, the decoding is still slow, which means the rkmppdec.c workflow is the slow part, not the rga.
Here is the output when frame conversion is totally bypassed.
[RDD 21726: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: Frame decode finished, time 79.68 ms averange decode time 34.51 ms decoded 179 frames
[RDD 21726: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: slow decode: failed to decode in time, frame duration 33.33 ms, decode time 79.68
[RDD 21726: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: frames: all decoded 179 late decoded 76 over averange 27
[RDD 21726: MediaPDecoder #2]: D/PlatformDecoderModule FFMPEG: Got one frame output with pts=116333333 dts=-9223372036854775808 duration=33333 opaque=-922337203685477580
1/ i don't like rga at all, since it has lots of limitations and unstable APIs, and the limitations and APIs changed all the time and they don't even consider maintaining the old APIs.
2/ maybe you can try to use GPU(gles) to do the conversion
3/ the browsers are likely using gles to compose, which might cause an extra memcpy when importing frame to texture. unless they have a path to use dma buf(external texture) like chromium vda+v4l-rkmpp
I noticed that only with AV1, when destroying the mpp, i always get a segfault.
Is it an mpp issue? Here is the full gdb traceback. It seems like some mutex has gone wrong.
#0 0x0000007fa7763c74 in pthread_mutex_lock () at /usr/lib/libc.so.6
#1 0x0000007fa609f4b0 in Mutex::lock() (this=<optimized out>) at /usr/src/debug/mpp-git/mpp/osal/inc/mpp_thread.h:127
thd_dec = <optimized out>
notify = 0
__FUNCTION__ = "mpp_dec_notify_normal"
#2 MppMutexCond::lock() (this=<optimized out>) at /usr/src/debug/mpp-git/mpp/osal/inc/mpp_thread.h:210
thd_dec = <optimized out>
notify = 0
__FUNCTION__ = "mpp_dec_notify_normal"
#3 MppThread::lock(MppThreadSignal_e) (id=THREAD_WORK, this=<optimized out>) at /usr/src/debug/mpp-git/mpp/osal/inc/mpp_thread.h:251
thd_dec = <optimized out>
notify = 0
__FUNCTION__ = "mpp_dec_notify_normal"
#4 mpp_dec_notify_normal(MppDecImpl_t*, unsigned int) (dec=0x7f84017bc0, flag=16384) at /usr/src/debug/mpp-git/mpp/mpp/codec/mpp_dec_normal.cpp:1130
thd_dec = <optimized out>
notify = 0
__FUNCTION__ = "mpp_dec_notify_normal"
#5 0x0000007fa60949c4 in mpp_dec_notify(MppDec, RK_U32) (ctx=0x7f84017bc0, flag=16384) at /usr/src/debug/mpp-git/mpp/mpp/codec/mpp_dec.cpp:951
dec = 0x7f84017bc0
ret = MPP_SUCCESS
__FUNCTION__ = "mpp_dec_notify"
#6 0x0000007fa60b6570 in mpp_buf_slot_clr_flag(MppBufSlots, RK_S32, SlotUsageType) (slots=0x7f84017ee0, index=<optimized out>, type=type@entry=SLOT_CODEC_USE) at /usr/src/debug/mpp-git/mpp/mpp/base/mpp_buf_slot.cpp:914
__FUNCTION__ = "mpp_buf_slot_clr_flag"
impl = 0x7f84017ee0
unused = 1
#7 0x0000007fa6119790 in av1d_frame_unref (f=f@entry=0x7f84072948, ctx=0x7f84018b80) at /usr/src/debug/mpp-git/mpp/mpp/codec/dec/av1/av1d_parser.c:465
s = 0x7f84071ba0
__FUNCTION__ = "av1d_frame_unref"
#8 0x0000007fa6119f14 in av1d_parser_deinit (ctx=ctx@entry=0x7f84018b80) at /usr/src/debug/mpp-git/mpp/mpp/codec/dec/av1/av1d_parser.c:734
i = 8
s = 0x7f84071ba0
__FUNCTION__ = "av1d_parser_deinit"
#9 0x0000007fa61191d0 in av1d_deinit (ctx=0x7f84018b80) at /usr/src/debug/mpp-git/mpp/mpp/codec/dec/av1/av1d_api.c:88
buf = <optimized out>
__FUNCTION__ = "av1d_deinit"
av1_ctx = 0x7f84018b80
__FUNCTION__ = "av1d_deinit"
#10 av1d_deinit (ctx=0x7f84018b80) at /usr/src/debug/mpp-git/mpp/mpp/codec/dec/av1/av1d_api.c:82
av1_ctx = 0x7f84018b80
__FUNCTION__ = "av1d_deinit"
#11 0x0000007fa609603c in mpp_parser_deinit(Parser) (prs=0x7f84018b00) at /usr/src/debug/mpp-git/mpp/mpp/codec/mpp_parser.cpp:136
__FUNCTION__ = "mpp_parser_deinit"
p = 0x7f84018b00
#12 0x0000007fa60939b0 in mpp_dec_deinit(MppDec) (ctx=0x7f84017bc0) at /usr/src/debug/mpp-git/mpp/mpp/codec/mpp_dec.cpp:812
i = 11
dec = 0x7f84017bc0
__FUNCTION__ = "mpp_dec_deinit"
#13 0x0000007fa6089dc8 in Mpp::clear() (this=0x7f840037b0) at /usr/src/debug/mpp-git/mpp/mpp/mpp.cpp:270
#14 0x0000007fa608a5b0 in Mpp::~Mpp() (this=<optimized out>, __in_chrg=<optimized out>) at /usr/src/debug/mpp-git/mpp/mpp/mpp.cpp:257
#15 0x0000007fa6090564 in mpp_destroy(MppCtx) (ctx=0x7f84004740) at /usr/src/debug/mpp-git/mpp/mpp/mpi.cpp:497
__FUNCTION__ = "mpp_destroy"
ret = <optimized out>
p = 0x7f84004740
#16 0x0000007fa84338f0 in rkmpp_release_decoder (opaque=<optimized out>, data=0x7f84003640 "@G") at libavcodec/rkmppdec.c:158
decoder = 0x7f84003640
#17 0x0000007fa7b740a0 in buffer_replace (src=0x0, dst=<optimized out>) at libavutil/buffer.c:133
free_avbuffer = 1
b = 0x7f84007410
#18 av_buffer_unref (buf=<optimized out>) at libavutil/buffer.c:144
#19 av_buffer_unref (buf=<optimized out>) at libavutil/buffer.c:139
#20 0x0000007fa8431de0 in rkmpp_close_decoder (avctx=<optimized out>) at libavcodec/rkmppdec.c:146
rk_context = <optimized out>
decoder = <optimized out>
#21 0x0000007fa7ead69c in avcodec_close (avctx=avctx@entry=0x7f8405b550) at libavcodec/avcodec.c:457
avci = 0x7f84008bd0
i = <optimized out>
#22 0x0000007fa83c6a00 in avcodec_free_context (pavctx=0x7f8e50e570) at libavcodec/options.c:171
avctx = 0x7f8405b550
#23 0x000000558e234414 in decoder_destroy (d=0x7f8e50e5b8) at fftools/ffplay.c:668
ic = <optimized out>
codecpar = 0x7f84001a10
#24 stream_component_close (is=is@entry=0x7f8e50d010, stream_index=0) at fftools/ffplay.c:1225
ic = <optimized out>
codecpar = 0x7f84001a10
#25 0x000000558e234770 in stream_close (is=0x7f8e50d010) at fftools/ffplay.c:1260
#26 0x000000558e23486c in do_exit (is=is@entry=0x7f8e50d010) at fftools/ffplay.c:1290
#27 0x000000558e227b94 in event_loop (cur_stream=<optimized out>) at fftools/ffplay.c:3459
x = <optimized out>
incr = <optimized out>
frac = <optimized out>
event = {type = 256, common = {type = 256, timestamp = 1230}, display = {type = 256, timestamp = 1230, display = 1, event = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000', data1 = 0}, window = {type = 256, timestamp = 1230, windowID = 1, event = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000', data1 = 0, data2 = 0}, key = {type = 256, timestamp = 1230, windowID = 1, state = 0 '\000', repeat = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000', keysym = {scancode = SDL_SCANCODE_UNKNOWN, sym = 0, mod = 14, unused = 0}}, edit = {type = 256, timestamp = 1230, windowID = 1, text = '\000' <repeats 12 times>, "\016\000\000\000\000\000\000\000\240\337\025\332\177\000\000\000\264\327\217\247", start = 127, length = -1054928256}, editExt = {type = 256, timestamp = 1230, windowID = 1, text = 0x0, start = 14, length = 0}, text = {type = 256, timestamp = 1230, windowID = 1, text = '\000' <repeats 12 times>, "\016\000\000\000\000\000\000\000\240\337\025\332\177\000\000\000\264\327\217\247"}, motion = {type = 256, timestamp = 1230, windowID = 1, which = 0, state = 0, x = 0, y = 14, xrel = 0, yrel = -636100704}, button = {type = 256, timestamp = 1230, windowID = 1, which = 0, button = 0 '\000', state = 0 '\000', clicks = 0 '\000', padding1 = 0 '\000', x = 0, y = 14}, wheel = {type = 256, timestamp = 1230, windowID = 1, which = 0, x = 0, y = 0, direction = 14, preciseX = 0, preciseY = -1.05464125e+16, mouseX = 127, mouseY = -1483745356}, jaxis = {type = 256, timestamp = 1230, which = 1, axis = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000', value = 0, padding4 = 0}, jball = {type = 256, timestamp = 1230, which = 1, ball = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000', xrel = 0, yrel = 0}, jhat = {type = 256, timestamp = 1230, which = 1, hat = 0 '\000', value = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000'}, jbutton = {type = 256, timestamp = 1230, which = 1, button = 0 '\000', 
state = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000'}, jdevice = {type = 256, timestamp = 1230, which = 1}, jbattery = {type = 256, timestamp = 1230, which = 1, level = SDL_JOYSTICK_POWER_EMPTY}, caxis = {type = 256, timestamp = 1230, which = 1, axis = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000', value = 0, padding4 = 0}, cbutton = {type = 256, timestamp = 1230, which = 1, button = 0 '\000', state = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000'}, cdevice = {type = 256, timestamp = 1230, which = 1}, ctouchpad = {type = 256, timestamp = 1230, which = 1, touchpad = 0, finger = 0, x = 0, y = 1.96181785e-44, pressure = 0}, csensor = {type = 256, timestamp = 1230, which = 1, sensor = 0, data = {0, 0, 1.96181785e-44}, timestamp_us = 549119713184}, adevice = {type = 256, timestamp = 1230, which = 1, iscapture = 0 '\000', padding1 = 0 '\000', padding2 = 0 '\000', padding3 = 0 '\000'}, sensor = {type = 256, timestamp = 1230, which = 1, data = {0, 0, 0, 1.96181785e-44, 0, -1.05464125e+16}, timestamp_us = 548272068532}, quit = {type = 256, timestamp = 1230}, user = {type = 256, timestamp = 1230, windowID = 1, code = 0, data1 = 0x0, data2 = 0xe}, syswm = {type = 256, timestamp = 1230, msg = 0x1}, tfinger = {type = 256, timestamp = 1230, touchId = 1, fingerId = 0, x = 1.96181785e-44, y = 0, dx = -1.05464125e+16, dy = 1.77964905e-43, pressure = -3.99243389e-15, windowID = 127}, mgesture = {type = 256, timestamp = 1230, touchId = 1, dTheta = 0, dDist = 0, x = 1.96181785e-44, y = 0, numFingers = 57248, padding = 55829}, dgesture = {type = 256, timestamp = 1230, touchId = 1, gestureId = 0, numFingers = 14, error = 0, x = -1.05464125e+16, y = 1.77964905e-43}, drop = {type = 256, timestamp = 1230, file = 0x1 <error: Cannot access memory at address 0x1>, windowID = 0}, padding = "\000\001\000\000\316\004\000\000\001", '\000' <repeats 15 times>, 
"\016\000\000\000\000\000\000\000\240\337\025\332\177\000\000\000\264\327\217\247\177\000\000\000\200\022\037\301U\000\000"}
pos = <optimized out>
last_mouse_left_click = 0
flags = <optimized out>
is = <optimized out>
#28 main (argc=<optimized out>, argv=<optimized out>) at fftools/ffplay.c:3753
flags = <optimized out>
is = <optimized out>
Line numbers are a little bit different in my test rkmppdec.c but rest is upstream
looks like a mpp issue, you can try to test it with mpp's tests and gstreamer maybe
@hbiyik Have you found the fix for the mpp_destroy crash? I tested all possible versions i found, still have crashes.
@JeffyCN FYI, I know it is deprecated but v5 has lots of memory leaks with AV1.
i attached gdb and noticed that, only in the AV1 case, rkmppdec.c was initializing the MPP twice.
MPP can initialize and destroy fine the first time, but on the destroy that follows the 2nd initialization it seg faults. That's why it is only visible in the AV1 case. The seg fault is in a mutex condition deep down in mpp that i have no idea about, and i don't want to have an idea about.
The seg fault may be related to MPP itself or to how rkmppdec.c initializes the MPP. I tried several things in terms of configuring the MPP interface but could not find a solution. So that's beyond my capabilities.
Maybe the above info will help if someone wants to dig deeper.
PS: i did not check with the latest commits from MPP though; i just noticed several fixes have landed in the 'develop' branch.
yep, that fixed the crash AND the leaks.
aha, good to hear. unfortunately i dismantled my rock5 so i can not test, but it is good that you at least confirmed the issue is resolved.
I tried Jeffy's proposed approach, with rockchip's prebuilt chromium and v4l-utils. Natively compiled https://github.com/JeffyCN/libv4l-rkmpp/commit/b004755ccd5410f80efe31716abccd105308e226 but i get 300% CPU usage for AV1 while for H264 i get 80%, looks like sw decoding for AV1.
I am running X11 with glamor, so 80% CPU usage looks fine i suppose (1080p).
Another experiment i tested was with ffmpeg-drm (linked with Jeffys FFmpeg v5, after removing the conversion), i get 20% CPU usage, but still no zero-copy and there is an annoying flicker, from time to time some frames are repeated causing this effect. There is no PTS/DTS control logic, so it plays fast. Maybe you test it in 8K. If @JeffyCN has time and wants to have some fun i push it to github so you could suggest the fix.
Maybe you can check if this works on 8K and make some benchmarks. https://github.com/avafinger/ffmpeg-drm
for chromium with av1, it needs higher chromium version with downstream patches, check: https://github.com/JeffyCN/libv4l-rkmpp/issues/6
Thank you Jeffy, Can you point me to a prebuilt deb package for X11 (debian)?
there's no prebuilt debian version, just newest yocto version :(
I can also confirm the av1 crashes are fixed
How can ffmpeg rkmpp be used with Chromium?
i'd prefer to use custom chromium + libv4l-rkmpp(from yocto meta-browser+meta-rockchip)
Hello
I can get the hw decoder functioning with FFmpeg 4 & 5 in kodi, vlc, ffplay etc. However, i don't know how to force Chromium, Firefox or Opera to use the rkmpp based decoders. Are there any previous patches for this, or some pragmatic way? Note: the rkmpp variants of the decoders already have the highest priority in allcodecs.c.