rga_img_info_t struct definition differs between librga and the kernel RGA3 code

jjm2473 commented 1 year ago

librga definition: https://github.com/JeffyCN/mirrors/blob/a1b05b8fcf4698176477370fd942b31d9ae66404/include/rga.h#L204-L226

RGA3 definition: https://github.com/JeffyCN/mirrors/blob/9789c7416f009b1c7a064241a5f185b368b24732/drivers/video/rockchip/rga3/include/rga.h#L478-L508

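The RGA3 kernel definition is 20 bytes larger. Because rga_img_info_t is used several times inside the rga_req struct, and not at the end of rga_req, librga and RGA3 are incompatible at both the API and the ABI level. https://github.com/JeffyCN/mirrors/blob/9789c7416f009b1c7a064241a5f185b368b24732/drivers/video/rockchip/rga3/include/rga.h#L510-L519

In other words, a program built against librga passes scrambled struct parameters when it uses the kernel device provided by RGA3. I have reproduced this on an RK3568 board.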
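To illustrate why a larger embedded struct breaks the ABI, here is a minimal sketch. The types user_img_t, kernel_img_t, user_req_t and kernel_req_t are hypothetical stand-ins invented for this example, not the real rga_img_info_t or rga_req layouts; only the 20-byte size difference is taken from the report above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image descriptor as the userspace library sees it (16 bytes). */
typedef struct {
    uint32_t addr;
    uint32_t format;
    uint32_t act_w;
    uint32_t act_h;
} user_img_t;

/* Hypothetical kernel-side descriptor: same fields plus 20 extra bytes. */
typedef struct {
    uint32_t addr;
    uint32_t format;
    uint32_t act_w;
    uint32_t act_h;
    uint32_t extra[5];
} kernel_img_t;

/* The descriptor is embedded twice, and not at the end of the request. */
typedef struct {
    uint32_t   render_mode;
    user_img_t src;
    user_img_t dst;
    uint32_t   rotate_mode;
} user_req_t;

typedef struct {
    uint32_t     render_mode;
    kernel_img_t src;
    kernel_img_t dst;
    uint32_t     rotate_mode;
} kernel_req_t;

/* How far the kernel's view of `dst` is from where userspace wrote it. */
static inline size_t dst_offset_skew(void)
{
    return offsetof(kernel_req_t, dst) - offsetof(user_req_t, dst);
}

/* Fields after both descriptors are shifted by twice the size difference. */
static inline size_t rotate_mode_offset_skew(void)
{
    return offsetof(kernel_req_t, rotate_mode) - offsetof(user_req_t, rotate_mode);
}
```

Every field located after the first embedded descriptor ends up at a different offset, so the kernel reads values that userspace wrote for other fields.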

So, is librga incompatible with the RGA3 driver? If it is, are there any plans to make librga or the RGA3 driver ABI compatible?

JeffyCN commented 1 year ago

Please contact the rga maintainer (or the git committers).

nyanmisaka commented 1 year ago

@jjm2473 The librga in this mirror is outdated. Newer versions in the BSP have the same definitions.

https://gitlab.com/rk3588_linux/linux/linux-rga/-/blob/linux-5.10-gen-rkr3.5/include/rga.h#L237

librga (2.2.0-1) unstable; urgency=medium
  * debian: Update to support rga3
 -- Jeffy Chen <jeffy.chen@rock-chips.com>  Fri, 7 Jan 2022 15:44:00 +0800

jjm2473 commented 1 year ago

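@nyanmisaka Thanks! I contacted the maintainer by email and have updated to 1.9.1 ( https://github.com/jjm2473/librga ); no problems so far. The version in JeffyCN/mirrors really is a trap, though: I had to rebuild my Jellyfin image just to support RK3588.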

JeffyCN commented 1 year ago

The librga here is not out of date (some chips, such as rk3399, still use it).

There are multiple rewritten versions of the rga drivers and libraries, and the maintainer doesn't care about API or chip compatibility... so yes, it's really hard to use :(

jjm2473 commented 1 year ago

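@JeffyCN Doesn't newer librga (1.8+) check the driver version? I can see compatibility code in it, and it is compatible with at least the RGA2 and MULTI_RGA drivers. Version 1.3 supports the RGA2 driver, and from what you say it should also be compatible with the RGA driver.

Off topic: RGA has more problems than MPP, even though it is much simpler. For example, different memory management frameworks can all affect RGA; when I used librga 1.3, I could not enable ION, otherwise it would crash. RGA and MPP also have different data alignment requirements: the hardware frames produced by the MPP decoder do not even satisfy RGA's stride alignment requirement, which is rather bizarre. I suggest that RGA and MPP share one set of memory management code instead of each maintaining its own. RGA should arguably even be merged into the MPP interface.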

jjm2473 commented 1 year ago

According to the librga 1.9 readme, it is compatible with rk3399.

JeffyCN commented 1 year ago

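The initial response I got was that compatibility would not be maintained, which caused a lot of problems, and reports were not handled either, so I stopped using rga a long time ago.

The so-called multi version was only started recently under strong customer demand, and its stability has not been fully verified. Expect corner-case compatibility problems between the upper layers and the driver on many interfaces (especially the old legacy api).

As for those bizarre restrictions: they only care about android, so they change features and add restrictions at will without regard for compatibility.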

jjm2473 commented 1 year ago

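@JeffyCN Do you have an alternative to librga? Hardware scaling is all I need, and RGA has a lot of pitfalls. My main goal is to accommodate SoCs with weaker video encoders when transcoding, which means scaling down to 1080p or below; the RK3568, for example, can only encode 1080p30.

A problem I found today: one h264 hi10p video scales and transcodes fine on RK3568 but fails on RK3588, with exactly the same kernel, operating system, and Docker container. Other h264 hi10p videos transcode and scale correctly on both SoCs. The data in the RGA request is still produced by the MPP decoder.

Kernel log: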

[ 7280.605045] rga: Blit mode: request id = 18353
[ 7280.605049] rga_debugger: render_mode = 0, bitblit_mode=0, rotate_mode = 1
[ 7280.605051] rga_debugger: src: y = 8 uv = 0 v = 27d800 aw = 1920 ah = 1080 vw = 2400 vh = 1088
[ 7280.605053] rga_debugger: src: xoff = 0, yoff = 0, format = 0x20, rd_mode = 1
[ 7280.605055] rga_debugger: dst: y=2f uv=0 v=47040 aw=718 ah=404 vw=720 vh=404
[ 7280.605057] rga_debugger: dst: xoff = 0, yoff = 0, format = 0xa, rd_mode = 1
[ 7280.605058] rga_debugger: mmu: mmu_flag=80000521 en=1
[ 7280.605059] rga_debugger: alpha: rop_mode = 0
[ 7280.605060] rga_debugger: yuv2rgb mode is 0
[ 7280.605061] rga_debugger: set core = 0, priority = 0, in_fence_fd = 0
[ 7280.605066] rga_policy: start policy on core = 1
[ 7280.605068] rga_policy: unsupported width stride 2400, 0x20 should be 64 aligned!
[ 7280.605069] rga_policy: core = 1, break on rga_check_src0
[ 7280.605070] rga_policy: start policy on core = 2
[ 7280.605072] rga_policy: unsupported width stride 2400, 0x20 should be 64 aligned!
[ 7280.605073] rga_policy: core = 2, break on rga_check_src0
[ 7280.605074] rga_policy: start policy on core = 4
[ 7280.605075] rga_policy: optional_cores = 4
[ 7280.605076] rga_policy: assign core: 4
[ 7280.605984] rga_mm: RGA_MMU unsupported memory larger than 4G!
[ 7280.605993] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[ 7280.606677] rga_mm: rga_mm_map_buffer map dma_buf error!
[ 7280.606686] rga_mm: job buffer map failed!
[ 7280.606694] rga_mm: src channel map job buffer failed!
[ 7280.606700] rga_mm: failed to map buffer
[ 7280.606708] rga_job: rga_job_commit: failed to map job info
[ 7280.606721] rga_job: request[18353] finished 0 failed 1
[ 7280.606726] rga_job: request[18353] task[0] job_commit failed.
[ 7280.606734] rga_job: rga request commit failed!
[ 7280.606742] rga: request[18353] submit failed!

JeffyCN commented 1 year ago

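You can use the gpu, along the lines of: https://github.com/JeffyCN/drm-cursor/blob/master/drm_egl.c

From the log, it looks like the alignment does not meet the rga3 requirement, and then the rga2 iommu does not support physical addresses above 4g. You could try restricting the memory allocator's address range to below 4g.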

JeffyCN commented 1 year ago

drm:

b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -618,7 +618,7 @@ rockchip_gem_alloc_object(struct drm_device *drm, unsigned int size,
        struct rockchip_gem_object *rk_obj;
        struct drm_gem_object *obj;

-#ifdef CONFIG_ARM_LPAE
+#if 1
        gfp_t gfp_mask = GFP_HIGHUSER | __GFP_RECLAIMABLE | __GFP_DMA32;
 #else
        gfp_t gfp_mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;

jjm2473 commented 1 year ago

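Right, RGA3 requires 64 alignment, but wstride=2400 is not a multiple of 64. This frame comes out of the MPP decoder, though, so if I want 64 alignment I would presumably have to copy the memory in software, which would be very slow. The same goes for the 4GB memory limit: MPP and RGA are not even consistent with each other.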
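As a quick check of the arithmetic, here is a minimal sketch of rounding a stride up to a power-of-two alignment. The helper names align_up and is_aligned are made up for this example and are not part of librga or MPP; the value 64 is the alignment reported in the rga_policy log lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment` (must be a power of two). */
static inline uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Non-zero when `value` is already a multiple of `alignment`. */
static inline int is_aligned(uint32_t value, uint32_t alignment)
{
    return align_up(value, alignment) == value;
}
```

With these helpers, is_aligned(2400, 64) is false because 2400 = 37 * 64 + 32, and align_up(2400, 64) gives 2432.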

Thanks for the patch, I'll apply it and see.

JeffyCN commented 1 year ago

mpp should be able to increase the decoder stride (via a control such as set frame info), or the gpu may also be able to do a hardware copy.

jjm2473 commented 1 year ago

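Thanks, I'll see whether I can get the decoder to output frames that meet the alignment requirement. The gpu has no driver loaded, because the project does not need it and to save power.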

jjm2473 commented 1 year ago

I set wstride at the decoder's MPP_DEC_SET_FRAME_INFO stage, but it does not seem to have any effect. Code: https://github.com/jjm2473/ffmpeg-rk/pull/11/files

Log:

[h264_rkmpp @ 0x7fb6ebb830] Decoder noticed an info change (1920x1080), stride(2400x1088), format=1
[h264_rkmpp @ 0x7fb6ebb830] Aligned to (1920x1080), stride(2432x1088), format=1
[h264_rkmpp @ 0x7fb6ebb830] Decoder noticed an info change (1920x1080), stride(2400x1088), format=1
[h264_rkmpp @ 0x7fb6ebb830] Aligned to (1920x1080), stride(2432x1088), format=1
[h264_rkmpp @ 0x7fb6ebb830] Decoder noticed an info change (1920x1080), stride(2400x1088), format=1
[h264_rkmpp @ 0x7fb6ebb830] Aligned to (1920x1080), stride(2432x1088), format=1
rga_api version 1.9.1_[3]
 RgaBlit(1483) RGA_BLIT fail: Invalid argument
 RgaBlit(1484) RGA_BLIT fail: Invalid argument
handl-fd-vir-phy-hnd-format[0, 10, 0, 0, 0, 0]
rect[0, 0, 1920, 1080, 2400, 1088, 8192, 0]
f-blend-size-rotation-col-log-mmu[0, 0, 0, 0, 0, 0, 1]
handl-fd-vir-phy-hnd-format[0, 47, 0, 0, 0, 0]
rect[0, 0, 718, 404, 720, 404, 2560, 436320]
f-blend-size-rotation-col-log-mmu[0, 0, 0, 0, 0, 0, 1]
This output the user patamaters when rga call blit fail
[scale_rga @ 0x7fbacd9190] RGA failed (code = -22)

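2400 is aligned to 2432, but RGA still receives 2400, which means the wstride of the frames decoded by mpp is still 2400.

The "Decoder noticed an info change" message is also logged several times; normally it appears only once.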

JeffyCN commented 1 year ago

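For mpp issues you can ask herman.chen@rock-chips.com. As I recall, this information is generated in generate_info_set, so you may be able to force the alignment directly in there.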

jjm2473 commented 1 year ago

OK, thanks.

nyanmisaka commented 1 year ago

@JeffyCN Do you happen to know whether there is a way to export the DRM_FORMAT_NA12 / MPP_FMT_YUV420SP_10BIT format to EGL or another graphics API?

It is similar to P010 with 4:2:0 sub-sampling but has no padding between components, which makes it impossible to map directly to the 16-bit GL_R16 and GL_RG16.
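To make the "no padding" point concrete, here is a minimal sketch of unpacking such samples on the CPU. It assumes the layout is four 10-bit samples packed into five bytes as one little-endian 40-bit group, which is how GStreamer describes NV12_10LE40; whether MPP_FMT_YUV420SP_10BIT matches that bit order exactly is an assumption here. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Unpack four 10-bit samples from five bytes (assumed little-endian 40-bit group). */
static inline void unpack4_10le40(const uint8_t in[5], uint16_t out[4])
{
    assert(in != 0 && out != 0);

    /* Assemble the 40-bit group, least significant byte first. */
    uint64_t word = 0;
    for (int i = 0; i < 5; i++)
        word |= (uint64_t)in[i] << (8 * i);

    /* Sample i occupies bits [10*i, 10*i + 10). */
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)((word >> (10 * i)) & 0x3FF);
}

/* P010 keeps each 10-bit value in the high bits of a 16-bit word. */
static inline uint16_t to_p010(uint16_t sample10)
{
    return (uint16_t)(sample10 << 6);
}
```

Because samples straddle byte boundaries, a texture format with one 16-bit texel per sample cannot address them directly; some repacking step like this (or native driver support for the packed format) is needed.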

JeffyCN commented 1 year ago

I think the mali ddk uses nv15 for this format: https://github.com/JeffyCN/rockchip_mirrors/blob/buildroot/package/gstreamer1/gst1-plugins-base/0012-glupload-Support-NV12_10LE40-and-NV12-NV12_10LE40-NV.patch#L49

jjm2473 commented 1 year ago

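@JeffyCN The problem with the wstride output by mpp is solved. I then modified rockchip_drm_gem to add DMA32, but RGA2 still reports the 4GB limit error. Could that be because my system uses dma heap? There is another odd issue: a target resolution of 422x238 should satisfy the RGA3 requirements, yet the request still falls back to RGA2. With a target resolution of 1280x720 it does not fall back to RGA2.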

[  900.837860] rga: Blit mode: request id = 1001
[  900.837873] rga_debugger: render_mode = 0, bitblit_mode=0, rotate_mode = 1
[  900.837877] rga_debugger: src: y = 8 uv = 0 v = a05000 aw = 3840 ah = 2160 vw = 4864 vh = 2160
[  900.837880] rga_debugger: src: xoff = 0, yoff = 0, format = 0x20, rd_mode = 1
[  900.837884] rga_debugger: dst: y=15 uv=0 v=191a0 aw=422 ah=238 vw=432 vh=238
[  900.837887] rga_debugger: dst: xoff = 0, yoff = 0, format = 0xa, rd_mode = 1
[  900.837891] rga_debugger: mmu: mmu_flag=80000521 en=1
[  900.837894] rga_debugger: alpha: rop_mode = 0
[  900.837896] rga_debugger: yuv2rgb mode is 0
[  900.837898] rga_debugger: set core = 0, priority = 0, in_fence_fd = 0
[  900.837917] rga_policy: start policy on core = 1
[  900.837921] rga_policy: core = 1, break on rga_check_scale
[  900.837922] rga_policy: start policy on core = 2
[  900.837925] rga_policy: core = 2, break on rga_check_scale
[  900.837926] rga_policy: start policy on core = 4
[  900.837932] rga_policy: optional_cores = 4
[  900.837934] rga_policy: assign core: 4
[  900.847784] rga_mm: RGA_MMU unsupported memory larger than 4G!
[  900.847811] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  900.847846] rga_mm: rga_mm_map_buffer map dma_buf error!
[  900.847851] rga_mm: job buffer map failed!
[  900.847858] rga_mm: src channel map job buffer failed!
[  900.847864] rga_mm: failed to map buffer
[  900.847873] rga_job: rga_job_commit: failed to map job info
[  900.847894] rga_job: request[1001] finished 0 failed 1
[  900.847895] rga_job: request[1001] task[0] job_commit failed.
[  900.847902] rga_job: rga request commit failed!
[  900.847908] rga: request[1001] submit failed!

JeffyCN commented 1 year ago

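rga_check_scale: my blind guess is that there is a limit on the scaling factor.

As I recall, dma heap has different nodes configured in the dts, and some of them are within 4g. For the details you will probably still need to ask the relevant maintainer.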

jjm2473 commented 1 year ago

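Oh, I checked the RGA3 limits: 3840/422 is outside the 1/8 to 8x range, so the request falls back to RGA2. So I still have to solve the memory allocation location problem.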
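A minimal sketch of that range check, using integer math so no rounding is involved. The function names and the max_ratio parameter are made up for this example and are not the driver's rga_check_scale; the limit of 8 is the figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero when scaling one dimension from `src` to `dst` stays within
 * [1/max_ratio, max_ratio]. */
static inline int dim_scale_within(uint32_t src, uint32_t dst, uint32_t max_ratio)
{
    assert(max_ratio > 0);
    if (src == 0 || dst == 0)
        return 0;
    /* Downscale bound: src/dst <= max_ratio. Upscale bound: dst/src <= max_ratio. */
    return (uint64_t)src <= (uint64_t)dst * max_ratio &&
           (uint64_t)dst <= (uint64_t)src * max_ratio;
}

/* Both width and height must stay within the limit. */
static inline int scale_within(uint32_t src_w, uint32_t src_h,
                               uint32_t dst_w, uint32_t dst_h,
                               uint32_t max_ratio)
{
    return dim_scale_within(src_w, dst_w, max_ratio) &&
           dim_scale_within(src_h, dst_h, max_ratio);
}
```

For the failing request, 422 * 8 = 3376 is less than 3840, so scale_within(3840, 2160, 422, 238, 8) is false, while 3840x2160 to 1280x720 is a factor of 3 and passes.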

jjm2473 commented 1 year ago

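The 4GB DMA limit is solved. On the client side, change mpp_buffer_group_get_internal(&ctx->frame_group, MPP_BUFFER_TYPE_DRM) to mpp_buffer_group_get_internal(&ctx->frame_group, MPP_BUFFER_TYPE_DRM | MPP_BUFFER_FLAGS_DMA32), so that buffers are no longer allocated above 4GB. This change only applies to the dma_heap path; with DRM only, you may still need to patch rockchip_drm_gem in the kernel.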
