[mpi_dec_test] RGB output performance

fel88 commented 3 years ago

Hi, I am trying to use mpi_dec_test.c to get RGB output (NV12 -> RGB). I am using an Orange Pi 4 single-board computer with a Rockchip RK3399 SoC, running Orange's Ubuntu Bionic image v1.3 with Linux kernel 4.4.179.

My test video stream is H.264, 3072x2048, yuvj420p. The decoding itself is really fast, but the post-processing kills all the benefits: it takes about 60 ms to copy the data from the decoder buffer to RAM.

void my_dump_mpp_frame_to_file(MppFrame frame, unsigned char *target)
{
    RK_U32 width    = 0;
    RK_U32 height   = 0;
    RK_U32 h_stride = 0;
    RK_U32 v_stride = 0;
    MppBuffer buffer    = NULL;
    RK_U8 *base = NULL;

    if (NULL == target || NULL == frame)
        return;

    width    = mpp_frame_get_width(frame);
    height   = mpp_frame_get_height(frame);
    h_stride = mpp_frame_get_hor_stride(frame);
    v_stride = mpp_frame_get_ver_stride(frame);  
    buffer  = mpp_frame_get_buffer(frame);

    if (NULL == buffer)
        return ;

    base = (RK_U8 *)mpp_buffer_get_ptr(buffer );

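    // NV12 layout: Y plane first, interleaved UV plane at h_stride * v_stride
    RK_U8 *base_y = base;
    RK_U8 *base_c = base + h_stride * v_stride;

    if (h_stride == width && v_stride == height) {
        // No stride padding: one contiguous copy
        memcpy(target, base, width * height * 3 / 2); // this memcpy takes ~60 ms
        return;
    }

    // Padded frame: copy row by row so the padding is dropped
    for (RK_U32 y = 0; y < height; y++)
        memcpy(target + y * width, base_y + y * h_stride, width);
    for (RK_U32 y = 0; y < height / 2; y++)
        memcpy(target + (height + y) * width, base_c + y * h_stride, width);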
}

...
unsigned char yv12DataBuffer[3072*2048*3/2]; // global; holds one NV12 frame (despite the name)
...
// NV12 -> RGB conversion
auto nWidth = cmd->width;
auto nHeight = cmd->height;
cv::Mat picNV12(nHeight * 3 / 2, nWidth, CV_8UC1, yv12DataBuffer);
cv::Mat picRGB;
cv::cvtColor(picNV12, picRGB, cv::COLOR_YUV2RGB_NV12); // decoder output is NV12, not NV21

What is the best way to get an RGB image after decoding? Should I use a different buffer mode (external) to improve performance?
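
For reference, here is a minimal CPU sketch of what the NV12 -> RGB step does with the buffer layout described above: a full-resolution Y plane followed by one interleaved U,V pair per 2x2 block of pixels. It is meant only to make the layout concrete, not as a fast path. The function name nv12_to_rgb_full_range is hypothetical, the input is assumed to be tightly packed (no stride padding), and the full-range BT.601 coefficients are an assumption based on the stream being yuvj420p; whether a given library conversion assumes full or limited range should be checked separately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round to nearest and clamp to the 0..255 range of an 8-bit channel.
static inline uint8_t clamp_u8(float v) {
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// Hypothetical helper: convert a tightly packed NV12 frame to packed RGB
// using full-range (JPEG) BT.601 coefficients. Assumes even dimensions.
std::vector<uint8_t> nv12_to_rgb_full_range(const uint8_t* nv12,
                                            std::size_t width,
                                            std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const uint8_t* y_plane  = nv12;                   // width * height bytes
    const uint8_t* uv_plane = nv12 + width * height;  // width * height / 2 bytes
    std::vector<uint8_t> rgb(width * height * 3);

    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            const float y = y_plane[row * width + col];
            // One U,V pair is shared by each 2x2 block of luma samples.
            const uint8_t* uv = uv_plane + (row / 2) * width + (col / 2) * 2;
            const float u = uv[0] - 128.0f;
            const float v = uv[1] - 128.0f;

            uint8_t* px = &rgb[(row * width + col) * 3];
            px[0] = clamp_u8(y + 1.402f * v);                     // R
            px[1] = clamp_u8(y - 0.344136f * u - 0.714136f * v);  // G
            px[2] = clamp_u8(y + 1.772f * u);                     // B
        }
    }
    return rgb;
}
```

Every output pixel reads one luma byte and one shared chroma pair, so the whole source buffer is touched by the CPU at least once; that is the access pattern that becomes expensive when the source is slow to read.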

HermanChen commented 3 years ago

Try using cacheable hardware memory; it will be faster.
Using RGA to do the YUV to RGB conversion will be even faster. Do not use the CPU to access the pixel data.
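
To put the 60 ms figure in perspective, the arithmetic can be sketched as below. The helper names are hypothetical, and the sketch only uses numbers from the original post (3072x2048 NV12, about 60 ms per copy). One frame is 9 MiB, so the copy runs at roughly 150 MiB/s, which is much lower than one would normally expect from a memcpy between cached buffers and is consistent with the source buffer being uncached for the CPU.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one tightly packed NV12 frame: full-size Y plane plus a
// half-size interleaved UV plane.
inline std::size_t nv12_frame_bytes(std::size_t width, std::size_t height) {
    return width * height * 3 / 2;
}

// Effective copy throughput in MiB/s for a copy of `bytes` taking `seconds`.
inline double throughput_mib_per_s(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}
```

With the values from the post, nv12_frame_bytes(3072, 2048) is 9437184 bytes and throughput_mib_per_s(9437184, 0.060) is about 150.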

paintenzero commented 3 years ago

Could you please explain more about cacheable hardware memory? How is it different from the regular memory? I would greatly appreciate any examples.

HermanChen commented 3 years ago

The memory used by the encoder/decoder hardware is not normal malloc memory. It is in fact dma-buf memory, provided by the kernel through ion (on Android) or drm (on Linux). The allocator (ion or drm) can make the memory cacheable or non-cacheable for the CPU. https://www.kernel.org/doc/html/v4.14/driver-api/dma-buf.html

fel88 commented 3 years ago

I tried to run rga_test.cpp, but I got this error:

sudo ./rga_test -i input.raw -o out.raw -w 1920 -h 1080 -f 0 -dst_w 1920 -dst_h 1080 -dst_fmt 0
...
mpp[7242]: mpp_log: rga ioctl failed errno:25 Inappropriate ioctl for device

rga.cpp

#define DEFAULT_RGA_DEV     "/dev/video0"

sudo v4l2-ctl --list-devices

rockchip,rk3399-vpu-enc (platform: hantro-vpu):
        /dev/video1
        /dev/video2

rockchip-rga (platform:rga):
        /dev/video0

rkvdec (platform:rkvdec):
        /dev/video3

I also tried this code: https://github.com/McAronDev/RK3188_colorspace_convert The result is the same: RGA_BLIT_SYNC Failed (on the line ioctl(fd, RGA_BLIT_SYNC, &Rga_Request)).

What could be the problem?
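
A note on the error code: errno 25 is ENOTTY, which the kernel returns when the driver behind a file descriptor does not implement the requested ioctl. One plausible reading, given that v4l2-ctl lists /dev/video0 as a V4L2 rockchip-rga node, is that this driver simply does not know the vendor RGA ioctls such as RGA_BLIT_SYNC. The sketch below reproduces the same class of failure without any Rockchip hardware by sending a terminal ioctl to /dev/null; the helper name tty_ioctl_errno is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical helper: issue a terminal-only ioctl (TIOCGWINSZ) on `path`.
// Returns 0 on success, the errno value if the ioctl fails,
// or -1 if the path cannot be opened at all.
int tty_ioctl_errno(const char* path) {
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct winsize ws = {};
    int err = 0;
    if (ioctl(fd, TIOCGWINSZ, &ws) < 0)
        err = errno;  // ENOTTY when the driver does not handle this request

    close(fd);
    return err;
}
```

Calling tty_ioctl_errno("/dev/null") fails with ENOTTY, the same "Inappropriate ioctl for device" reported in the log above: the device node opens fine, but the request is one its driver does not handle.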

paintenzero commented 3 years ago

It is also worth mentioning that the original Orange Pi 4 firmware has the RGA node disabled in the devicetree, and we couldn't find an easy way to enable it. That's why we took Armbian's U-Boot and Linux 5.8.6 and combined them with Orange's Ubuntu 18.04 userland, where we have developed our software so far.

paintenzero commented 3 years ago

After some research I discovered that the video decoders moved to the video4linux subsystem in kernel 5.8, so we have to use the standard APIs to work with the hardware codecs. For now we are able to use kernel 5.8.6 and ffmpeg with the "hwaccel drm and v4l2-request" patches.

HermanChen / mpp

[mpi_dec_test] RGB output performance #21