HermanChen / mpp

Rockchip MPP (Media Process Platform)

RK3566 H.264 decoding performance is poor; measured throughput cannot reach 60 fps #51

Closed Justa-Cai closed 2 years ago

Justa-Cai commented 2 years ago

Hardware platform

RK3566

Android version

Android 12.1 @ rk-r7

Patch

@@ -254,7 +255,7 @@ static int dec_simple(MpiDecLoopData *data)
                         data->first_frm = mpp_time();

                     log_len += snprintf(log_buf + log_len, log_size - log_len,
-                                        "decode get frame %d", data->frame_count);
+    

Test method

mpi_dec_test -i /sdcard/1.264 -o /sdcard/Movies/1.yuv -w 1280 -h 720

Test results

The 32-bit build holds steady at about 17-20 fps:

11-11 13:21:40.348 10243 10269 I mpi_dec_test: 0xe7880070 decode get frame 372 fps:19.40
11-11 13:21:40.380 10243 10269 I mpi_dec_test: 0xe7880070 decode get frame 373 fps:19.42
11-11 13:21:40.413 10243 10269 I mpi_dec_test: 0xe7880070 decode get frame 374 fps:19.44
11-11 13:21:40.443 10243 10269 I mpi_dec_test: 0xe7880070 decode get frame 375 fps:19.46
11-11 13:21:40.479 10243 10269 I mpi_dec_test: 0xe7880070 decode get frame 376 fps:19.47

The 64-bit build holds steady at about 35 fps:

11-11 13:22:21.429 10789 10793 I mpi_dec_test: 0xb400007c9a22d470 decode get frame 247 fps:37.22
11-11 13:22:21.450 10789 10793 I mpi_dec_test: 0xb400007c9a22d470 decode get frame 248 fps:37.25
11-11 13:22:21.471 10789 10793 I mpi_dec_test: 0xb400007c9a22d470 decode get frame 249 fps:37.29

See redmine-383315 for details.

HermanChen commented 2 years ago

The upper-layer demo software adds a lot of extra overhead. To see the hardware time, run:

echo 0x100 > /sys/module/rk_vcodec/parameters/mpp_dev_debug

HermanChen commented 2 years ago

With the -o option the test writes the output file. Remove it.

Justa-Cai commented 2 years ago

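Without the file write the performance is far better, which seems odd: the code only gets the memory address through mpp_buffer_get_ptr and then copies the data to the file.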

rk_vcodec: fdf80200.rkvdec: pid: 18375, session: 00000000cb19fe7b, time: 3322 us
rk_vcodec: fdf80200.rkvdec: pid: 18375, session: 00000000cb19fe7b, time: 2526 us
mpi_dec_test: decode 480 frames time 952 ms delay  22 ms fps 503.99

Performance with file writing enabled:

rk_vcodec: fdf80200.rkvdec: pid: 17549, session: 000000006a21feb5, time: 3597 us
mpi_dec_test: decode 480 frames time 22411 ms delay   9 ms fps 21.42
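
The two runs above can be compared per frame. Below is a minimal sketch in C that uses only numbers from these logs (480 frames, 952 ms versus 22411 ms in total, and 3597 us of hardware time in the run with file writing). The helper names are made up for illustration and are not part of MPP.

```c
#include <stdio.h>

/* Average wall-clock time per frame, in microseconds. */
static double per_frame_us(double total_ms, unsigned frames)
{
    return total_ms * 1000.0 / (double)frames;
}

/* Fraction of the per-frame wall time spent in the hardware decoder. */
static double hw_share(double hw_us, double frame_us)
{
    return hw_us / frame_us;
}

/* Print the comparison for the two runs logged above. */
static void print_overhead(void)
{
    double with_write = per_frame_us(22411.0, 480); /* -o given   */
    double no_write   = per_frame_us(952.0, 480);   /* -o removed */

    printf("with file write: %.0f us/frame, hardware share %.1f%%\n",
           with_write, 100.0 * hw_share(3597.0, with_write));
    printf("without write:   %.0f us/frame\n", no_write);
}
```

On these numbers the run with file writing spends about 46.7 ms per frame, of which the decoder hardware accounts for under a tenth, so the time is going somewhere other than decoding.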
Justa-Cai commented 2 years ago

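After further analysis: copying data out of the address returned by mpp_buffer_get_ptr is very slow.

Here is the log. dump_mpp_frame_to_file takes longer than one frame period; judging by the log timestamps the values are in microseconds, so each call takes roughly 58-68 ms.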

11-14 02:51:25.978  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:68433
11-14 02:51:25.978  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 472 fps:19.22 max_usage:9216000
11-14 02:51:26.036  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:57518
11-14 02:51:26.036  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 473 fps:19.22 max_usage:9216000
11-14 02:51:26.094  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:57636
11-14 02:51:26.094  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 474 fps:19.21 max_usage:9216000
11-14 02:51:26.152  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:57991
11-14 02:51:26.153  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 475 fps:19.21 max_usage:9216000
11-14 02:51:26.211  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:57649
11-14 02:51:26.215  5817  5822 I mpi_dec_test: 0xe84c0280 found last packet
11-14 02:51:26.223     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 3290 us
11-14 02:51:26.218  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 476 fps:19.20 max_usage:9216000
11-14 02:51:26.228     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 7940 us
11-14 02:51:26.228     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 6589 us
11-14 02:51:26.232     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 6481 us
11-14 02:51:26.232     0     0 I mpp_rkvdec2 fdf80200.rkvdec: resetting...
11-14 02:51:26.232     0     0 I mpp_rkvdec2 fdf80200.rkvdec: reset done
11-14 02:51:26.285  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:65800
11-14 02:51:26.286  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 477 fps:19.19 max_usage:9216000
11-14 02:51:26.345  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:58320
11-14 02:51:26.347  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 478 fps:19.18 max_usage:9216000
11-14 02:51:26.406  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:58768
11-14 02:51:26.407  5817  5822 I mpi_dec_test: 0xe84c0280 decode get frame 479 fps:19.17 max_usage:9216000 err 1 discard 0
11-14 02:51:26.407  5817  5822 I mpi_dec_test: dump_mpp_frame_to_file:2
11-14 02:51:26.408  5817  5822 I mpi_dec_test: 0xe84c0280 found last packet
11-14 02:51:26.408  5817  5822 I mpi_dec_test: decode 480 frames time 25001 ms delay  19 ms fps 19.20
11-14 02:51:26.423  5817  5817 I mpi_dec_test: test success max memory 8.79 MB

The corresponding file change:

    case MPP_FMT_YUV420SP_VU :
    case MPP_FMT_YUV420SP : {
        RK_U32 i;
        RK_U8 *base_y = base;
        RK_U8 *base_c = base + h_stride * v_stride;
        RK_U8 *tmp = mpp_malloc(RK_U8, h_stride * height * 3);

        memcpy(tmp, base, h_stride * height * 3/2);

        mpp_free(tmp);

        // for (i = 0; i < height; i++, base_y += h_stride) {
        //     fwrite(base_y, 1, width, fp);
        // }
        // for (i = 0; i < height / 2; i++, base_c += h_stride) {
        //     fwrite(base_c, 1, width, fp);
        // }
    } break;
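
For reference, the commented-out loops above write only the visible width of each row, skipping the stride padding. The same row-by-row walk can be sketched as a standalone C function that packs the frame into an ordinary malloc'ed buffer, so that one fwrite can follow. The function name pack_yuv420sp is hypothetical, and the thread does not establish whether this is faster on a non-cacheable mapping; it only illustrates the layout (Y plane, then interleaved chroma starting at h_stride * v_stride).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the visible width x height region of a semi-planar 4:2:0 frame
 * into a tightly packed buffer. Source rows are h_stride bytes apart and
 * the chroma plane starts at h_stride * v_stride. dst must hold
 * width * height * 3 / 2 bytes. Returns the number of bytes written.
 */
static size_t pack_yuv420sp(uint8_t *dst, const uint8_t *src,
                            size_t width, size_t height,
                            size_t h_stride, size_t v_stride)
{
    const uint8_t *src_y = src;
    const uint8_t *src_c = src + h_stride * v_stride;
    uint8_t *out = dst;
    size_t i;

    /* Luma: one row of `width` bytes per line. */
    for (i = 0; i < height; i++, src_y += h_stride, out += width)
        memcpy(out, src_y, width);

    /* Chroma: half as many rows, each `width` bytes of interleaved samples. */
    for (i = 0; i < height / 2; i++, src_c += h_stride, out += width)
        memcpy(out, src_c, width);

    return (size_t)(out - dst);
}
```
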
Justa-Cai commented 2 years ago

Without the copy, the speed is very good:

11-14 02:54:01.634  7392  7397 I mpi_dec_test: 0xee2c0580 decode get frame 476 fps:467.07 max_usage:9216000
11-14 02:54:01.635  7392  7397 I mpi_dec_test: dump_mpp_frame_to_file:161
11-14 02:54:01.636  7392  7397 I mpi_dec_test: 0xee2c0580 decode get frame 477 fps:467.33 max_usage:9216000
11-14 02:54:01.636  7392  7397 I mpi_dec_test: dump_mpp_frame_to_file:142
11-14 02:54:01.642     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 7392, session: 00000000953ba828, time: 3871 us
11-14 02:54:01.643     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 7392, session: 00000000953ba828, time: 3643 us
11-14 02:54:01.638  7392  7397 I mpi_dec_test: 0xee2c0580 decode get frame 478 fps:467.16 max_usage:9216000
11-14 02:54:01.639  7392  7397 I mpi_dec_test: dump_mpp_frame_to_file:298
11-14 02:54:01.640  7392  7397 I mpi_dec_test: 0xee2c0580 decode get frame 479 fps:467.41 max_usage:9216000 err 1 discard 0
11-14 02:54:01.640  7392  7397 I mpi_dec_test: dump_mpp_frame_to_file:1
11-14 02:54:01.645     0     0 I rk_vcodec: fdf80200.rkvdec: pid: 7392, session: 00000000953ba828, time: 3714 us
11-14 02:54:01.640  7392  7397 I mpi_dec_test: 0xee2c0580 found last packet
11-14 02:54:01.640  7392  7397 I mpi_dec_test: decode 480 frames time 1039 ms delay  14 ms fps 461.83
Justa-Cai commented 2 years ago

Why is this copy so slow? Could you help analyze it?

HermanChen commented 2 years ago

This is normal. Hardware buffers are non-cacheable by default. If you don't write the file and don't map the buffer, it is fast.
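
One way to check this on a given board is to time the same memcpy from two sources: ordinary malloc'ed memory and the address returned by mpp_buffer_get_ptr. The harness below is a sketch with hypothetical helper names (now_us, time_copy_us). It uses the C11 timespec_get wall clock, which is not monotonic, so treat the result as approximate.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in microseconds (C11 timespec_get; not monotonic). */
static double now_us(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/*
 * Copy `size` bytes from src into a malloc'ed (cached) scratch buffer
 * `rounds` times. Returns the average time per copy in microseconds,
 * or a negative value on bad arguments or allocation failure.
 */
static double time_copy_us(const void *src, size_t size, unsigned rounds)
{
    volatile uint8_t sink = 0;
    double start, elapsed;
    uint8_t *dst;
    unsigned i;

    if (src == NULL || size == 0 || rounds == 0)
        return -1.0;

    dst = malloc(size);
    if (dst == NULL)
        return -1.0;

    start = now_us();
    for (i = 0; i < rounds; i++) {
        memcpy(dst, src, size);
        sink ^= dst[size - 1]; /* read back so the copy is not optimized out */
    }
    elapsed = now_us() - start;

    free(dst);
    (void)sink;
    return elapsed / (double)rounds;
}
```

Calling time_copy_us once with a malloc'ed source and once with the mapped decoder buffer, using the same size, gives two numbers to compare. A large gap between them would be consistent with the non-cacheable explanation.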

Justa-Cai commented 2 years ago

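"This is normal. Hardware buffers are non-cacheable by default. If you don't write the file and don't map the buffer, it is fast."

The numbers on the RK3288 platform are much better. Is there any way to optimize this? The A55 cores on the 3566 should not be this weak.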

11-14 16:17:51.053  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:5705
11-14 16:17:51.053  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1790 fps:148.58 max_usage:9953280
11-14 16:17:51.058  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:5187
11-14 16:17:51.058  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1791 fps:148.59 max_usage:9953280
11-14 16:17:51.064  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:5500
11-14 16:17:51.064  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1792 fps:148.61 max_usage:9953280
11-14 16:17:51.069  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:4800
11-14 16:17:51.069  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1793 fps:148.63 max_usage:9953280
11-14 16:17:51.074  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:4502
11-14 16:17:51.074  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1794 fps:148.65 max_usage:9953280
11-14 16:17:51.079  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:5274
11-14 16:17:51.079  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1795 fps:148.67 max_usage:9953280
11-14 16:17:51.084  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:4562
11-14 16:17:51.084  6227  6232 I mpi_dec_test: 0xb1393000 found last packet
11-14 16:17:51.089  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1796 fps:148.63 max_usage:9953280
11-14 16:17:51.094  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:5077
11-14 16:17:51.096  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1797 fps:148.63 max_usage:9953280
11-14 16:17:51.100  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:4358
11-14 16:17:51.101  6227  6232 I mpi_dec_test: 0xb1393000 decode get frame 1798 fps:148.65 max_usage:9953280
11-14 16:17:51.106  6227  6232 I mpi_dec_test: dump_mpp_frame_to_file:4987
11-14 16:17:51.106  6227  6232 I mpi_dec_test: 0xb1393000 found last packet
11-14 16:17:51.106  6227  6232 I mpi_dec_test: decode 1799 frames time 12119 ms delay  18 ms fps 148.44
HermanChen commented 2 years ago

The 3288's CPU is strong... the 3566's cores are more than one tier below it.

HermanChen commented 2 years ago

Look at the hardware decode time; the software time is only a reference.

Justa-Cai commented 2 years ago

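The hardware decode times are about the same on both. The large gap is in moving the data after decoding. The 3288 uses DDR3 and the 3566 uses DDR4; however much weaker the CPU is, copying a 720p frame should not take nearly 10 times longer.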
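
To put that gap in numbers, here is a small C sketch that converts a per-frame copy time into effective throughput. It assumes both logs are for 1280x720 8-bit 4:2:0 frames and ignores stride padding; the function names are hypothetical, and the example times (about 57.5 ms on the 3566, about 5 ms on the 3288) are read off the dump_mpp_frame_to_file lines earlier in the thread.

```c
#include <stddef.h>

/* Bytes in one 8-bit 4:2:0 frame, ignoring stride padding. */
static size_t yuv420_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Effective throughput in MB/s (10^6 bytes per second). */
static double throughput_mb_s(size_t bytes, double elapsed_us)
{
    return (double)bytes / elapsed_us; /* bytes per microsecond == MB/s */
}
```

A 720p frame is 1,382,400 bytes, so 57518 us works out to about 24 MB/s and 5000 us to about 276 MB/s, roughly an order of magnitude apart.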

I compared CPU benchmark scores for the 3288 and the 3566. The 3288 is indeed better, but the 3566 is not far behind:

3288 CPU: 26332
3566 CPU: 24593
Justa-Cai commented 2 years ago

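The optimization worked. Surprisingly, the 64-bit build now performs worse than the 32-bit one: 64-bit went from 45 fps to 90+ fps, and 32-bit went from 33 fps to 150+ fps. A huge improvement.

32-bit data: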

11-16 10:05:15.645 15677 15687 I mpi_dec_test: 0xeaf00460 decode get frame 1796 fps:136.03 max_usage:16588800
11-16 10:05:15.648 15677 15687 I mpi_dec_test: dump_mpp_frame_to_file:2495
11-16 10:05:15.651 15677 15687 I mpi_dec_test: 0xeaf00460 decode get frame 1797 fps:136.04 max_usage:16588800
11-16 10:05:15.654 15677 15687 I mpi_dec_test: dump_mpp_frame_to_file:2435
11-16 10:05:15.658 15677 15687 I mpi_dec_test: 0xeaf00460 decode get frame 1798 fps:136.05 max_usage:16588800
11-16 10:05:15.660 15677 15687 I mpi_dec_test: dump_mpp_frame_to_file:2382
11-16 10:05:15.660 15677 15687 I mpi_dec_test: 0xeaf00460 found last packet
11-16 10:05:15.661 15677 15687 I mpi_dec_test: decode 1799 frames time 13243 ms delay  19 ms fps 135.84
11-16 10:05:15.681 15677 15677 I mpi_dec_test: test success max memory 15.82 MB

64-bit data:

11-16 10:06:41.503 16503 16507 I mpi_dec_test: dump_mpp_frame_to_file:6375
11-16 10:06:41.507 16503 16507 I mpi_dec_test: 0xb400007698a30a60 decode get frame 474 fps:85.83 max_usage:9216000
11-16 10:06:41.511 16503 16507 I utils   : =======Justa /home/justa/work/opensource/mpp/utils/utils.c:118
11-16 10:06:41.511 16503 16507 I mpi_dec_test: dump_mpp_frame_to_file:4837
11-16 10:06:41.515 16503 16507 I mpi_dec_test: 0xb400007698a30a60 decode get frame 475 fps:85.88 max_usage:9216000
11-16 10:06:41.524 16503 16507 I utils   : =======Justa /home/justa/work/opensource/mpp/utils/utils.c:118
11-16 10:06:41.524 16503 16507 I mpi_dec_test: dump_mpp_frame_to_file:8669
11-16 10:06:41.527 16503 16507 I mpi_dec_test: 0xb400007698a30a60 decode get frame 476 fps:85.87 max_usage:9216000
11-16 10:06:41.532 16503 16507 I utils   : =======Justa /home/justa/work/opensource/mpp/utils/utils.c:118
Justa-Cai commented 2 years ago

The solution works; closing this for now.