Closed Justa-Cai closed 2 years ago
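The upper-layer demo software adds a lot of extra overhead. Run echo 0x100 > /sys/module/rk_vcodec/parameters/mpp_dev_debug to see the hardware time.
With the -o option the frames are written to a file; remove it.
Without the file write the performance is far better, which does not quite make sense: the code only gets the memory address through mpp_buffer_get_ptr and then copies the data to the file.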
rk_vcodec: fdf80200.rkvdec: pid: 18375, session: 00000000cb19fe7b, time: 3322 us
rk_vcodec: fdf80200.rkvdec: pid: 18375, session: 00000000cb19fe7b, time: 2526 us
mpi_dec_test: decode 480 frames time 952 ms delay 22 ms fps 503.99
Performance with writing to a file:
rk_vcodec: fdf80200.rkvdec: pid: 17549, session: 000000006a21feb5, time: 3597 us
mpi_dec_test: decode 480 frames time 22411 ms delay 9 ms fps 21.42
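After further analysis, copying data out of the pointer returned by mpp_buffer_get_ptr turns out to be very slow.
Here is the log. dump_mpp_frame_to_file takes longer than one frame time, about 58 to 68 ms per call (the logged values are in microseconds).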
11-14 02:51:25.978 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:68433
11-14 02:51:25.978 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 472 fps:19.22 max_usage:9216000
11-14 02:51:26.036 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:57518
11-14 02:51:26.036 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 473 fps:19.22 max_usage:9216000
11-14 02:51:26.094 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:57636
11-14 02:51:26.094 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 474 fps:19.21 max_usage:9216000
11-14 02:51:26.152 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:57991
11-14 02:51:26.153 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 475 fps:19.21 max_usage:9216000
11-14 02:51:26.211 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:57649
11-14 02:51:26.215 5817 5822 I mpi_dec_test: 0xe84c0280 found last packet
11-14 02:51:26.223 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 3290 us
11-14 02:51:26.218 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 476 fps:19.20 max_usage:9216000
11-14 02:51:26.228 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 7940 us
11-14 02:51:26.228 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 6589 us
11-14 02:51:26.232 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 5817, session: 0000000050327563, time: 6481 us
11-14 02:51:26.232 0 0 I mpp_rkvdec2 fdf80200.rkvdec: resetting...
11-14 02:51:26.232 0 0 I mpp_rkvdec2 fdf80200.rkvdec: reset done
11-14 02:51:26.285 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:65800
11-14 02:51:26.286 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 477 fps:19.19 max_usage:9216000
11-14 02:51:26.345 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:58320
11-14 02:51:26.347 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 478 fps:19.18 max_usage:9216000
11-14 02:51:26.406 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:58768
11-14 02:51:26.407 5817 5822 I mpi_dec_test: 0xe84c0280 decode get frame 479 fps:19.17 max_usage:9216000 err 1 discard 0
11-14 02:51:26.407 5817 5822 I mpi_dec_test: dump_mpp_frame_to_file:2
11-14 02:51:26.408 5817 5822 I mpi_dec_test: 0xe84c0280 found last packet
11-14 02:51:26.408 5817 5822 I mpi_dec_test: decode 480 frames time 25001 ms delay 19 ms fps 19.20
11-14 02:51:26.423 5817 5817 I mpi_dec_test: test success max memory 8.79 MB
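The thread does not show how the dump_mpp_frame_to_file:<value> lines were produced. A minimal sketch of one way to get a comparable per-call figure, assuming a POSIX monotonic clock, is below; now_us and timed_copy_us are hypothetical helper names, not part of MPP.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: monotonic time in microseconds. */
static int64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical helper: copy size bytes and return the elapsed time in us. */
static int64_t timed_copy_us(void *dst, const void *src, size_t size)
{
    int64_t start = now_us();

    memcpy(dst, src, size);
    return now_us() - start;
}
```

Passing the pointer from mpp_buffer_get_ptr as src and a heap buffer as dst would isolate the copy cost from the fwrite cost.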
The corresponding file modification:
case MPP_FMT_YUV420SP_VU :
case MPP_FMT_YUV420SP : {
    RK_U32 i;
    RK_U8 *base_y = base;
    RK_U8 *base_c = base + h_stride * v_stride;
    /* Experiment: copy the frame out of the hardware buffer, no file write */
    RK_U8 *tmp = mpp_malloc(RK_U8, h_stride * height * 3);
    memcpy(tmp, base, h_stride * height * 3/2);
    mpp_free(tmp);
    /* Original per-row file write, disabled for this test */
    // for (i = 0; i < height; i++, base_y += h_stride) {
    //     fwrite(base_y, 1, width, fp);
    // }
    // for (i = 0; i < height / 2; i++, base_c += h_stride) {
    //     fwrite(base_c, 1, width, fp);
    // }
} break;
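For reference, the commented-out loops above drop the stride padding row by row. The same layout logic can be sketched as a standalone copy into a tightly packed buffer. This is an illustration only: pack_yuv420sp is a hypothetical name, and plain C types stand in for the RK_ types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy a YUV420SP (NV12/NV21) frame whose rows are
 * h_stride bytes apart and whose chroma plane starts at h_stride * v_stride
 * into a packed buffer of width * height * 3 / 2 bytes.
 * Returns the number of bytes written to dst.
 */
static size_t pack_yuv420sp(uint8_t *dst, const uint8_t *src,
                            size_t width, size_t height,
                            size_t h_stride, size_t v_stride)
{
    const uint8_t *src_y = src;
    const uint8_t *src_c = src + h_stride * v_stride;
    uint8_t *out = dst;
    size_t i;

    /* Luma plane: height rows, width valid bytes per row */
    for (i = 0; i < height; i++, src_y += h_stride, out += width)
        memcpy(out, src_y, width);

    /* Interleaved chroma plane: height / 2 rows, width valid bytes per row */
    for (i = 0; i < height / 2; i++, src_c += h_stride, out += width)
        memcpy(out, src_c, width);

    return (size_t)(out - dst);
}
```

Every byte still has to be read from the decoder buffer, so this changes the output layout, not the cost of the read itself.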
Without the copy, the speed is very good:
11-14 02:54:01.634 7392 7397 I mpi_dec_test: 0xee2c0580 decode get frame 476 fps:467.07 max_usage:9216000
11-14 02:54:01.635 7392 7397 I mpi_dec_test: dump_mpp_frame_to_file:161
11-14 02:54:01.636 7392 7397 I mpi_dec_test: 0xee2c0580 decode get frame 477 fps:467.33 max_usage:9216000
11-14 02:54:01.636 7392 7397 I mpi_dec_test: dump_mpp_frame_to_file:142
11-14 02:54:01.642 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 7392, session: 00000000953ba828, time: 3871 us
11-14 02:54:01.643 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 7392, session: 00000000953ba828, time: 3643 us
11-14 02:54:01.638 7392 7397 I mpi_dec_test: 0xee2c0580 decode get frame 478 fps:467.16 max_usage:9216000
11-14 02:54:01.639 7392 7397 I mpi_dec_test: dump_mpp_frame_to_file:298
11-14 02:54:01.640 7392 7397 I mpi_dec_test: 0xee2c0580 decode get frame 479 fps:467.41 max_usage:9216000 err 1 discard 0
11-14 02:54:01.640 7392 7397 I mpi_dec_test: dump_mpp_frame_to_file:1
11-14 02:54:01.645 0 0 I rk_vcodec: fdf80200.rkvdec: pid: 7392, session: 00000000953ba828, time: 3714 us
11-14 02:54:01.640 7392 7397 I mpi_dec_test: 0xee2c0580 found last packet
11-14 02:54:01.640 7392 7397 I mpi_dec_test: decode 480 frames time 1039 ms delay 14 ms fps 461.83
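Why is this copy so slow? Please help analyze it.
That is normal. Hardware buffers are non-cacheable by default. If you do not write the file and do not map the buffer, it is fast.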
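To put the logged numbers in perspective, the effective copy rate can be estimated from the frame size and the per-call time. The sketch below assumes h_stride equals the 1280-pixel width of the 1280x720 test stream, which the thread does not confirm; both function names are hypothetical.

```c
#include <stdint.h>

/* Bytes copied by memcpy(tmp, base, h_stride * height * 3 / 2). */
static uint64_t yuv420sp_bytes(uint64_t h_stride, uint64_t height)
{
    return h_stride * height * 3 / 2;
}

/*
 * Effective throughput in MB/s, with 1 MB = 10^6 bytes.
 * Bytes per microsecond is numerically equal to MB per second.
 */
static double throughput_mb_per_s(uint64_t bytes, uint64_t elapsed_us)
{
    return (double)bytes / (double)elapsed_us;
}
```

Under that assumption one frame is 1382400 bytes, so the logged 57518 us works out to roughly 24 MB/s. That is far below what a cached memory copy normally reaches, which fits the non-cacheable explanation.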
The numbers measured on the RK3288 platform are much better. Is there any way to optimize this? The A55 cores on the 3566 should not be this weak.
11-14 16:17:51.053 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:5705
11-14 16:17:51.053 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1790 fps:148.58 max_usage:9953280
11-14 16:17:51.058 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:5187
11-14 16:17:51.058 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1791 fps:148.59 max_usage:9953280
11-14 16:17:51.064 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:5500
11-14 16:17:51.064 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1792 fps:148.61 max_usage:9953280
11-14 16:17:51.069 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:4800
11-14 16:17:51.069 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1793 fps:148.63 max_usage:9953280
11-14 16:17:51.074 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:4502
11-14 16:17:51.074 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1794 fps:148.65 max_usage:9953280
11-14 16:17:51.079 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:5274
11-14 16:17:51.079 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1795 fps:148.67 max_usage:9953280
11-14 16:17:51.084 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:4562
11-14 16:17:51.084 6227 6232 I mpi_dec_test: 0xb1393000 found last packet
11-14 16:17:51.089 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1796 fps:148.63 max_usage:9953280
11-14 16:17:51.094 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:5077
11-14 16:17:51.096 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1797 fps:148.63 max_usage:9953280
11-14 16:17:51.100 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:4358
11-14 16:17:51.101 6227 6232 I mpi_dec_test: 0xb1393000 decode get frame 1798 fps:148.65 max_usage:9953280
11-14 16:17:51.106 6227 6232 I mpi_dec_test: dump_mpp_frame_to_file:4987
11-14 16:17:51.106 6227 6232 I mpi_dec_test: 0xb1393000 found last packet
11-14 16:17:51.106 6227 6232 I mpi_dec_test: decode 1799 frames time 12119 ms delay 18 ms fps 148.44
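The 3288's CPU is strong... the 3566's cores are more than one tier below it.
Look at the hardware decode time; the software-side time is only a reference.
The hardware decode time is about the same on both. The large gap is in moving the decoded data. The 3288 uses DDR3 and the 3566 uses DDR4; however much the CPUs differ, copying a 720p frame should not take nearly 10 times as long.
I have compared CPU benchmark scores for the 3288 and the 3566. The 3288 does score higher, but the 3566 is not far behind: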
3288 CPU: 26332
3566 CPU: 24593
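The optimization is done. The 64-bit build actually performs worse than the 32-bit one: 64-bit went from 45 fps to 90+ fps, and 32-bit went from 33 fps to 150+ fps. A huge improvement.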
11-16 10:05:15.645 15677 15687 I mpi_dec_test: 0xeaf00460 decode get frame 1796 fps:136.03 max_usage:16588800
11-16 10:05:15.648 15677 15687 I mpi_dec_test: dump_mpp_frame_to_file:2495
11-16 10:05:15.651 15677 15687 I mpi_dec_test: 0xeaf00460 decode get frame 1797 fps:136.04 max_usage:16588800
11-16 10:05:15.654 15677 15687 I mpi_dec_test: dump_mpp_frame_to_file:2435
11-16 10:05:15.658 15677 15687 I mpi_dec_test: 0xeaf00460 decode get frame 1798 fps:136.05 max_usage:16588800
11-16 10:05:15.660 15677 15687 I mpi_dec_test: dump_mpp_frame_to_file:2382
11-16 10:05:15.660 15677 15687 I mpi_dec_test: 0xeaf00460 found last packet
11-16 10:05:15.661 15677 15687 I mpi_dec_test: decode 1799 frames time 13243 ms delay 19 ms fps 135.84
11-16 10:05:15.681 15677 15677 I mpi_dec_test: test success max memory 15.82 MB
11-16 10:06:41.503 16503 16507 I mpi_dec_test: dump_mpp_frame_to_file:6375
11-16 10:06:41.507 16503 16507 I mpi_dec_test: 0xb400007698a30a60 decode get frame 474 fps:85.83 max_usage:9216000
11-16 10:06:41.511 16503 16507 I utils : =======Justa /home/justa/work/opensource/mpp/utils/utils.c:118
11-16 10:06:41.511 16503 16507 I mpi_dec_test: dump_mpp_frame_to_file:4837
11-16 10:06:41.515 16503 16507 I mpi_dec_test: 0xb400007698a30a60 decode get frame 475 fps:85.88 max_usage:9216000
11-16 10:06:41.524 16503 16507 I utils : =======Justa /home/justa/work/opensource/mpp/utils/utils.c:118
11-16 10:06:41.524 16503 16507 I mpi_dec_test: dump_mpp_frame_to_file:8669
11-16 10:06:41.527 16503 16507 I mpi_dec_test: 0xb400007698a30a60 decode get frame 476 fps:85.87 max_usage:9216000
11-16 10:06:41.532 16503 16507 I utils : =======Justa /home/justa/work/opensource/mpp/utils/utils.c:118
The solution works; closing this for now.
Hardware platform
RK3566
Android version
Android 12.1 @ rk-r7
Patch
Test method
mpi_dec_test -i /sdcard/1.264 -o /sdcard/Movies/1.yuv -w 1280 -h 720
Test results
32-bit program: stable at about 17 to 20 fps
64-bit program: stable at about 35 fps
See redmine-383315 for details.