ffvvc / FFmpeg

VVC Decoder for ffmpeg

Optimize ALF #13

Open nuomi2021 opened 1 year ago

nuomi2021 commented 1 year ago

For https://github.com/ffvvc/tests/blob/main/performance/RitualDance_1920x1080_60_10_420_37_RA.266, ALF uses about 30-40% of the CPU time. We may need to vectorize the ALF filter.

The following table is an old one. To get current numbers, please run "./ffmpeg -i RitualDance_1920x1080_60_10_420_37_RA.266 -f rawvideo /dev/null -y" yourself.

28.82%  ffmpeg_g  ffmpeg_g           [.] alf_filter_luma_10
 7.43%  ffmpeg_g  ffmpeg_g           [.] put_vvc_luma_hv_10
 5.48%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_inv_dct2_64
 5.30%  ffmpeg_g  ffmpeg_g           [.] alf_get_coeff_and_clip_10
 3.31%  ffmpeg_g  ffmpeg_g           [.] inv_dct2.constprop.4
 3.11%  ffmpeg_g  ffmpeg_g           [.] alf_filter_chroma_10
 3.04%  ffmpeg_g  ffmpeg_g           [.] lmcs_filter_luma_10
 2.50%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_deblocking_ctb_boundary_strengths
 2.28%  ffmpeg_g  [kernel.kallsyms]  [k] clear_page_erms
 1.81%  ffmpeg_g  ffmpeg_g           [.] put_vvc_luma_bi_hv_10
 1.81%  ffmpeg_g  libc-2.31.so       [.] 0x000000000018ba51
 1.52%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_inv_dst7_32
 1.50%  ffmpeg_g  ffmpeg_g           [.] vvc_loop_filter_luma_10
 1.46%  ffmpeg_g  ffmpeg_g           [.] put_vvc_luma_uni_hv_10
 1.37%  ffmpeg_g  [kernel.kallsyms]  [k] __lock_text_start
 1.30%  ffmpeg_g  ffmpeg_g           [.] itransform.isra.0
 1.27%  ffmpeg_g  ffmpeg_g           [.] hls_coding_unit
 1.27%  ffmpeg_g  ffmpeg_g           [.] put_vvc_chroma_hv_10
 1.27%  ffmpeg_g  ffmpeg_g           [.] put_vvc_chroma_uni_hv_10
 1.25%  ffmpeg_g  libc-2.31.so       [.] 0x000000000018b643
 1.17%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_residual_coding
 1.08%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_deblock_filter
 1.01%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_inv_dct2_32
 0.85%  ffmpeg_g  ffmpeg_g           [.] alf_filter_cc_10
 0.79%  ffmpeg_g  ffmpeg_g           [.] put_vvc_chroma_bi_hv_10
 0.69%  ffmpeg_g  ffmpeg_g           [.] alf_prepare_buffer
 0.65%  ffmpeg_g  ffmpeg_g           [.] intra_pred_10
 0.55%  ffmpeg_g  ffmpeg_g           [.] apply_prof_uni_10
 0.53%  ffmpeg_g  libc-2.31.so       [.] 0x000000000018b7b3
 0.47%  ffmpeg_g  ffmpeg_g           [.] ff_vvc_apply_dmvr_info_ctb
 0.47%  ffmpeg_g  ffmpeg_g           [.] put_vvc_luma_v_10
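
As a rough way to read the profile above: the five ALF rows (alf_filter_luma_10, alf_get_coeff_and_clip_10, alf_filter_chroma_10, alf_filter_cc_10, alf_prepare_buffer) add up to 38.77%. The sketch below sums those rows and applies Amdahl's law to estimate the overall gain from speeding up only the ALF code. The function names are made up for this example, and any kernel speedup passed in (for instance 4x) is an assumption, not a measurement.

```c
#include <stddef.h>

/* Shares (percent) copied from the ALF rows of the profile above:
 * luma, get_coeff_and_clip, chroma, cc, prepare_buffer. */
static const double alf_share[] = { 28.82, 5.30, 3.11, 0.85, 0.69 };

/* Sum of the ALF rows, in percent. */
double alf_total_percent(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < sizeof(alf_share) / sizeof(alf_share[0]); i++)
        sum += alf_share[i];
    return sum;
}

/* Amdahl's law: overall speedup when a fraction p of the runtime
 * becomes k times faster and the rest is unchanged.
 * k is a hypothetical kernel speedup supplied by the caller. */
double overall_speedup(double p, double k)
{
    return 1.0 / ((1.0 - p) + p / k);
}
```

Calling overall_speedup(alf_total_percent() / 100.0, k) with a guessed k gives an upper-level estimate only; the real gain depends on how much of each ALF function actually vectorizes.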

nuomi2021 commented 1 year ago

The hot spot functions are here:
https://github.com/ffvvc/FFmpeg/blob/3cb136dc5fc70d65f9918453a842439323e81908/libavcodec/vvcdsp_template.c#L353
https://github.com/ffvvc/FFmpeg/blob/3cb136dc5fc70d65f9918453a842439323e81908/libavcodec/vvcdsp_template.c#L771
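
To illustrate why this kind of filter is a good candidate for vectorization, here is a simplified scalar sketch of an ALF-style row filter. It is not the code from vvcdsp_template.c: the name alf_row_sketch, the flat offset table, and the single coefficient/clip set for the whole row are assumptions made for the example (the real filter works on 2D tap positions and selects coefficients per block). It only shows the general shape: clipped differences against the current sample, a weighted sum over symmetric tap pairs, then a rounding shift by 7.

```c
#include <stdint.h>

/* Clamp v into [lo, hi]. */
static inline int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical, simplified ALF-style filter for one row of samples.
 * src must be readable at x +/- offset[t] for every x in [0, width).
 * Relies on arithmetic right shift for negative sums, as FFmpeg code
 * generally does.
 */
void alf_row_sketch(uint16_t *dst, const uint16_t *src, int width,
                    const int16_t *coeff, const int16_t *clip,
                    const int *offset, int ntaps, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;

    for (int x = 0; x < width; x++) {
        const int cur = src[x];
        int sum = 0;

        for (int t = 0; t < ntaps; t++) {
            /* Clipped differences of the two symmetric neighbours. */
            const int d0 = clip3(-clip[t], clip[t], src[x + offset[t]] - cur);
            const int d1 = clip3(-clip[t], clip[t], src[x - offset[t]] - cur);
            sum += coeff[t] * (d0 + d1);
        }
        /* Round, scale back by 7 bits, and clamp to the sample range. */
        dst[x] = (uint16_t)clip3(0, max, cur + ((sum + 64) >> 7));
    }
}
```

Each output sample depends only on source samples, never on a neighbouring output, so the x loop has no carried dependency. A SIMD version could process several x positions per tap with packed min/max for the clipping and packed multiply-add for the sum.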