OpenVisualCloud / SVT-HEVC

SVT HEVC encoder. Scalable Video Technology (SVT) is a software-based video coding technology that is highly optimized for Intel® Xeon® processors. Using the open source SVT-HEVC encoder, it is possible to spread video encoding processing across multiple Intel® Xeon® processors to achieve a real advantage of processing efficiency.

Tile Support #483

Closed: shalin186 closed this issue 4 years ago

shalin186 commented 4 years ago

Hi,

I see that SVT-HEVC supports tiling for HEVC. Does it encode multiple tiles in parallel? When I encode with one tile versus multiple tiles, I don't see any improvement in performance. Are there any plans to support encoding multiple tiles in parallel to improve performance?

Shalin

tianjunwork commented 4 years ago

Hi @shalin186, thank you for your question. Yes, it does: tiles are processed in parallel. Could you share your command line and the master commit or tag you are using? I tested briefly on master. CPU usage does increase, but performance does not. It may be a regression. We will look into it.

lijing0010 commented 4 years ago

SVT is already highly parallelized, even without tiling. It already has picture-level and block-level parallelism, so enabling tiling will not help much with fps. It should, however, help reduce latency.

shalin186 commented 4 years ago

@tianjunwork @lijing0010 Yes, I agree that it is already highly parallelized. For higher-quality presets the CPU utilization is pretty good, about 80%. But when I use preset 12, I see that the CPU is under-utilized. I am running this on a 96-core cloud instance, and I see CPU utilization of around 5600% (about 60% of the 96 cores). In this kind of case, I would expect parallel tile encoding to help. But in my case, I see no difference in performance between 1 tile and 16 tiles. Here's my command line for your reference.

ffmpeg -stream_loop -1 -i test.mp4 -c:v libsvt_hevc -rc 1 -preset 12 -f null -

The input is 8K (7680x3840) resolution. I made a couple of changes in libsvt_hevc.c to enable multiple tiles. Basically, I am just setting param->tileColumnCount = 4; param->tileRowCount = 4;
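For reference, the 4x4 grid described here can be sanity-checked with a short calculation. The sketch below assumes 64x64 coding tree blocks (CTBs) and HEVC-style uniform tile spacing; neither is stated in this thread, and the function name `uniform_tile_sizes` is made up for illustration.

```python
def uniform_tile_sizes(size_in_pixels, num_tiles, ctb_size=64):
    """Split one picture dimension into num_tiles spans, measured in CTBs.

    Uses the uniform-spacing rule from the HEVC specification:
    span[i] = ((i + 1) * total) // n - (i * total) // n
    """
    # Round up so a partial CTB at the picture edge still counts.
    total_ctbs = (size_in_pixels + ctb_size - 1) // ctb_size
    return [
        ((i + 1) * total_ctbs) // num_tiles - (i * total_ctbs) // num_tiles
        for i in range(num_tiles)
    ]


# 8K input from this thread, split into 4 tile columns and 4 tile rows.
cols = uniform_tile_sizes(7680, 4)
rows = uniform_tile_sizes(3840, 4)
print("tile column widths in CTBs:", cols)
print("tile row heights in CTBs:", rows)
print("tiles per picture:", len(cols) * len(rows))
```

Under these assumptions each of the 16 tiles covers 30x15 CTBs, i.e. 1920x960 pixels.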

Let me know your thoughts on this.

Shalin

tianjunwork commented 4 years ago

Hi @shalin186 thanks for the background.

shalin186 commented 4 years ago

Hi,

Just wanted to check whether you are going to look into what I reported, and whether my expectation that a higher number of tiles should help performance at preset 12 is valid.

Shalin

tianjunwork commented 4 years ago

Hi @shalin186, we are looking into it now and trying to find the regression. With 8K (a huge amount of data) and preset 12 (fast encoding), the bottleneck is memory bandwidth: the CPU is starving for data. Using tiling won't improve CPU utilization much (or at all) at preset 12. To see the difference in CPU utilization at preset 12, you can use -nb, which reads frames into a buffer before encoding. Note that -nb is only meant for testing.
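To get a feel for the data volume behind the memory-bandwidth explanation, here is a rough estimate of the raw size of one 8K input frame. It assumes 8-bit 4:2:0 video, which this thread does not state, and it only counts a single read of the source picture; the encoder's real memory traffic (reference frames, intermediate buffers) is not modeled. The fps value is a hypothetical placeholder, not a measurement.

```python
def raw_frame_bytes(width, height, bit_depth=8):
    """Size in bytes of one raw 4:2:0 frame (luma plus two quarter-size chroma planes)."""
    bytes_per_sample = (bit_depth + 7) // 8
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


frame = raw_frame_bytes(7680, 3840)
hypothetical_fps = 60  # placeholder, not a measured number from this thread
print(f"one frame: {frame / 1e6:.1f} MB")
print(f"input stream at {hypothetical_fps} fps: {frame * hypothetical_fps / 1e9:.2f} GB/s")
```

Under the 8-bit 4:2:0 assumption, one 7680x3840 frame is about 44 MB before any encoder-internal copies.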

shalin186 commented 4 years ago

Thanks for the explanation. As shown in my command line (ffmpeg -stream_loop -1 -i test.mp4 -c:v libsvt_hevc -rc 1 -preset 12 -f null - ), I am using ffmpeg for transcoding, and I don't see a way to use the -nb option. Also, would such an option even help with ffmpeg? I assume the decoded frames from ffmpeg are already in main memory, so there's not much disk access going on.

tianjunwork commented 4 years ago

Yeah, -nb is only for testing with the SVT sample app, not for any other integrated application. If you would like to see the difference -nb makes, you need to use the sample app. Anyway, with 8K and -preset 12, tiling won't help much.

shalin186 commented 4 years ago

OK, I will try out the -nb option in the standalone app. Can you also tell me in which cases I will see a benefit from using multiple tiles?

tianjunwork commented 4 years ago

Generally speaking, when the CPU is not starving for data to process, e.g. encMode 0/1/2 (since computation is heavy). And of course, CPU usage must not already be full, so that there is still room to scale.

tianjunwork commented 4 years ago

Hi @shalin186, for example, with -hierarchical-levels 0, which means a flat structure with no frame-level parallelism, tile-based encoding can greatly improve performance. If all existing parallelism is enabled, tile-based encoding won't improve much. I confirmed there is no regression in tile-based encoding on master.

About 70% improvement in fps:

$ ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080  -n 2000 -encMode 5 -hierarchical-levels 0 -base-layer-switch-mode 1
Channel 1
Average Speed:          56.05 fps
Total Encoding Time:    35682 ms
Total Execution Time:   36172 ms
Average Latency:        2081 ms
Max Latency:            3339 ms

$ ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080  -n 2000 -encMode 5 -hierarchical-levels 0 -base-layer-switch-mode 1 -tile_row_cnt 8 -tile_col_cnt 1
Channel 1
Average Speed:          96.68 fps
Total Encoding Time:    20687 ms
Total Execution Time:   21191 ms
Average Latency:        1216 ms
Max Latency:            1747 ms

About 6% improvement in fps:

$ ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080  -n 5000 -encMode 5
Channel 1
Average Speed:          184.22 fps
Total Encoding Time:    27141 ms
Total Execution Time:   27721 ms
Average Latency:        644 ms
Max Latency:            928 ms

$ ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080  -n 5000 -encMode 5 -tile_row_cnt 8 -tile_col_cnt 1
Channel 1
Average Speed:          195.96 fps
Total Encoding Time:    25515 ms
Total Execution Time:   26116 ms
Average Latency:        606 ms
Max Latency:            901 ms
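
The gains can be recomputed directly from the four runs reported in this comment. The snippet below is only a small arithmetic check; the fps and latency values are copied from the output, while the helper names `percent_gain` and `percent_drop` are made up for illustration.

```python
def percent_gain(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


def percent_drop(before, after):
    """Relative decrease from before to after, in percent."""
    return (before - after) / before * 100.0


# (label, fps without tiles, fps with 8 tile rows,
#  average latency in ms without tiles, with 8 tile rows)
runs = [
    ("flat structure (-hierarchical-levels 0)", 56.05, 96.68, 2081, 1216),
    ("default prediction structure", 184.22, 195.96, 644, 606),
]

for label, fps0, fps1, lat0, lat1 in runs:
    print(f"{label}: fps +{percent_gain(fps0, fps1):.1f}%, "
          f"average latency -{percent_drop(lat0, lat1):.1f}%")
```

On these numbers, tiling gives roughly a 72% fps gain with the flat structure and roughly 6% with the default structure, and average latency drops in both cases.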

tianjunwork commented 4 years ago

Hi @shalin186, do you have any other questions to discuss? If not, could you close this issue?