Xilinx / video-sdk

https://xilinx.github.io/video-sdk

Very low resolution encoding #49

Closed adeelabbas closed 1 year ago

adeelabbas commented 1 year ago

Hi, I am trying to encode very low resolution (270p) video with the HW encoder, and I want it to run faster than software encoding with x264 on a c6a.8xlarge. My command is as follows:

ffmpeg -y \
 -ss 00:07:59.959 -t 00:03:01.772 -i VideoA.mp4 \
 -ss 00:07:59.959 -t 00:03:01.772 -i VideoB.mp4 \
 -ss 00:07:59.959 -t 00:03:01.772 -i VideoC.mp4 \
 -ss 00:07:59.959 -t 00:03:01.772 -i Michelle.m4a \
 -ss 00:07:59.959 -t 00:03:01.772 -i VideoA.m4a \
 -ss 00:07:59.959 -t 00:03:01.772 -i VideoB.m4a \
 -ss 00:07:59.959 -t 00:03:01.772 -i VideoC.m4a \
-filter_complex " \
[0:v]crop=854:1440:573:0,scale=160:270:flags=fast_bilinear,setpts=PTS-STARTPTS[vatom0_0];[1:v]crop=1080:1822:0:98,scale=160:270:flags=fast_bilinear,setpts=PTS-STARTPTS[vatom0_1];[2:v]crop=854:1440:691:0,scale=160:270:flags=fast_bilinear,setpts=PTS-STARTPTS[vatom0_2]; [vatom0_0][vatom0_1][vatom0_2]hstack=inputs=3[vout0]" -map "[vout0]" -filter_complex_threads 4 \
-filter_complex " \
[3:a][4:a][5:a][6:a]amix=inputs=4,asetpts=PTS-STARTPTS[aout0]" -map "[aout0]" \
  -c:v mpsoc_vcu_h264 -g 144 -b:v 544320 -bf 1 -spatial-aq 1 -temporal-aq 1 -tag:v avc1 -color_primaries bt709 -color_trc bt709 -colorspace smpte170m -ss 00:00:00.041 output-00003.MP4

I am getting about 180 fps on a vt1.6xlarge instance. On the c6a.8xlarge, I get about 360 fps using the ultrafast encoding preset.

Is there any way to run the Xilinx HW encoder much faster? (I don't care about quality; I just want it to run as fast as possible.)

NastoohX commented 1 year ago

Hi,

Thank you for bringing this issue to our attention. This seems to be due to the CPU filtering portion of the pipeline. For example, running

ffmpeg -y -f lavfi -i testsrc=duration=60:size=480x270:rate=60 -c:v mpsoc_vcu_h264 -f mp4 /dev/null

gives upwards of 840 fps:

Output #0, mp4, to '/dev/null':
  Metadata:
    encoder : Lavf58.76.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), nv12(tv, progressive), 480x270 [SAR 1:1 DAR 16:9], q=2-31, 5000 kb/s, 60 fps, 15360 tbn
    Metadata:
      encoder : Lavc58.134.100 mpsoc_vcu_h264
frame= 3600 fps=847 q=-0.0 Lsize= 3132kB time=00:00:59.96 bitrate= 427.9kbits/s speed=14.1x
video:3098kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.099865

Adding the filtering, however, seems to limit the encoder's input throughput for both libx264 and mpsoc_vcu_h264. Running the following simplified pipeline:

ffmpeg -y -f lavfi -i testsrc=duration=60:size=2560x1440:rate=60 -f lavfi -i testsrc=duration=60:size=2560x1440:rate=60 -f lavfi -i testsrc=duration=60:size=2560x1440:rate=60 -filter_complex "[0:v]crop=854:1440:573:0,scale=160:270:flags=fast_bilinear,setpts=PTS-STARTPTS[vatom0_0];[1:v]crop=1822:1080:0:98,scale=160:270:flags=fast_bilinear,setpts=PTS-STARTPTS[vatom0_1];[2:v]crop=854:1440:691:0,scale=160:270:flags=fast_bilinear,setpts=PTS-STARTPTS[vatom0_2]; [vatom0_0][vatom0_1][vatom0_2]hstack=inputs=3[vout0]" -map "[vout0]" -filter_complex_threads 32 -r $RATE -c:v $ENC -f mp4 /dev/null

where RATE=60 and ENC is set to either libx264 or mpsoc_vcu_h264, yields ~120 fps:

Output #0, mp4, to '/dev/null':
  Metadata:
    encoder : Lavf59.16.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv444p(progressive), 480x270 [SAR 1281:1280 DAR 427:240], q=2-31, 60 fps, 15360 tbn
    Metadata:
      encoder : Lavc59.18.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 896 fps=126 q=-1.0 Lsize= 112kB time=00:00:14.88 bitrate= 61.5kbits/s speed=2.09x

and

EXE: /opt/xilinx/ffmpeg/bin/ffmpeg
[XMA] WARNING: ffmpeg xma-vcu-encoder device warning: !! The specified Level is too low and will be adjusted !!

Output #0, mp4, to '/dev/null':
  Metadata:
    encoder : Lavf58.76.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), nv12(progressive), 480x270 [SAR 1281:1280 DAR 427:240], q=2-31, 5000 kb/s, 60 fps, 15360 tbn (default)
    Metadata:
      encoder : Lavc58.134.100 mpsoc_vcu_h264
frame= 655 fps=112 q=-0.0 Lsize= 675kB time=00:00:10.88 bitrate= 508.2kbits/s speed=1.86x
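
For repeating this comparison with other crop rectangles or input counts, the filter graph from the simplified pipeline can be generated instead of typed by hand. The following is only a sketch: `build_filter_graph`, `crop_fits` and `stacked_size` are hypothetical helper names, not part of ffmpeg or the Video SDK, and the crop values are copied from the command above. The bounds check is a plain rectangle test against the 2560x1440 test source, and the size calculation assumes only that hstack places its inputs side by side.

```python
def build_filter_graph(crops, out_w=160, out_h=270):
    """Build a -filter_complex string: crop, scale and reset PTS per input, then hstack.

    crops is a list of (w, h, x, y) tuples, one per video input.
    """
    chains, labels = [], []
    for i, (w, h, x, y) in enumerate(crops):
        label = f"[vatom0_{i}]"
        chains.append(
            f"[{i}:v]crop={w}:{h}:{x}:{y},"
            f"scale={out_w}:{out_h}:flags=fast_bilinear,"
            f"setpts=PTS-STARTPTS{label}"
        )
        labels.append(label)
    # Feed every labelled branch into a single hstack that produces [vout0].
    chains.append(f"{''.join(labels)}hstack=inputs={len(crops)}[vout0]")
    return ";".join(chains)


def crop_fits(crop, in_w, in_h):
    """True if the crop rectangle lies entirely inside an in_w x in_h frame."""
    w, h, x, y = crop
    return x + w <= in_w and y + h <= in_h


def stacked_size(n_inputs, out_w=160, out_h=270):
    """Widths add up under hstack; the height is unchanged."""
    return n_inputs * out_w, out_h


# Crop rectangles (w, h, x, y) taken from the simplified pipeline above.
crops = [(854, 1440, 573, 0), (1822, 1080, 0, 98), (854, 1440, 691, 0)]
graph = build_filter_graph(crops)
print(graph)
print(all(crop_fits(c, 2560, 1440) for c in crops))
print(stacked_size(len(crops)))
```

Three 160x270 tiles stacked horizontally give the 480x270 frame size that appears in the logs.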

Increasing the throughput by setting -r to a higher value, e.g., 240, results in ~480 fps in both cases:

Output #0, mp4, to '/dev/null':
  Metadata:
    encoder : Lavf59.16.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv444p(progressive), 480x270 [SAR 1281:1280 DAR 427:240], q=2-31, 240 fps, 15360 tbn
    Metadata:
      encoder : Lavc59.18.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
More than 1000 frames duplicated 0kB time=00:00:04.63 bitrate= 0.1kbits/s dup=876 drop=0 speed=1.85x
frame= 3100 fps=489 q=-1.0 Lsize= 274kB time=00:00:12.90 bitrate= 173.9kbits/s dup=2325 drop=0 speed=2.04x

and

Output #0, mp4, to '/dev/null':
  Metadata:
    encoder : Lavf58.76.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), nv12(progressive), 480x270 [SAR 1281:1280 DAR 427:240], q=2-31, 5000 kb/s, 240 fps, 15360 tbn (default)
    Metadata:
      encoder : Lavc58.134.100 mpsoc_vcu_h264
More than 1000 frames duplicated 512kB time=00:00:04.80 bitrate= 872.4kbits/s dup=870 drop=0 speed=1.91x
frame= 8960 fps=468 q=-0.0 Lsize= 4261kB time=00:00:37.32 bitrate= 935.3kbits/s dup=6720 drop=0 speed=1.95x
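
The two -r 240 logs also report dup counts, so one way to read them is to separate the frames the filter chain actually delivered from the frames ffmpeg duplicated to reach the requested rate. The sketch below does that arithmetic on the final progress lines quoted above. It assumes that `frame=` counts every frame handed to the encoder, duplicates included, and `unique_fps` is a hypothetical helper name rather than an ffmpeg feature.

```python
import re

# Matches the fields of ffmpeg's final progress line, for example:
# "frame= 3100 fps=489 q=-1.0 Lsize= 274kB ... dup=2325 drop=0 speed=2.04x"
PROGRESS = re.compile(r"frame=\s*(\d+)\s+fps=\s*([\d.]+).*?dup=\s*(\d+)")


def unique_fps(progress_line):
    """Scale the reported fps by the share of frames that are not duplicates."""
    m = PROGRESS.search(progress_line)
    if m is None:
        raise ValueError("no frame/fps/dup fields found")
    frames, fps, dup = int(m.group(1)), float(m.group(2)), int(m.group(3))
    return fps * (frames - dup) / frames


# Final progress lines copied from the two logs above.
libx264_line = ("frame= 3100 fps=489 q=-1.0 Lsize= 274kB time=00:00:12.90 "
                "bitrate= 173.9kbits/s dup=2325 drop=0 speed=2.04x")
vcu_line = ("frame= 8960 fps=468 q=-0.0 Lsize= 4261kB time=00:00:37.32 "
            "bitrate= 935.3kbits/s dup=6720 drop=0 speed=1.95x")

print(unique_fps(libx264_line))  # 489 * (3100 - 2325) / 3100
print(unique_fps(vcu_line))      # 468 * (8960 - 6720) / 8960
```

Under that assumption, three quarters of the frames in both runs are duplicates, and the remaining rate is roughly 120 fps for either encoder, close to the RATE=60 results.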

If possible, kindly share:

Cheers,

NastoohX commented 1 year ago

Hi,

Further to the above, one of our goals with the U30 has been high-density, high-VQ real-time processing. Our current suggestion for as-fast-as-possible encoding of VOD assets is to split the file and transcode all segments in parallel (similar to https://github.com/Xilinx/video-sdk/blob/v2.0/examples/ffmpeg/tutorials/13_ffmpeg_transcode_only_split_stitch.py).

Hope this helps.

Cheers,
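
The split-and-stitch idea can be sketched roughly as follows. This is not the linked tutorial script, only a minimal illustration under stated assumptions: the helper names, `input.mp4`, `seg_NNN.mp4`, `segments.txt` and the choice of four equal segments are all hypothetical, audio and the filter graph are left out, and assigning segments to particular U30 devices is not shown. Only standard ffmpeg options (`-ss`, `-t`, `-c:v`, the concat demuxer) are used.

```python
import shlex
import subprocess


def segment_plan(total_s, n):
    """Split [0, total_s) into n equal (start, duration) slices."""
    step = total_s / n
    return [(i * step, step) for i in range(n)]


def segment_cmd(src, start, dur, out, enc="mpsoc_vcu_h264"):
    """ffmpeg command that encodes one slice; -ss/-t before -i trim the input."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{dur:.3f}",
            "-i", src, "-c:v", enc, out]


def concat_cmd(list_file, out):
    """Stitch the encoded slices with the concat demuxer and stream copy."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]


def run_parallel(cmds):
    """Start every command at once and wait for all of them to finish."""
    procs = [subprocess.Popen(c) for c in cmds]
    return [p.wait() for p in procs]


# The -t value 00:03:01.772 from the original command is 181.772 seconds.
plan = segment_plan(181.772, 4)
cmds = [segment_cmd("input.mp4", s, d, f"seg_{i:03d}.mp4")
        for i, (s, d) in enumerate(plan)]
for c in cmds:
    print(shlex.join(c))
print(shlex.join(concat_cmd("segments.txt", "output.mp4")))
# run_parallel(cmds) would launch the encodes; it is deliberately not called here.
```

The concat demuxer reads a text file with one `file 'seg_000.mp4'` style line per segment, so `segments.txt` would need to be written before the final command runs.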