Closed GoogleCodeExporter closed 8 years ago
Experimental is 3.297 times faster than Master,
but still 44.980 times slower than VP8.
Version  VP9 Time  VP8 Time
jan14      72,628     3,958
jan20     311,193     1,989
jan24     338,425     1,939
jan26     315,218     1,809
feb23     313,918     2,170
feb23x    113,970     2,015
mar1      316,140     2,151
mar1x      96,773     2,183
mar07     317,223     1,998
mar07x     94,093     2,188
mar09     319,236     1,974
mar09x     98,094     2,266
mar11     317,829     1,981
mar11x     96,393     2,143
Using the following script:
call :timeone vpxenc_jan14.exe vp9
call :timeone vpxenc_jan20.exe vp9
call :timeone vpxenc_jan24.exe vp9
call :timeone vpxenc_jan26.exe vp9
call :timeone vpxenc_feb23.exe vp9
call :timeone vpxencx_feb23.exe vp9
call :timeone vpxenc_mar1.exe vp9
call :timeone vpxencx_mar1.exe vp9
call :timeone vpxenc_mar07.exe vp9
call :timeone vpxencx_mar07.exe vp9
call :timeone vpxenc_mar09.exe vp9
call :timeone vpxencx_mar09.exe vp9
call :timeone vpxenc_mar11.exe vp9
call :timeone vpxencx_mar11.exe vp9
goto :eof
:timeone
timex %1 -w 640 -h 360 --fps=30000/1001 --target-bitrate=400 ^
  bear.640x360_30Hz_P420.yuv -o bear0.vp9.webm -p 2 --codec=%2 --good ^
  --cpu-used=0 --lag-in-frames=25 --min-q=0 --max-q=63 --end-usage=vbr ^
  --auto-alt-ref=1 --kf-max-dist=9999 --kf-min-dist=0 --drop-frame=0 ^
  --static-thresh=0 --bias-pct=50 --minsection-pct=0 --maxsection-pct=2000 ^
  --arnr-maxframes=7 --arnr-strength=5 --arnr-type=3 --sharpness=0 ^
  --undershoot-pct=100 -v --psnr -t 4
goto :eof
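The two headline ratios can be reproduced from the mar11 rows of the table. The sketch below assumes the "x" suffix marks the experimental build (inferred from the `vpxencx_*` executable names) and that both columns use the same time unit; the variable names are illustrative only.

```python
# Timings copied from the mar11 / mar11x rows of the table.
# Assumption: "mar11x" is the experimental build, "mar11" is master.
vp9_master = 317_829        # mar11, VP9 Time
vp9_experimental = 96_393   # mar11x, VP9 Time
vp8 = 2_143                 # mar11x, VP8 Time

speedup_vs_master = vp9_master / vp9_experimental
slowdown_vs_vp8 = vp9_experimental / vp8

print(f"Experimental vs. master: {speedup_vs_master:.3f}x faster")
print(f"Experimental vs. VP8:    {slowdown_vs_vp8:.3f}x slower")
```

Both printed values match the figures quoted at the top of the report (3.297 and 44.980).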
Original comment by fbarch...@chromium.org
on 12 Mar 2013 at 3:39
VP9 bitstream is not yet finalized
Original comment by albe...@google.com
on 14 Mar 2013 at 10:27
Those working on improving VP9 quality (before it's final) would benefit from
faster iteration. Making a change and testing whether it helps quality
typically takes hours, if not days.
The improvement made so far (3.2x) has made a huge difference.
I'd suggest enabling threads. That shouldn't affect the bitstream, and would give
an order of magnitude performance difference.
Original comment by fbarch...@chromium.org
on 15 Mar 2013 at 3:00
One more interesting thing: on my Core i5 (4 cores), VP9 uses only one core (KDE
System Monitor shows 25% CPU load) even when I use the -t 3 option.
My command line:
vpxenc $HOME/video.y4m -o "${g%.*}.webm" \
--i420 --passes=2 --pass=2 --fpf=pass.log -t 3 \
--good --cpu-used=0 --target-bitrate=1200 --auto-alt-ref=1 \
-v --codec=vp9 --end-usage=vbr --minsection-pct=5 \
--maxsection-pct=800 --lag-in-frames=16 --cpu-used=0 \
--kf-min-dist=0 --kf-max-dist=360 \
--static-thresh=0 --min-q=0 --max-q=60 &
mplayer -benchmark -nofs -noframedrop -vo yuv4mpeg:file=/home/ilya/video.y4m -ass -vf harddup -nosound "$g"
Original comment by yast...@gmail.com
on 18 May 2013 at 7:31
Some progress:
Date     ms/f
jan14     966
jan20   3,803
jan24   3,625
jan26   4,002
feb23   4,016
mar1    3,962
mar07   3,936
mar09   4,121
mar11   4,082
mar14   3,984
mar23   4,030
apr26   1,294
may03   1,293
may12   1,288
jun01   1,298
jun12  12,844
jun14   9,601
jul31   2,118
aug03   2,044
aug18   2,078
aug24   1,950
VP8 takes 36 ms/f on the same machine and file, which is 54x faster.
Original comment by fbarch...@google.com
on 25 Aug 2013 at 8:33
Using Aug 29 version
On bear movie
VP8
Pass 2/2 frame 82/86 132565B 12933b/f 387607b/s 1310022 us (62.59 fps) 573F
Stream 0 PSNR (Overall/Avg/Y/U/V) 37.551 37.836 36.661 41.026 43.575
TIMEX 1582.00 ms (1.58 seconds)
VP9 cpu used=1
Pass 2/2 frame 82/82 129313B 12615b/f 378098b/s 14284 ms (5.74 fps)
Stream 0 PSNR (Overall/Avg/Y/U/V) 39.840 39.970 38.879 42.694 44.946
TIMEX 14463.00 ms (14.46 seconds)
cpu used = 0 109714.00 ms
cpu used = 1 15086.00 ms
cpu used = 2 4854.00 ms
cpu used = 3 3351.00 ms
cpu used = 4 2776.00 ms
Long movie
Pass 2/2 frame 4228/4203 15216752B 1743055 ms 2.43 fps [ETA 20:09:35] 49.546
48.730 51.543 52.201 1080F
Re #4 'interesting thing': that's why I suggested enabling threads in #3. It's
still not enabled, and would be an easy change for a large win.
One idea would be a thread per tile for encoding. Tiles allow full parallelism
and just need the bitstream writes serialized.
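The thread-per-tile idea can be sketched as follows. This is only an illustration of the pattern (encode tiles concurrently, then write the results in tile order), not libvpx code and not the VP9 bitstream format. `encode_tile`, `encode_frame`, and the 4-byte length prefix are hypothetical, and `zlib.compress` stands in for the real per-tile encoder so that the sketch runs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_tile(tile: bytes) -> bytes:
    # Hypothetical stand-in for per-tile encoding; tiles share no state here.
    return zlib.compress(tile)


def encode_frame(tiles: list[bytes], threads: int = 4) -> bytes:
    """Encode tiles concurrently, then append the results in tile order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in input order even if workers finish
        # out of order, so the write step below stays deterministic.
        encoded = list(pool.map(encode_tile, tiles))

    out = bytearray()
    for chunk in encoded:  # the only serialized step
        out += len(chunk).to_bytes(4, "big")  # hypothetical length prefix
        out += chunk
    return bytes(out)


# Four dummy "tiles" of 1 KiB each.
tiles = [bytes([i]) * 1024 for i in range(4)]
frame = encode_frame(tiles)
```

Because the output is assembled in tile order after all workers return, the result does not depend on the thread count, which is the property that would keep threading from affecting the bitstream.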
Original comment by fbarch...@google.com
on 1 Sep 2013 at 5:33
Maybe my command line is wrong, but --cpu-used=1 (or 2 or 3) and threads=4
still doesn't work. vpxenc still uses one thread.
Original comment by yast...@gmail.com
on 16 Sep 2013 at 4:38
Re #7: VP9 does not support threads.
Long videos still take quite a while to encode:
cpu used=0
Pass 2/2 frame 298/273 4029642B 2922957 ms 9808.58 ms/f [ETA 527:47:26]
42.323 41.067 46.737 47.617 8769F
cpu used=1
Pass 2/2 frame 10625/10600 76571391B 6799631 ms 1.56 fps [ETA 30:01:31] 44.487
43.733 45.009 48.984 117F
Original comment by fbarch...@google.com
on 17 Sep 2013 at 8:18
cpu used=0 remains a little too slow to use in practice.
After 1 day of encoding, the ETA is 382 hrs (about 15.9 days).
Pass 2/2 frame 10335/10310 73179543B 84133871 ms 7.37 fpm [ETA 382:46:04]
43.911 43.087 44.596 48.986 17701F
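The figures in that progress line can be cross-checked with a few lines of arithmetic. The sketch assumes the first number after "frame" is the count of frames processed, that the ms value is elapsed wall time, and that the ETA is formatted as hours:minutes:seconds; these readings are inferred from the line itself, not taken from vpxenc documentation.

```python
# Values copied from the progress line above.
frames_done = 10_335       # "frame 10335/10310", first number (assumed: frames processed)
elapsed_ms = 84_133_871    # assumed: elapsed wall time in milliseconds
eta = "382:46:04"          # assumed format: hours:minutes:seconds

elapsed_hours = elapsed_ms / 3_600_000
frames_per_minute = frames_done / (elapsed_ms / 60_000)

hours, minutes, seconds = (int(part) for part in eta.split(":"))
eta_days = (hours + minutes / 60 + seconds / 3600) / 24

print(f"elapsed: {elapsed_hours:.2f} h")
print(f"rate:    {frames_per_minute:.2f} fpm")
print(f"ETA:     {eta_days:.2f} days")
```

The computed rate matches the 7.37 fpm shown in the log, and the elapsed time is about 23.4 hours, in line with "after 1 day". The ETA comes out slightly above the quoted figure because it includes the minutes and seconds, not only the 382 whole hours.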
Of the 32 videos in testmatrix, 6 take more than a day
brian - Pass 2/2 frame 29924/29899 54374479B 85703619 ms 20.95 fpm [ETA
122:58:01] 42.184 41.281 44.06
garden - Pass 2/2 frame 1447/1422 58168937B 84758018 ms 1.02 fpm [ETA 21:37:40]
44.362 43.732 45.578 46.436
dance - Pass 2/2 frame 4892/4867 39493166B 85999862 ms 3.41 fpm [ETA 19:09:55]
35.564 34.479 39.807 38.670 45F
snow - Pass 2/2 frame 3355/3330 22930878B 84537280 ms 2.38 fpm [ETA 19:21:31]
25.81
red - Pass 2/2 frame 2488/2463 22099691B 84668105 ms 1.76 fpm [ETA 13:04:25]
42.609 41.213 49.847 47.933 1053F
Original comment by fbarch...@google.com
on 29 Sep 2013 at 12:49
Original comment by fgalli...@google.com
on 16 Jan 2015 at 11:53
Original issue reported on code.google.com by
fbarch...@google.com
on 25 Feb 2013 at 5:35