Open stderr-to-devnull opened 2 weeks ago
I don't think this makes sense, as there is no useful general crf = vmaf rule we can use in the search. A crf difference of 2 might make very little or a very large vmaf difference depending on vcodec & input.
The current algo is a binary search with linear interpolation. It starts at the middle crf but then picks half way between the middle and the min/max. Once we have a high and low crf we can interpolate.
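As a rough illustration of the two steps described above, here is a minimal sketch. It is not ab-av1's actual code: the function names `halfway` and `interpolate_crf` are hypothetical, and the sketch assumes VMAF falls roughly linearly between two neighbouring CRF probes. The numbers in the usage lines are taken from the ab-av1 run quoted later in this thread.

```python
def halfway(mid_crf, bound_crf):
    """Second probe: half way between the middle CRF and the min/max bound."""
    return (mid_crf + bound_crf) / 2


def interpolate_crf(low, high, target_vmaf):
    """Linearly interpolate a CRF between two probes.

    low  = (crf, vmaf) of the probe that met the target (lower CRF, higher VMAF)
    high = (crf, vmaf) of the probe that missed the target (higher CRF, lower VMAF)
    """
    (crf_lo, vmaf_lo), (crf_hi, vmaf_hi) = low, high
    if vmaf_lo == vmaf_hi:
        # No slope to interpolate on, fall back to the midpoint.
        return (crf_lo + crf_hi) / 2
    frac = (vmaf_lo - target_vmaf) / (vmaf_lo - vmaf_hi)
    frac = min(max(frac, 0.0), 1.0)  # stay inside the bracket
    return crf_lo + frac * (crf_hi - crf_lo)


# Range 18-24: the first probe is the middle (21), the second is half way to the max.
second = halfway(21, 24)
# Bracket from the quoted run: crf 22.5 -> 96.28, crf 24 -> 94.44, target 95.5.
guess = interpolate_crf((22.5, 96.28), (24, 94.44), 95.5)
print(second, round(guess, 1))
```

With those inputs the interpolated guess lands close to the crf 23.1 that the quoted run tried next, which is consistent with the description but does not prove the real implementation works this way.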
I guess you are using much tighter min/max crf values? It might be better to go straight to the min/max edge on the 2nd iteration when the range is quite tight. This would result in faster failure in your examples.
So perhaps this search behaviour should be configurable and have a different default for when narrow crf min/max ranges are used.
there is no useful general crf = vmaf rule we can use in the search
I just showed two examples but I have so many more with the same results.
I guess you are using much tighter min/max crf values?
Yes, and that is mainly to start from a reasonable middle. If I increased the min/max range, the CRF search would start at too high a value, wasting even more time, so tightening the range is the only manual way of tuning the CRF search to not waste time on obviously infeasible values.
So perhaps this search behaviour should be configurable and have a different default for when narrow crf min/max ranges are used.
That would be a start.
I just thought of an even better and smarter way: implement a learning algo, which would work like this:
0.5 VMAF points
With this method, a lot of useless CRF searching would be eliminated.
The problem with your logic in finding a mapping between CRF and VMAF is that you are neglecting that the source content plays a major role. VMAF != CRF; rather, VMAF = CRF(VideoContent). If the video content changes, so does the VMAF for a particular CRF.
As a whole this is a limitation of this tool: it only samples the video rather than encoding the whole video and running VMAF on it. It's a tool for quick estimates of what a decent CRF could be for a video. Since it only takes x number of samples from the video, you can never be sure you've hit the worst performing crf(content) combination. That is why there is a suggestion to use the lowest VMAF found for a particular CRF value rather than the average.
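To make the difference between "lowest" and "average" concrete, here is a small sketch. The helper `meets_target` and its `aggregate` parameter are hypothetical and not an ab-av1 option; the scores are the eight per-sample VMAF values from the crf 22.5 run quoted later in this thread.

```python
from statistics import mean


def meets_target(sample_vmafs, target_vmaf, aggregate="mean"):
    """Judge a CRF by the mean or by the worst (minimum) sample score."""
    score = min(sample_vmafs) if aggregate == "min" else mean(sample_vmafs)
    return score >= target_vmaf, round(score, 2)


# Per-sample scores at crf 22.5 from the quoted wrapper run.
scores = [94.92, 97.67, 95.93, 94.80, 97.44, 96.93, 96.02, 96.53]

print(meets_target(scores, 95.5, "mean"))  # judged on the average
print(meets_target(scores, 95.5, "min"))   # judged on the worst sample
```

On these numbers the average clears the 95.5 target while the worst sample does not, which is the case the suggestion is meant to catch.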
The problem with your logic in finding a mapping between CRF and VMAF is that you are neglecting that the source content plays a major role.
You missed the part where I actually mention that mapping CRF to VMAF should be done within a similarity range of 0.5 VMAF points, exactly due to the fact that not all sources are the same and a differentiator is needed. Which means, within these constraints, source content is similar in complexity or details. Past the 0.5 threshold, another full CRF search of the new source will be done to create a new mapping.
I have a truckload of examples of different videos that ended up having almost the same CRF value for a given VMAF target.
I'm not sold on a new general approach. If you want to progress that, it may be best to produce a working algorithm, perhaps using a script + ab-av1 sample-encode to analyse each iteration.
I think the current binary search already has decent behaviour, minimising worst case iterations and aiming to get to the best crf in 3-4 iterations. I think getting to the answer in 1-2 iterations is only really possible if you "know" the answer already or are lucky.
I would be interested in tuning the "halfway between mid and bound" logic as that is just a heuristic based on my own analysis. There is a case to just remove that, or as mentioned before make it configurable / auto disable when crf bounds are adjusted.
I have a truckload of examples of different videos that ended up having almost the same CRF value for a given VMAF target.
Perhaps then we could have a way to hint a crf. Something like arg --crf-hint 20 which would
So if you hint well you do get an answer in 1-2 iterations, but otherwise this will add 1-2 extra bad iterations to the search. With this perhaps we could skip the 2nd iter here if the 1st is "far enough" off the VMAF, like +/- 5 off or something but it's a bit of a guess.
All the proposed learning logic does is to try out a CRF value that would be closer to the desired VMAF target score, based on the aforementioned similarity logic. The only bad aspect about it is the fact that the learning info needs to be stored in a separate file and maybe you want to avoid that (just like with the batch processing issue that I opened).
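The exact steps of the proposed learning logic are not preserved in this thread, so the following is only one possible reading of it, not the poster's implementation. It assumes that each finished search is stored as a record holding the first probed CRF, the VMAF that probe produced, the target, and the CRF finally chosen, and that a new source counts as "similar" when its first probe lands within 0.5 VMAF points of a stored one. All names (`lookup`, `remember`, the record keys, the JSON file) are hypothetical.

```python
import json
from pathlib import Path

SIMILARITY = 0.5  # VMAF points, the similarity range named in the thread


def lookup(history, probe_crf, probe_vmaf, target_vmaf):
    """Return the final CRF of the most similar stored search, or None."""
    best = None
    for rec in history:
        if rec["probe_crf"] != probe_crf or rec["target_vmaf"] != target_vmaf:
            continue
        gap = abs(rec["probe_vmaf"] - probe_vmaf)
        if gap <= SIMILARITY and (best is None or gap < best[0]):
            best = (gap, rec["final_crf"])
    return None if best is None else best[1]


def remember(path, history, record):
    """Append a finished search and persist the history as JSON."""
    history.append(record)
    Path(path).write_text(json.dumps(history, indent=2))


history = [
    {"probe_crf": 21, "probe_vmaf": 97.67, "target_vmaf": 95.5, "final_crf": 23.1},
]
print(lookup(history, 21, 97.50, 95.5))  # similar source: reuse the stored CRF
print(lookup(history, 21, 96.90, 95.5))  # too different: run a full search
```

A miss (`None`) would fall back to the normal search, whose result could then be passed to `remember` so later sources can reuse it.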
I just thought about another way of speeding up and improving the CRF search. The way ab-av1 operates now:
But how about:
2 points above/below target VMAF score, do the following:
Alternatively, if the results may be better, instead of picking out the samples with score closest to median, maybe pick out the samples from the "edges" (lowest / highest scores).
@alexheretic I intend to publish the script on github at one point, do you have any objections to this? See below.
Alright, I wrote a wrapper that automates a lot of things but, most importantly, uses an adaptive crf-search logic based on VMAF score deviation from the desired target score, instead of the standard binary search, which I kept saying is wasting a lot of time on useless CRF value runs.
For the below preliminary test, ab-av1-wrapper is 2.37x faster than ab-av1 (7m41s vs 18m17s).
More testing is required with a bunch of different sources and this is what I am focused on now.
There are other things happening, but the main takeaways are in the script run example below, especially the logic behind the speed increase.
.\ab-av1.exe crf-search --cache false --min-vmaf=95.5 --min-crf=18 --max-crf=24 --vmaf n_subsample=3 --samples=8 --sample-duration=25s --pix-format yuv444p10le -e libx265 --preset=slow --enc x265-params="profile=main10:no-sao=1:aq-mode=4:ctu=32" -i "D:\video\test.mp4"
- crf 21 VMAF 97.67 (70%)
- crf 22.5 VMAF 96.28 (58%)
- crf 24 VMAF 94.44 (48%)
- crf 23.1 VMAF 95.61 (54%)
- crf 23.2 VMAF 95.49 (53%)
00:18:17 ########################################################################################################################################################################### (sampling crf 23.2, eta 0s)
Encode with: ab-av1 encode -e libx265 -i "D:\video\test.mp4" --crf 23.1 --preset slow --pix-format yuv444p10le --enc x265-params=profile=main10:no-sao=1:aq-mode=4:ctu=32
crf 23.1 VMAF 95.61 predicted video stream size 321.38 MiB (54%) taking 21 minutes
wsl time ./ab-av1-wrapper.sh
Provide full paths to source video files, one by line, terminate with ENTER
D:\video\test.mp4
Using following settings
------------------------
Target VMAF : 95.5
Target VMAF deviation : 0.4 overshoot, 0.2 undershoot
Target vertical resolution: 720p
CRF : using 18-24 search range (based on target vertical resolution) || using default starting CRF value of 21
x265 : using preset "slow" || using parameters "profile=main10:no-sao=1:aq-mode=4:ctu=32"
ab-av1 : using 8 initial samples, with 25s length each
Processing D:\video\test.mp4
Video CODEC: h264 || Vertical resolution: 720 || FPS: 29.000 || Video bitrate: 3.00 Mbps || Audio CODEC: aac || Audio bitrate: 96 Kbps
Source has low fps, increasing starting CRF value to 22.5 and setting VMAF n_subsamples to 3
-----------------------------|ab-av1 output|-----------------------------
- Sample 1 (56%) vmaf 94.92
- Sample 2 (55%) vmaf 97.67
- Sample 3 (58%) vmaf 95.93
- Sample 4 (58%) vmaf 94.80
- Sample 5 (61%) vmaf 97.44
- Sample 6 (58%) vmaf 96.93
- Sample 7 (60%) vmaf 96.02
- Sample 8 (58%) vmaf 96.53
00:03:46 Sample 8/8 ##################################################################################################################################################################### (sampling, eta 0s)
Encode with: ab-av1 encode -e libx265 -i "D:\video\test.mp4" --crf 22.5 --preset slow --pix-format yuv444p10le --enc x265-params=profile=main10:no-sao=1:aq-mode=4:ctu=32
VMAF 96.28 predicted video stream size 347.10 MiB (58%) taking 22 minutes
-------------------------------------------------------------------------
[!!] CRF 22.5 FAILED || Average VMAF score: 96.25 || VMAF deviation overshoot: 0.75 || Target VMAF deviation overshoot of 0.4
Adjusted CRF to new value: 23.02
Encoding 4 samples: 2 with lowest VMAF scores, 2 with highest VMAF scores
Sample 1 encoded in 0m 20s
Sample 1 || new VMAF score: 94.22 || new VMAF deviation from target: 1.28
Sample 2 encoded in 0m 21s
Sample 2 || new VMAF score: 94.27 || new VMAF deviation from target: 1.23
Sample 3 encoded in 0m 22s
Sample 3 || new VMAF score: 96.84 || new VMAF deviation from target: 1.34
Sample 4 encoded in 0m 20s
Sample 4 || new VMAF score: 97.28 || new VMAF deviation from target: 1.78
[OK] CRF 23.02 PASSED || Average VMAF score: 95.65 || VMAF deviation : 0.15 || Target VMAF deviation of 0.4
Continuing encoding the remaining samples
Sample 5 encoded in 0m 21s
Sample 5 || new VMAF score: 95.34 || new VMAF deviation from target: 0.16
Sample 6 encoded in 0m 22s
Sample 6 || new VMAF score: 95.50 || new VMAF deviation from target: 0
Sample 7 encoded in 0m 23s
Sample 7 || new VMAF score: 95.87 || new VMAF deviation from target: 0.37
Sample 8 encoded in 0m 19s
Sample 8 || new VMAF score: 96.40 || new VMAF deviation from target: 0.90
[OK] CRF 23.02 PASSED || Average VMAF score: 95.72 || VMAF deviation : 0.22 || Target VMAF deviation of 0.4
real 7m41.109s
user 0m0.270s
sys 0m0.542s
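Two parts of the wrapper's logic can be read directly off the log above: the pass/fail check against an asymmetric tolerance (0.4 overshoot, 0.2 undershoot around the 95.5 target) and the choice of the 2 lowest and 2 highest scoring samples for the first re-test. The sketch below reconstructs only those two parts; the function names are hypothetical, the wrapper itself is a shell script, and the formula it uses to move from crf 22.5 to 23.02 is not shown in the log, so it is not reproduced here.

```python
def verdict(avg_vmaf, target=95.5, overshoot=0.4, undershoot=0.2):
    """Classify an average VMAF against an asymmetric tolerance band."""
    deviation = round(avg_vmaf - target, 2)
    if deviation > overshoot:
        return "too high", deviation   # quality to spare, CRF can go up
    if deviation < -undershoot:
        return "too low", deviation    # quality short, CRF must come down
    return "ok", deviation


def edge_samples(scores, n=2):
    """Return (indices of the n lowest scores, indices of the n highest)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    return order[:n], order[-n:]


# First pass at crf 22.5, per-sample scores from the log (index 0 = Sample 1).
first_pass = [94.92, 97.67, 95.93, 94.80, 97.44, 96.93, 96.02, 96.53]

print(verdict(96.25))            # the average the wrapper reported for crf 22.5
print(verdict(95.65))            # the average after the 4 edge samples at crf 23.02
print(edge_samples(first_pass))  # which samples get re-encoded first
```

Testing the edge samples first means a bad CRF guess can be rejected after 4 encodes instead of 8, which appears to be where much of the time saving in the run above comes from.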
I intend to publish the script on github at one point, do you have any objections to this?
Absolutely not, please feel free. The project itself is MIT licensed too so you can do whatever you like with the code.
Overall I don't really follow your search logic, or rather, to me the logic doesn't seem to apply generally to all videos. But if it works well for you then great.
These are x265 tests for a target VMAF of 95 (default):
The logic I observed is: start in the middle of the min/max CRF range, then decrement by 1.5 CRF, and I think the next decrement step value is 0.8.
In both above examples, after the first CRF test finished, we can already see how far we are from target: 2.82 points in the first example and 3.62 points in the second. By the third CRF search, we can observe that even after dropping by 3 CRF values we are still not reaching the target VMAF. This makes at least the second CRF search useless before it even starts, wasting time.
The idea would be to tune the CRF search jumps based on the delta values from the desired target VMAF. If we look at the above examples, the logic would be: if distance from target is higher than 2 points, next CRF search value decrement step should be at least 2.5 points. This would get us faster to the target VMAF and avoid wasting time/CPU cycles on searching on obviously pointless values.
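The proposed rule can be written down in a few lines. This is a sketch of the rule as stated, not of anything ab-av1 does; `next_decrement` is a hypothetical name, and the 1.5 default is the step observed above rather than a documented value.

```python
def next_decrement(distance_from_target, default_step=1.5,
                   far_threshold=2.0, far_step=2.5):
    """Pick the next CRF decrement from how far the VMAF is below target."""
    if distance_from_target > far_threshold:
        # Far from the target: take the larger jump.
        return max(far_step, default_step)
    return default_step


print(next_decrement(2.82))  # first example
print(next_decrement(3.62))  # second example
print(next_decrement(1.20))  # close to target: keep the observed step
```

Both examples above are more than 2 points short of the target, so under this rule each would skip the 1.5 step and jump by 2.5 straight away.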