Tune CRF search logic - Githubissues

stderr-to-devnull commented 2 weeks ago

These are x265 tests for a target VMAF of 95 (default):

- crf 22 VMAF 92.18 (5%)
- crf 20.5 VMAF 93.74 (6%)
- crf 19 VMAF 94.98 (7%)

- crf 20 VMAF 91.38 (4%)
- crf 18.5 VMAF 93.03 (5%)
- crf 17 VMAF 94.40 (6%)

The logic I observed is: start in the middle of the min/max CRF range, then decrement by 1.5 CRF and I think the next decrement step value is 0.8.

In both examples above, the first CRF test already shows how far we are from the target: 2.82 points in the first example and 3.62 points in the second. By the third CRF test we can see that even dropping the CRF by 3 does not reach the target VMAF. That makes at least the second CRF test useless before it even starts, which wastes time.

The idea is to size the CRF search jumps according to the distance from the target VMAF. For the examples above the rule would be: if the score is more than 2 points from the target, the next CRF decrement should be at least 2.5. This would reach the target VMAF faster and avoid wasting time and CPU cycles on values that obviously cannot reach it.
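A minimal sketch of the rule proposed above, assuming a fixed 1.5 step when the score is within 2 points of the target (the step observed in the examples) and a 2.5 step otherwise. The function name `next_crf` and its parameters are hypothetical and are not part of ab-av1.

```python
def next_crf(crf: float, vmaf: float, target: float = 95.0,
             min_crf: float = 0.0) -> float:
    """Pick the next CRF to try from the distance to the target VMAF."""
    delta = target - vmaf  # positive: quality too low, so the CRF must go down
    # Assumed thresholds: more than 2 points off -> jump 2.5, otherwise 1.5.
    step = 2.5 if abs(delta) > 2.0 else 1.5
    direction = -1 if delta > 0 else 1
    return max(min_crf, crf + direction * step)


if __name__ == "__main__":
    # First example from the issue: crf 22 scored 92.18, 2.82 below target.
    print(next_crf(22, 92.18))
    # Second example: crf 20 scored 91.38, 3.62 below target.
    print(next_crf(20, 91.38))
```

With the two examples above this would go from crf 22 to 19.5 and from crf 20 to 17.5 on the second iteration. Whether a fixed threshold like this generalises across codecs and inputs is exactly what the rest of the thread debates.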

alexheretic commented 2 weeks ago

I don't think this makes sense, as there is no useful general crf = vmaf rule we can use in the search. A crf difference of 2 might make a very small or very large vmaf difference depending on vcodec & input.

The current algo is a binary search with linear interpolation. It starts at the middle crf but then picks half way between the middle and the min/max. Once we have a high and low crf we can interpolate.
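The description above can be sketched roughly as follows. This is an illustration of the idea only, not ab-av1's actual code; the function names are hypothetical, and the numbers are taken from the ab-av1 run quoted later in this thread (range 18-24, target VMAF 95.5).

```python
def first_two_crfs(min_crf: float, max_crf: float, go_higher: bool):
    """Middle of the range, then halfway between the middle and one bound."""
    mid = (min_crf + max_crf) / 2
    bound = max_crf if go_higher else min_crf
    return mid, (mid + bound) / 2


def interpolate_crf(better, worse, target_vmaf: float) -> float:
    """Linear interpolation between two (crf, vmaf) points that bracket the target."""
    (crf_a, vmaf_a), (crf_b, vmaf_b) = better, worse
    fraction = (vmaf_a - target_vmaf) / (vmaf_a - vmaf_b)
    return crf_a + fraction * (crf_b - crf_a)


if __name__ == "__main__":
    # Range 18-24: the first score was above target, so the search moves up.
    print(first_two_crfs(18, 24, go_higher=True))
    # crf 22.5 scored 96.28 and crf 24 scored 94.44; the target is 95.5.
    print(round(interpolate_crf((22.5, 96.28), (24.0, 94.44), 95.5), 2))
```

For that run the sketch gives 21 and 22.5 as the first two values and an interpolated crf of about 23.14, which is close to the 23.1 that ab-av1 tried next.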

I guess you are using much tighter min/max crf values? It might be better to go straight to the min/max edge on the 2nd iteration when the range is quite tight. This would result in faster failure in your examples.

So perhaps this search behaviour should be configurable and have a different default for when narrow crf min/max ranges are used.

stderr-to-devnull commented 2 weeks ago

> there is no useful general crf = vmaf rule we can use in the search

I just showed two examples but I have so many more with the same results.

> I guess you are using much tighter min/max crf values?

Yes, and that is mainly to start from a reasonable middle. If I widened the min/max range, the CRF search would start at too high a value and waste even more time, so tightening the range is the only manual way to keep the CRF search from wasting time on obviously unfeasible values.

> So perhaps this search behaviour should be configurable and have a different default for when narrow crf min/max ranges are used.

That would be a start.

I just thought of an even better and smarter way: implement a learning algo, which would work like this:

1. initial CRF search on a source, with specific scaling and encoding options and a target VMAF
2. retain CRF search results (VMAF scores for each CRF search value)
3. another source; after the initial CRF search on this new source:
   - compare to past CRF search results for the same settings used
   - for this new source, observe similar VMAF score for the same CRF search value in the past -> the similarity being +/- 0.5 VMAF points
   - from the above comparison and establishing similarity, infer how future CRF search values would influence VMAF score
   - from the above inference, choose a more suitable CRF search value which would be much closer to truth

With this method, a lot of useless CRF searching would be eliminated.
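One way to picture the lookup described above is the sketch below. Everything in it is hypothetical: the `suggest_crf` name, the in-memory history layout, the settings key and the stored scores are illustrative assumptions, not data from this thread and not an ab-av1 feature.

```python
from typing import Dict, List, Optional

# Past searches grouped by a settings key; each search maps crf -> VMAF.
History = Dict[str, List[Dict[float, float]]]


def suggest_crf(history: History, settings: str, crf: float, vmaf: float,
                target: float, tolerance: float = 0.5) -> Optional[float]:
    """Reuse a past search whose score at the same crf is within `tolerance`."""
    for past in history.get(settings, []):
        seen = past.get(crf)
        if seen is None or abs(seen - vmaf) > tolerance:
            continue  # no result at this crf, or the source is not similar enough
        passing = [c for c, v in past.items() if v >= target]
        if passing:
            return max(passing)  # highest crf that met the target in the past
    return None  # nothing similar: fall back to a full search


if __name__ == "__main__":
    # Illustrative history only.
    history = {"x265-slow-720p": [{21.0: 97.6, 22.5: 96.3, 23.1: 95.6, 24.0: 94.4}]}
    print(suggest_crf(history, "x265-slow-720p", 21.0, 97.3, target=95.5))
    print(suggest_crf(history, "x265-slow-720p", 21.0, 96.0, target=95.5))
```

In this sketch a new source that scores within 0.5 of a stored result at the same crf jumps straight to the crf that worked before, and anything outside the tolerance returns `None` so the normal search can run and add a new entry.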

lovelytwo commented 2 weeks ago

The problem with your logic in finding a mapping between CRF and VMAF is that you are neglecting that the source content plays a major role. VMAF != CRF; rather, VMAF = CRF(VideoContent): if the video content changes, so does the VMAF for a particular CRF.

As a whole this is a limitation of this tool: it only samples the video rather than encoding and VMAF-scoring the whole video. It's a tool for quick estimates of what a decent CRF could be for a video. Since it only takes x number of samples from the video, you can never be sure you've found the worst-performing crf(content) combination, which is why there is a suggestion to use the lowest vmaf found for a particular CRF value rather than the average.
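To illustrate the difference between the two aggregations, here is a small sketch using the eight per-sample scores from the CRF 22.5 wrapper run quoted later in this thread. The `meets_target` helper is hypothetical and only shows how the choice of aggregate changes the verdict.

```python
from statistics import mean

# Per-sample VMAF at crf 22.5, copied from the wrapper log in this thread.
sample_vmaf = [94.92, 97.67, 95.93, 94.80, 97.44, 96.93, 96.02, 96.53]


def meets_target(scores, target: float, use_min: bool = False) -> bool:
    """Judge a crf by the mean sample score or by the worst sample score."""
    score = min(scores) if use_min else mean(scores)
    return score >= target


if __name__ == "__main__":
    print(round(mean(sample_vmaf), 2), min(sample_vmaf))
    print(meets_target(sample_vmaf, 95.5))                # judged on the mean
    print(meets_target(sample_vmaf, 95.5, use_min=True))  # judged on the minimum
```

Against a 95.5 target the mean (96.28) passes while the worst sample (94.80) does not, so the stricter rule would push the search towards a lower crf.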

stderr-to-devnull commented 2 weeks ago

> The problem with your logic in finding a mapping between CRF and VMAF is that you are neglecting that the source content plays a major role.

You missed the part where I actually mention that mapping CRF to VMAF should be done within a similarity range of 0.5 VMAF points, exactly because not all sources are the same and a differentiator is needed. Within these constraints, source content is similar in complexity or detail. Past the 0.5 threshold, another full CRF search of the new source will be done to create a new mapping.

I have a truckload of examples of different videos that ended up having almost the same CRF value for a given VMAF target.

alexheretic commented 2 weeks ago

I'm not sold on a new general approach. If you want to take that further, it may be best to produce a working algorithm, perhaps using a script + ab-av1 sample-encode to analyse each iteration.

I think the current binary search already has decent behaviour: it minimises worst-case iterations and aims to get to the best crf in 3-4 iterations. I think getting to the answer in 1-2 iterations is only really possible if you "know" the answer already or are lucky.

I would be interested in tuning the "halfway between mid and bound" logic as that is just a heuristic based on my own analysis. There is a case to just remove that, or as mentioned before make it configurable / auto disable when crf bounds are adjusted.

> I have a truckload of examples of different videos that ended up having almost the same CRF value for a given VMAF target.

Perhaps then we could have a way to hint a crf. Something like arg --crf-hint 20 which would

1st iter: Use crf-hint value instead of mid-point for first iteration.
2nd iter: Use 1 crf-increment up/down (e.g. assume crf-hint is close to the answer)
3rd,4th ...: continue with existing search logic

So if you hint well you do get an answer in 1-2 iterations, but otherwise this will add 1-2 extra bad iterations to the search. With this perhaps we could skip the 2nd iter here if the 1st is "far enough" off the VMAF, like +/- 5 off or something but it's a bit of a guess.
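The second-iteration choice for the proposed hint could look roughly like this. Note that `--crf-hint` is only a suggestion in this thread, not an existing flag; `second_crf`, the `increment` default and the +/- 5 cut-off are assumptions taken from the wording above.

```python
from typing import Optional


def second_crf(hint: float, vmaf: float, target: float,
               increment: float = 1.0, far: float = 5.0) -> Optional[float]:
    """CRF for the 2nd iteration, or None to skip straight to the normal search."""
    if abs(vmaf - target) >= far:
        return None  # hint was far off: do not spend an iteration next to it
    # Score at or above target: try one increment higher (smaller file).
    # Score below target: try one increment lower (better quality).
    return hint + increment if vmaf >= target else hint - increment


if __name__ == "__main__":
    print(second_crf(20, 96.0, target=95.0))  # slightly above target
    print(second_crf(20, 94.0, target=95.0))  # slightly below target
    print(second_crf(20, 89.0, target=95.0))  # far off
```

A good hint resolves in one or two iterations (21 or 19 in the first two calls), while a hint that lands 5 or more points off returns `None` so the existing search logic takes over immediately.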

stderr-to-devnull commented 2 weeks ago

All the proposed learning logic does is to try out a CRF value that would be closer to the desired VMAF target score, based on the aforementioned similarity logic. The only bad aspect about it is the fact that the learning info needs to be stored in a separate file and maybe you want to avoid that (just like with the batch processing issue that I opened).

I just thought about another way of speeding up and improving the CRF search. The way ab-av1 operates now:

1. generate the encoding samples
2. encode each one
3. perform VMAF analysis and store the score
4. after all samples are processed, compute the median VMAF score
5. repeat for each CRF value

But how about:

1. for the FIRST CRF search, do the normal logic
2. IF, after the first CRF search, the median VMAF score is, say, 2 points above/below the target VMAF score, do the following:
   - pick out the half of the generated samples whose VMAF scores were closest to the median score from the initial CRF search, and encode + calculate the VMAF score only for those (for an odd number of samples, round down)
   - use the new median VMAF score of the selected samples to make the next CRF search decision
   - for the new CRF value, first use the selected samples and, if the median VMAF score approaches the target, continue with processing the remaining samples and re-calculate the median score

Alternatively, if the results may be better, instead of picking out the samples with score closest to median, maybe pick out the samples from the "edges" (lowest / highest scores).
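Both selection strategies can be sketched as below, using the eight per-sample scores from the CRF 22.5 wrapper run quoted later in this thread. The function names are hypothetical, and this is not how ab-av1 itself picks samples.

```python
from statistics import median


def closest_to_median(scores):
    """1-based numbers of the half of the samples closest to the median (rounded down)."""
    mid = median(scores)
    keep = len(scores) // 2
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - mid))
    return sorted(i + 1 for i in ranked[:keep])


def edge_samples(scores, keep: int):
    """1-based numbers of the keep // 2 lowest and the remaining highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    low = keep // 2
    high = keep - low
    picked = ranked[:low] + ranked[len(ranked) - high:]
    return sorted(i + 1 for i in picked)


if __name__ == "__main__":
    scores = [94.92, 97.67, 95.93, 94.80, 97.44, 96.93, 96.02, 96.53]
    print(closest_to_median(scores))
    print(edge_samples(scores, keep=4))
```

For those scores the median-based pick is samples 3, 6, 7 and 8, while the edge-based pick is samples 1, 2, 4 and 5 (the two lowest and the two highest).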

stderr-to-devnull commented 2 days ago

@alexheretic I intend to publish the script on github at one point, do you have any objections to this? See below.

Alright, I wrote a wrapper that automates a lot of things but most importantly, uses an adaptive crf-search logic based on VMAF score deviation from desired target score, instead of the standard binary search which I kept saying is wasting a lot of time on useless CRF value runs.

Main takeaways of `ab-av1-wrapper` for the below preliminary test

- 2.37x faster than ab-av1 (7m41s vs 18m17s)
- user prompt to paste video files to process
- tunable VMAF score acceptable undershoot and overshoot
- automatically chooses a CRF range based on target vertical resolution
- automatically adjusts default starting CRF value based on source FPS
- automatically resizes based on target vertical resolution
- CRF search logic based on initial VMAF score deviation from target
- tunable CRF value adjustment based on percentage of deviation value (might make this adaptable too in the future based on observations)

More testing is required with a bunch of different sources and this is what I am focused on now

Other things are happening too, but the main takeaways are visible in the example script run below, especially the logic behind the speed increase.

Preliminary results

ab-av1

.\ab-av1.exe crf-search --cache false --min-vmaf=95.5 --min-crf=18 --max-crf=24 --vmaf n_subsample=3 --samples=8 --sample-duration=25s --pix-format yuv444p10le -e libx265 --preset=slow --enc x265-params="profile=main10:no-sao=1:aq-mode=4:ctu=32" -i "D:\video\test.mp4"
- crf 21 VMAF 97.67 (70%)
- crf 22.5 VMAF 96.28 (58%)
- crf 24 VMAF 94.44 (48%)
- crf 23.1 VMAF 95.61 (54%)
- crf 23.2 VMAF 95.49 (53%)
  00:18:17 ########################################################################################################################################################################### (sampling crf 23.2, eta 0s)
Encode with: ab-av1 encode -e libx265 -i "D:\video\test.mp4" --crf 23.1 --preset slow --pix-format yuv444p10le --enc x265-params=profile=main10:no-sao=1:aq-mode=4:ctu=32

crf 23.1 VMAF 95.61 predicted video stream size 321.38 MiB (54%) taking 21 minutes

ab-av1-wrapper

wsl time ./ab-av1-wrapper.sh

Provide full paths to source video files, one by line, terminate with ENTER

D:\video\test.mp4

Using following settings
------------------------
Target VMAF               : 95.5
Target VMAF deviation     : 0.4 overshoot, 0.2 undershoot
Target vertical resolution: 720p
CRF                       : using 18-24 search range (based on target vertical resolution) || using default starting CRF value of 21
x265                      : using preset "slow" || using parameters "profile=main10:no-sao=1:aq-mode=4:ctu=32"
ab-av1                    : using 8 initial samples, with 25s length each

Processing D:\video\test.mp4

    Video CODEC: h264 || Vertical resolution: 720 || FPS: 29.000 || Video bitrate: 3.00 Mbps || Audio CODEC: aac || Audio bitrate: 96 Kbps

    Source has low fps, increasing starting CRF value to 22.5 and setting VMAF n_subsamples to 3

-----------------------------|ab-av1 output|-----------------------------

- Sample 1 (56%) vmaf 94.92
- Sample 2 (55%) vmaf 97.67
- Sample 3 (58%) vmaf 95.93
- Sample 4 (58%) vmaf 94.80
- Sample 5 (61%) vmaf 97.44
- Sample 6 (58%) vmaf 96.93
- Sample 7 (60%) vmaf 96.02
- Sample 8 (58%) vmaf 96.53
  00:03:46 Sample 8/8 ##################################################################################################################################################################### (sampling,     eta 0s)
Encode with: ab-av1 encode -e libx265 -i "D:\video\test.mp4" --crf 22.5 --preset slow --pix-format yuv444p10le --enc x265-params=profile=main10:no-sao=1:aq-mode=4:ctu=32

VMAF 96.28 predicted video stream size 347.10 MiB (58%) taking 22 minutes

-------------------------------------------------------------------------

    [!!] CRF 22.5 FAILED || Average VMAF score: 96.25 || VMAF deviation overshoot: 0.75 || Target VMAF deviation overshoot of 0.4
    Adjusted CRF to new value: 23.02
    Encoding 4 samples: 2 with lowest VMAF scores, 2 with highest VMAF scores

    Sample 1 encoded in 0m 20s
    Sample 1 || new VMAF score: 94.22 || new VMAF deviation from target: 1.28
    Sample 2 encoded in 0m 21s
    Sample 2 || new VMAF score: 94.27 || new VMAF deviation from target: 1.23
    Sample 3 encoded in 0m 22s
    Sample 3 || new VMAF score: 96.84 || new VMAF deviation from target: 1.34
    Sample 4 encoded in 0m 20s
    Sample 4 || new VMAF score: 97.28 || new VMAF deviation from target: 1.78

    [OK] CRF 23.02 PASSED || Average VMAF score: 95.65 || VMAF deviation : 0.15 || Target VMAF deviation  of 0.4

    Continuing encoding the remaining samples

    Sample 5 encoded in 0m 21s
    Sample 5 || new VMAF score: 95.34 || new VMAF deviation from target: 0.16
    Sample 6 encoded in 0m 22s
    Sample 6 || new VMAF score: 95.50 || new VMAF deviation from target: 0
    Sample 7 encoded in 0m 23s
    Sample 7 || new VMAF score: 95.87 || new VMAF deviation from target: 0.37
    Sample 8 encoded in 0m 19s
    Sample 8 || new VMAF score: 96.40 || new VMAF deviation from target: 0.90

    [OK] CRF 23.02 PASSED || Average VMAF score: 95.72 || VMAF deviation : 0.22 || Target VMAF deviation  of 0.4

real    7m41.109s
user    0m0.270s
sys     0m0.542s
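The pass/fail decisions in the log above are consistent with a simple tolerance window around the target. The sketch below is an assumption about that check, not the wrapper's actual code (which is not shown in this thread); `check_vmaf` is a hypothetical name and the defaults are the settings printed in the run.

```python
def check_vmaf(score: float, target: float = 95.5,
               overshoot: float = 0.4, undershoot: float = 0.2) -> str:
    """Classify an average VMAF score against the allowed deviation window."""
    deviation = score - target
    if deviation > overshoot:
        return "overshoot"   # quality higher than needed: the CRF can go up
    if deviation < -undershoot:
        return "undershoot"  # quality too low: the CRF must go down
    return "ok"


if __name__ == "__main__":
    print(check_vmaf(96.25))  # crf 22.5, all 8 samples
    print(check_vmaf(95.65))  # crf 23.02, first 4 samples
    print(check_vmaf(95.72))  # crf 23.02, all 8 samples
```

With the logged averages this classifies 96.25 as an overshoot (0.75 above target, more than the allowed 0.4) and both 95.65 and 95.72 as within the window, matching the FAILED and PASSED lines in the output.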

alexheretic commented 19 hours ago

> I intend to publish the script on github at one point, do you have any objections to this?

Absolutely not, please feel free. The project itself is MIT licensed too so you can do whatever you like with the code.

Overall I don't really follow your search logic, or rather, to me the logic doesn't seem to apply generally to all videos. But if it works well for you then great.

alexheretic / ab-av1

Tune CRF search logic #227
