cdgriffith / FastFlix

FastFlix is a free GUI for HEVC and AV1 encoding, GIF/WebP/AVIF creation, and more!
MIT License

Performance and Progress Reporting Improvements #516

Closed w-barath closed 1 month ago

w-barath commented 10 months ago

I've authored a script for doing ffmpeg batch encoding. FastFlix does most of what my script does, but its performance and progress reporting don't quite match up due to two things:

1) Progress reporting: during pass 1 of a two-pass encode, FastFlix doesn't copy the audio streams. FFmpeg uses the audio stream to determine its position during the encode, so it reports frame zero for the entire first pass. Adding -c:a copy to the first pass fixes this.

2) CPU usage: I'm unsure why, but FastFlix uses 100% of one core throughout the encoding process, competing with FFmpeg, yet it is doing no actual work during this time. My guess is that FastFlix polls FFmpeg's output in a tight loop instead of using blocking I/O, which would sleep when there is no new input. If so, adding a usleep() call to that loop would fix it.
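As a rough illustration of the blocking-I/O idea in point 2, here is a minimal Python sketch. It is not FastFlix's actual code: `stream_lines` is a hypothetical helper, and the child process is a stand-in that prints two lines so the example runs without ffmpeg installed.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Yield the child's output line by line.

    Each read blocks inside the OS until data arrives, so the parent
    sleeps instead of spinning while the child is busy.
    """
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


# Stand-in child process (assumption: a real caller would pass an ffmpeg command).
demo_cmd = [sys.executable, "-c", "print('frame=1'); print('frame=2')"]
lines = list(stream_lines(demo_cmd))
```

One caveat: ffmpeg redraws its interactive status line with carriage returns rather than newlines, so a real reader would probably need to split on `\r` as well.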

MarcoRavich commented 10 months ago

I don't know if this helps, but @porcino's Av1ador parallel encoder may offer inspiration for a better implementation:

AV1/HEVC/VP9/H264 parallel encoder GUI for FFmpeg with preview/comparison while transcoding.

GIT: https://github.com/porcino/Av1ador

w-barath commented 10 months ago

Further, when you suspend a FastFlix encode, it keeps using 100% of one core while doing nothing.

cdgriffith commented 10 months ago

Hi @w-barath thanks for the heads up about high CPU usage! Will get that fixed for next release.

For the progress report, what encoder and version of FFmpeg are you using? I am able to see progress on the first pass with a few I tried (SVT-AV1, x265).

cdgriffith commented 10 months ago

CPU performance improved in 5.5.7 https://github.com/cdgriffith/FastFlix/releases/tag/5.5.7

w-barath commented 10 months ago

Maybe it's a platform-dependent thing then. Every version of ffmpeg built for Linux, from 3.x to git master, won't show progress on the first pass without -acodec copy or -c:a copy. That applies to libaom-AV1, SVT-AV1, x265 and x264.

There's zero cost to copying the audio track on the first pass as it's not going to disk.
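A small Python sketch of what such a first-pass command could look like. `first_pass_cmd` is a hypothetical helper, and the file name, bitrate and libx264 encoder are assumptions chosen for illustration. The flags themselves (`-pass`, `-c:a copy`, `-an`, `-f null`) are standard ffmpeg options.

```python
import os


def first_pass_cmd(source, bitrate="2M", copy_audio=True):
    """Build an ffmpeg first-pass command whose output is discarded.

    With copy_audio=True the audio stream is copied into the null muxer
    (the suggestion made in this thread) instead of being dropped with -an.
    """
    audio = ["-c:a", "copy"] if copy_audio else ["-an"]
    return [
        "ffmpeg", "-y", "-i", source,
        "-c:v", "libx264", "-b:v", bitrate,
        "-pass", "1",
        *audio,
        "-f", "null", os.devnull,  # nothing is written to disk
    ]


cmd = first_pass_cmd("input.mkv")
```

The list form can be handed straight to `subprocess.run(cmd)` without shell quoting.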

w-barath commented 10 months ago

The other performance option I should have mentioned: Handbrake has a "turbo first pass" option, and FFmpeg's own how-to docs on their Trac site recommend doing a real-time first pass for codecs that support it. While the quality difference is non-zero, it cuts the encoding time basically in half for high-effort presets, and they claim the quality difference is imperceptible.

cdgriffith commented 10 months ago

Good to know, could you point me to that documentation for how to? I don't see that covered in either x264 or x265 two pass sections:

https://trac.ffmpeg.org/wiki/Encode/H.265#Two-PassEncoding https://trac.ffmpeg.org/wiki/Encode/H.264#twopass

w-barath commented 10 months ago

I haven't found it either, but I'll keep looking.

In the meantime, did you notice this warning at https://trac.ffmpeg.org/wiki/Encode/H.264#twopass ?

> Warning: When using option -an, you may eventually get a segfault or a broken file. If so, remove option -an and replace by -vsync cfr to the first pass.

MarcoRavich commented 10 months ago

> While the quality difference is non-zero, it cuts the encoding time basically in half for high-effort presets, and they claim the quality difference is imperceptible.

As suggested in the past, it would be cool to perform an HW-accelerated 1st pass, IMHO.

w-barath commented 10 months ago

> Good to know, could you point me to that documentation for how to?

I can't for the life of me find it. I get too many false positives when I use search terms around "pass" "preset" "quicker" "faster" etc...

The best I came across was this: https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=154825

> Note that I tested with -cpu-used 8 in the first pass and -cpu-used 4 in the second. That’s because the quality used in the first pass doesn’t impact overall quality.

They are using --cpu-used 8 / --cpu-used 4 for pass 1 and 2 respectively in a professional streaming setting, with input from Google engineers, so maybe they know what they're talking about 😉, but I get it would be nice if I could come up with actual research showing bd-rate for different first-pass presets.
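To make the per-pass speed setting concrete, here is a hedged Python sketch that builds both ffmpeg commands for libaom-av1, with the 8 / 4 values taken from the article quoted above. `two_pass_cmds`, the file names and the bitrate are illustrative assumptions, not anything FastFlix ships.

```python
import os


def two_pass_cmds(source, output, bitrate="2M", first_speed=8, second_speed=4):
    """Return (first_pass, second_pass) ffmpeg argument lists for libaom-av1.

    The first pass uses a faster -cpu-used value than the second and
    discards its video output. Only the stats file is kept.
    """
    common = ["ffmpeg", "-y", "-i", source, "-c:v", "libaom-av1", "-b:v", bitrate]
    first = [
        *common, "-cpu-used", str(first_speed), "-pass", "1",
        "-c:a", "copy",  # audio copied on pass 1, as suggested in this thread
        "-f", "null", os.devnull,
    ]
    second = [
        *common, "-cpu-used", str(second_speed), "-pass", "2",
        "-c:a", "copy", output,
    ]
    return first, second


first, second = two_pass_cmds("input.mkv", "output.mkv")
```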

Ah here's some references from Meta's video streaming engineers as well: https://engineering.fb.com/2023/02/21/video-engineering/av1-codec-facebook-instagram-reels/

> In previous studies we found that we could use the high-speed preset for first-pass encoding and to produce the convex hull, and then take a second pass to encode the selected (resolution, CRF) points with the high-quality preset. Even though this approach requires additional encoding, it’s faster because the first pass can be done much more quickly. (Coding efficiency drops only slightly.)

Here in context of their studies the "drop" is well under 1%, and by "faster" they mean the same perceptual quality at the same bitrate in less time, using 2-pass, than with the best competing (ABR + CRF + preset) combination.

Hope that helps.

cdgriffith commented 10 months ago

Looks like with x265 it's a flag: you need to set no-slow-firstpass=1 in the x265 params https://x265.readthedocs.io/en/master/cli.html#cmdoption-slow-firstpass

Also good to know that the svt-av1 cpu settings can be modified.

What I don't want to do is blindly pick a faster preset, as that could modify an array of settings / sub-parameters and create straight-up bad conversions if we don't know which ones are safe to switch. Notice that the list of things modified for the x265 first pass is a lot shorter than all the options that a given preset covers https://x265.readthedocs.io/en/master/presets.html so changing the wrong ones could be bad.
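A minimal Python sketch of how the x265 parameter string might be assembled so that only the first pass gets the turbo flag. `x265_params` is a hypothetical helper. The `pass` and `no-slow-firstpass` keys are the x265 options discussed above, joined with `:` as ffmpeg's `-x265-params` expects.

```python
def x265_params(pass_number, turbo_first_pass=True):
    """Build the value for ffmpeg's -x265-params option.

    no-slow-firstpass=1 is added only on pass 1, so the second pass
    keeps every setting of the chosen preset untouched.
    """
    params = {"pass": str(pass_number)}
    if pass_number == 1 and turbo_first_pass:
        params["no-slow-firstpass"] = "1"
    return ":".join(f"{key}={value}" for key, value in params.items())


first_pass_params = x265_params(1)
second_pass_params = x265_params(2)
```

This leaves the preset itself alone and lets x265 decide which sub-parameters are safe to relax on the first pass.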

w-barath commented 10 months ago

I've been using the second fastest with excellent results when doing veryslow 2-pass encodes with x264/x265, and I've been using preset 9 with libaom-av1, 12 with libsvtav1.

You may find it interesting to find out that vp9 uses fast settings internally for pass 1.

The FFmpeg Trac states that SVT-AV1 preset 13 is "for debugging and running fast convex-hull encoding", which is a roundabout way of saying the only consumer use is for first-pass encoding (https://trac.ffmpeg.org/wiki/Encode/AV1#Presetsandtunes).

If you'd like to understand the subject material better, here's Netflix's research: https://netflixtechblog.com/dynamic-optimizer-a-perceptual-video-encoding-optimization-framework-e19f1e3a277f?gi=f00599996954

And then there's Facebook / Meta's more recent paper on their findings: https://research.facebook.com/publications/fast-encoding-parameter-selection-for-convex-hull-video-encoding/

I would link the PDF directly but it appears to have a time-dependent auth hash.

After reading both I'm pretty confident that my using the second-fastest preset has been overkill. I will adjust my script to always use the fastest preset for pass 1 for each codec in future.


teddybee commented 3 months ago

I have a strange issue too. When I start an encode (SVT-AV1), my processor runs at around 100%, but as soon as I switch to another window it drops to maybe 4 threads, and the encoding slows down to about 25% of its speed. Switching back to the window doesn't speed the encoding up again. Are there any settings that I can use to maintain the speed? Changing the Windows thread/app priority didn't help.

w-barath commented 2 months ago

> Are there any settings that I can use to maintain the speed? Windows thread/app priority didn't help.

This issue was about FastFlix spin-polling the ffmpeg log file for status reporting, which wasted one CPU thread.

Your problem isn't related to that one (which has been fixed), so it should be submitted as a new issue. Otherwise it probably won't get noticed.
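For readers unfamiliar with the term, the difference between spin-polling and polling with a sleep can be sketched in Python as below. This is an assumption-laden illustration, not the fix that went into FastFlix: `read_new_text`, `follow`, the log file name and the `is_done` callback are all hypothetical.

```python
import os
import tempfile
import time


def read_new_text(path, offset):
    """Return any text appended to the file since `offset`, plus the new offset."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        fh.seek(offset)
        text = fh.read()
        return text, fh.tell()


def follow(path, is_done, interval=0.5):
    """Yield new log text, sleeping between polls instead of spinning."""
    offset = 0
    while True:
        text, offset = read_new_text(path, offset)
        if text:
            yield text
        elif is_done():
            return
        else:
            time.sleep(interval)  # without this the loop burns a full core


# Demo with a throwaway log file standing in for the encoder's log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "encode.log")
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write("frame=10\n")
    chunks = list(follow(log_path, is_done=lambda: True, interval=0.01))
```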

MarcoRavich commented 2 months ago

@w-barath As the issue opener, you can close it too.