With the new user asset multi-versioning system, encoding GIFs to other formats is an option. Sadly, AVIF is not ideal for animations. WebM/VP9+Alpha works well via `ffmpeg`, but is slow to encode. `ffmpeg` can also encode APNG, but that's also slow. The `png` crate appears to support APNG encoding, so that should be looked into.

A dedicated asset processing queue may be needed for WebM/APNG encoding, as any blocking solution may result in HTTP request timeouts when trying to apply profile data. Anything more than `num_cpus` encoding requests may overwhelm the server without a processing queue.
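As a minimal sketch of what that queue could look like (assuming `tokio` with the `process` and `sync` features plus the `num_cpus` crate; the `EncodeQueue` name is illustrative, not an existing Lantern API):

```rust
use std::sync::Arc;
use tokio::{process::Command, sync::Semaphore};

/// Caps concurrent ffmpeg processes at the core count; extra requests
/// wait on the semaphore instead of piling more encoders onto the CPU.
pub struct EncodeQueue {
    permits: Arc<Semaphore>,
}

impl EncodeQueue {
    pub fn new() -> Self {
        EncodeQueue {
            permits: Arc::new(Semaphore::new(num_cpus::get())),
        }
    }

    pub async fn encode(&self, args: &[&str]) -> std::io::Result<std::process::Output> {
        // The permit is held until the child exits, so at most num_cpus
        // encodes run at once; everything else queues up here.
        let _permit = self.permits.acquire().await.expect("semaphore never closes");
        Command::new("ffmpeg").args(args).output().await
    }
}
```

Requests still wait while queued, so this would sit underneath the job system discussed later rather than replace it.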
```bash
ffmpeg -t 15s -i - -fpsmax 24 -filter_complex \
"crop='min(iw,ih):min(iw,ih)',scale=256:256:flags=bicubic:force_original_aspect_ratio=increase,crop=256:256,atadenoise,deflicker=size=3,mpdecimate,format=rgba,split[a][b];\
[a]palettegen=reserve_transparent=on:transparency_color=ffffff:stats_mode=diff[p];\
[b][p]paletteuse=dither=bayer:bayer_scale=4:diff_mode=rectangle:alpha_threshold=128" \
-f gif pipe:1 < input_file | gifsicle -O3 --lossy=2 > out.gif
```
and
```bash
ffmpeg -t 15s -i - -fpsmax 24 -filter_complex "crop='min(iw,ih):min(iw,ih)',scale=256:256:flags=bicubic:force_original_aspect_ratio=increase,crop=256:256,atadenoise,deflicker=size=3,mpdecimate" \
-c:v libvpx-vp9 -pix_fmt yuva420p -cpu-used 1 -row-mt 1 -quality realtime -crf 28 -b:v 256k -auto-alt-ref 0 -f webm pipe:1 < input_file > out.webm
```
are pretty close to optimal. Lantern must use the standard I/O streams for communication, so that's how they're designed here.

VP9 encoding is very slow, but quite efficient, and it supports transparency. GIFs suck all-around, but `--lossy=2`, or perhaps 2-5 (maybe based on input size?), helps to trim out the low-hanging fruit and shave off a few hundred KB.
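Purely to illustrate the size-based idea (the thresholds here are made up):

```rust
/// Illustrative only: scale gifsicle's `--lossy` level with input size,
/// trimming more aggressively from larger uploads.
fn gifsicle_lossy(input_bytes: u64) -> u32 {
    const KIB: u64 = 1024;
    if input_bytes <= 256 * KIB {
        2 // small GIFs: barely touch them
    } else if input_bytes <= 1024 * KIB {
        3
    } else if input_bytes <= 4096 * KIB {
        4
    } else {
        5 // the most aggressive end of the 2-5 range
    }
}
```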
The denoising and deflickering are there to remove artifacts from previous dithering, and are also a small attempt to avoid seizure-warning GIFs. My initial testing showed a rather nice file-size reduction once the jittering from previous dithering was removed.
These will have to be tuned for banners as well.
```bash
ffmpeg -t 15s -i - -fpsmax 24 -filter_complex "crop=iw:ow*9/16:0:0,scale=1280:720:flags=bicubic:force_original_aspect_ratio=increase,crop=1280:720,atadenoise,deflicker=size=3,mpdecimate" \
-c:v libvpx-vp9 -pix_fmt yuva420p -cpu-used 2 -row-mt 1 -quality good -crf 22 -b:v 384k -auto-alt-ref 0 -f webm pipe:1 < input_file > out.webm
```
```bash
ffmpeg -t 15s -i - -fpsmax 24 -filter_complex \
"crop=iw:ow*9/16:0:0,scale=640:360:flags=bicubic:force_original_aspect_ratio=increase,crop=640:360,atadenoise,deflicker=size=3,mpdecimate,format=rgba,split[a][b];\
[a]palettegen=reserve_transparent=on:transparency_color=ffffff:stats_mode=diff[p];\
[b][p]paletteuse=dither=bayer:bayer_scale=5:diff_mode=rectangle:alpha_threshold=128" \
-f gif pipe:1 < input_file | gifsicle -O3 --lossy=15 > out.gif
```
These can be used on banners.
Since GIFs will always be the fallback, we can avoid minutes of processing by generating the banner at half or perhaps even a quarter of the full resolution. Perhaps some kind of heuristic could be used based on the incoming file type and size.
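One hypothetical version of that heuristic (the thresholds and sizes are placeholders):

```rust
/// Hypothetical: drop the GIF fallback to half or quarter of the full
/// 1280x720 banner resolution as the source gets heavier to encode.
fn banner_gif_resolution(input_bytes: u64) -> (u32, u32) {
    const MIB: u64 = 1024 * 1024;
    if input_bytes > 8 * MIB {
        (320, 180) // quarter resolution for heavyweight sources
    } else {
        (640, 360) // half resolution as the default fallback
    }
}
```

The incoming file type could feed into the same decision.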
Might also be better to increase the `crf` of avatars. It's constrained by the bitrate anyway.
Both of these banner commands take 1-2.5x media duration to encode on my PC. Will be even slower on the server. This will likely mean animated banners will be a premium offering out of sheer necessity.
An additional step to detect transparency in the source media might be a good idea, probably through `ffprobe`.
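For example, `ffprobe` can report the pixel format of the first video stream, and a known set of alpha-capable formats implies transparency. A rough sketch; the `pal8` case (palettized GIFs, where transparency is optional) erring toward "has alpha" is my own guess:

```rust
use std::process::Command;

/// Ask ffprobe for the pixel format of the first video stream and treat
/// alpha-carrying formats as (potentially) transparent.
fn probably_has_alpha(path: &str) -> std::io::Result<bool> {
    let out = Command::new("ffprobe")
        .args([
            "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=pix_fmt",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ])
        .output()?;
    let pix_fmt = String::from_utf8_lossy(&out.stdout);
    let pix_fmt = pix_fmt.trim();
    const ALPHA_FORMATS: &[&str] = &[
        "yuva420p", "yuva422p", "yuva444p", "rgba", "bgra", "argb", "abgr", "ya8", "gbrap",
    ];
    Ok(ALPHA_FORMATS.contains(&pix_fmt) || pix_fmt == "pal8")
}
```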
As it turns out, `ffmpeg` requires `lseek` on the input for some file formats, most notably GIFs. In the above snippets, Bash was not actually streaming the file through stdin, but rather exposing the file's descriptor directly. That's near-identical to just opening the file normally, and not what I had intended.
So my plan of just using stdin/stdout without any intermediate file is out the window.
One solution would be to make use of tmpfs, somewhat like mmapping in reverse: store the file in tmpfs, and ffmpeg can seek around it to its heart's content. tmpfs does not preallocate, and Docker has native support for it for easy deployment.
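A rough sketch of that flow, assuming a tmpfs mount at a made-up `/tmp/lantern` path (e.g. `--tmpfs /tmp/lantern` with Docker):

```rust
use std::{fs, path::PathBuf, process::Command};

/// Write the upload into tmpfs so ffmpeg gets a real, seekable file,
/// then collect the encoded result from stdout as before.
fn encode_via_tmpfs(upload: &[u8], id: u64, args: &[&str]) -> std::io::Result<Vec<u8>> {
    let path = PathBuf::from(format!("/tmp/lantern/{id}.input"));
    fs::write(&path, upload)?; // lands in RAM, never touches disk

    let out = Command::new("ffmpeg")
        .arg("-i")
        .arg(&path)
        .args(args) // e.g. the filter chain plus "-f gif pipe:1"
        .output()?;

    fs::remove_file(&path)?; // release the tmpfs pages immediately
    Ok(out.stdout)
}
```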
With tmpfs, I can also use ffprobe to make more intelligent decisions about how to encode the asset. For example, very small GIFs can suffer from re-encoding, so we can either change the dither pattern to `sierra2_4a` or even reuse the original file if it's the right aspect ratio. Perhaps if the only operation is a crop, then we can (probably) reuse the original palette for lossless cropping. We would have to forego deflickering on small GIFs, though, which may be acceptable. It's mostly banners that are a seizure risk, given their size.
Can also generate low-quality banner GIFs using something like:
```bash
ffmpeg -t 15s -i - -fpsmax 12 -filter_complex \
"crop=iw:ow*9/16:0:0,scale=640:360:flags=bicubic:force_original_aspect_ratio=increase,crop=640:360,\
atadenoise,deflicker=size=8,mpdecimate,split[a][b];\
[a]palettegen=max_colors=96:reserve_transparent=off:stats_mode=diff[p];\
[b][p]paletteuse=dither=none:diff_mode=rectangle" \
-f gif pipe:1 < input_file | gifsicle -O3 --lossy=300 > out.gif
```
Fewer colors, more deflickering, lower fps, no dithering, and very lossy post-processing.
After more thought and experimentation, my current plans are to do the following:
1. If the source file's mime type is detected to be a GIF:
    a. Probe it with the `gif_probe` tool, instead of `ffprobe`. `gif_probe` is much more lightweight and will detect transparent pixels.
    b. Use `yuva420p` on WebM only if the GIF had transparent pixels.
    c. Use `gifsicle` to perform the crop and optimize the GIF. This should be effectively lossless.
2. If the source file's mime type was incorrect, but `ffprobe` detects it as a GIF, the system will treat it as an opaque GIF. Less work for a bad file.
3. If the source file is not a GIF, but still has an alpha channel, `ffprobe` will detect that and we can choose the appropriate formats. Only GIFs require the heavy denoising filters, though.
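A hedged sketch of that dispatch; none of these types exist yet and the names are illustrative:

```rust
/// What probing told us about the source file.
enum Probe {
    /// Trusted GIF mime type, transparency determined by `gif_probe`.
    Gif { transparent: bool },
    /// The mime type disagreed, but ffprobe says it's a GIF anyway.
    MislabeledGif,
    /// Any other animated format, alpha detected via ffprobe.
    Other { alpha: bool },
}

/// Which encodes to schedule for each case, per the plan above.
fn outputs_for(probe: Probe) -> Vec<&'static str> {
    match probe {
        // WebM keeps alpha only when the GIF actually uses it.
        Probe::Gif { transparent: true } => vec!["webm yuva420p", "gif (gifsicle crop)"],
        Probe::Gif { transparent: false } => vec!["webm yuv420p", "gif (gifsicle crop)"],
        // Bad file: treat as an opaque GIF and do less work.
        Probe::MislabeledGif => vec!["gif (opaque)"],
        Probe::Other { alpha: true } => vec!["webm yuva420p"],
        Probe::Other { alpha: false } => vec!["webm yuv420p"],
    }
}
```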
Knowing whether the GIF is actually transparent on any frame is important for the selection algorithm, as formats with an alpha channel are given priority.
VP9 WebM support may be made optional, though, as it takes significantly longer to encode than all the other formats combined. It's even tempting to not use WebM if the animation is opaque.
Furthermore, it will likely be necessary to create a "job" system within the API to track the progress of encoding jobs. It may take too long to wait on a single HTTP POST request, so instead the frontend can poll the job system for a specific job ID (provided by the API) and display a progress bar in the GUI. We can give the `-progress pipe:1` parameter to ffmpeg to print out a series of key-value entries that can be parsed and fed to the job system for more exact values.
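That output is one `key=value` pair per line, with a `progress=continue` (or `progress=end`) line closing each update block, so it's simple to parse. A sketch, where the `report` callback stands in for whatever the job system exposes:

```rust
use std::io::{BufRead, BufReader, Read};

/// Parse ffmpeg's `-progress pipe:1` stream and report the encoded
/// position in seconds after each completed update block.
fn track_progress<R: Read>(stdout: R, mut report: impl FnMut(f64)) -> std::io::Result<()> {
    let mut out_time_us: u64 = 0;
    for line in BufReader::new(stdout).lines() {
        match line?.split_once('=') {
            Some(("out_time_us", v)) => out_time_us = v.trim().parse().unwrap_or(0),
            // `progress=...` marks the end of one block of key-value pairs.
            Some(("progress", _)) => report(out_time_us as f64 / 1_000_000.0),
            _ => {}
        }
    }
    Ok(())
}
```

With the clip duration known up front (from ffprobe or the `-t 15s` cap), dividing the reported position by it gives a percentage for the progress bar.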
GIFs for avatars and banners will be quite popular, and need to be cropped, scaled and optimized like still images.
Currently, the "standard" `image` library has rather poor GIF support, drastically increasing file size in some cases, and is also quite slow, based on reported issues.

https://github.com/ImageOptim/gifski has a better encoder, but is more focused on extracting the highest quality from video than on optimizing existing GIFs. Likely not a good fit.
ImageMagick may be the only solution, by spawning a subprocess and passing the image data through stdin. https://github.com/zshipko/image2-rs is notable for this approach, but apparently lacks GIF support. Custom routines would likely be needed.
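A rough sketch of that subprocess approach, using ImageMagick 6's `convert` (IM 7 spells it `magick`); the argument list is untested, and a production version would write stdin from a separate thread to avoid pipe deadlocks on large files:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Pipe a GIF through ImageMagick for a centered square crop, reading
/// the re-optimized result back over stdout.
fn magick_crop_square(gif: &[u8], size: u32) -> std::io::Result<Vec<u8>> {
    let crop = format!("{size}x{size}+0+0");
    let mut child = Command::new("convert")
        .args([
            "gif:-",               // read the GIF from stdin
            "-coalesce",           // flatten frame disposal before editing
            "-gravity", "center",
            "-crop", crop.as_str(),
            "+repage",             // drop stale canvas offsets
            "-layers", "optimize", // re-shrink frames to their deltas
            "gif:-",               // write the GIF to stdout
        ])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    child.stdin.take().unwrap().write_all(gif)?;
    let out = child.wait_with_output()?;
    Ok(out.stdout)
}
```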