Lantern-chat / server

Lantern Server Backend

Animated User Asset Processing #12

Open novacrazy opened 2 years ago

novacrazy commented 2 years ago

GIFs for avatars and banners will be quite popular, and need to be cropped, scaled and optimized like still images.

Currently, the "standard" image library has rather poor GIF support, drastically increasing file size in some cases, and judging by reported issues it is also quite slow.

https://github.com/ImageOptim/gifski has a better encoder, but is more focused on extracting the highest quality from video rather than optimizing existing GIFs. Likely not a good fit.

ImageMagick may be the only solution, by spawning a subprocess and passing the image data through stdin. https://github.com/zshipko/image2-rs is notable for this, but apparently lacks GIF support. Custom routines would likely be needed.

novacrazy commented 2 years ago

With the new user asset multi-versioning system, encoding GIFs to other formats is an option.

Sadly, AVIF is not ideal for animations.

WebM/VP9+Alpha works well via ffmpeg, but is slow to encode. ffmpeg can also encode APNG, but that's also slow.

The png crate appears to support APNG encoding, so that should be looked into.

A dedicated asset processing queue may be needed for WebM/APNG encoding, as any blocking solution may result in HTTP request timeouts when trying to apply profile data. Without a processing queue, anything more than num_cpus concurrent encoding requests could overwhelm the server.
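The queue itself can be as simple as a bounded worker pool. A minimal Rust sketch (the name `run_bounded` and the stand-in jobs are hypothetical; real jobs would spawn ffmpeg):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job<R> = Box<dyn FnOnce() -> R + Send>;

/// Run `jobs` with at most `workers` threads executing concurrently.
/// Results are returned in completion order.
fn run_bounded<R: Send + 'static>(jobs: Vec<Job<R>>, workers: usize) -> Vec<R> {
    let (job_tx, job_rx) = mpsc::channel::<Job<R>>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<R>();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // close the queue so workers exit once it drains

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&job_rx);
            let tx = res_tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only long enough to pull one job.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(job) => {
                        let _ = tx.send(job());
                    }
                    Err(_) => break, // queue closed and empty
                }
            })
        })
        .collect();
    drop(res_tx);

    let results: Vec<R> = res_rx.into_iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results
}

fn main() {
    // Stand-in "encode" jobs; the real queue would invoke ffmpeg here.
    let jobs: Vec<Job<u64>> = (1..=8u64)
        .map(|n| Box::new(move || n * n) as Job<u64>)
        .collect();
    let mut results = run_bounded(jobs, 2); // e.g. cap at num_cpus
    results.sort();
    println!("{:?}", results);
}
```

Capping the worker count at num_cpus keeps the encoders from oversubscribing the machine while HTTP handlers merely enqueue and return.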

novacrazy commented 1 year ago
ffmpeg -t 15s -i - -fpsmax 24 -filter_complex \
    "crop='min(iw,ih):min(iw,ih)',scale=256:256:flags=bicubic:force_original_aspect_ratio=increase,crop=256:256,atadenoise,deflicker=size=3,mpdecimate,format=rgba,split[a][b];\
    [a]palettegen=reserve_transparent=on:transparency_color=ffffff:stats_mode=diff[p];\
    [b][p]paletteuse=dither=bayer:bayer_scale=4:diff_mode=rectangle:alpha_threshold=128" \
    -f gif pipe:1 < input_file | gifsicle -O3 --lossy=2 > out.gif

and

ffmpeg -t 15s -i - -fpsmax 24 -filter_complex "crop='min(iw,ih):min(iw,ih)',scale=256:256:flags=bicubic:force_original_aspect_ratio=increase,crop=256:256,atadenoise,deflicker=size=3,mpdecimate" \
    -c:v libvpx-vp9 -pix_fmt yuva420p -cpu-used 1 -row-mt 1 -quality realtime -crf 28 -b:v 256k -auto-alt-ref 0 -f webm pipe:1 < input_file > out.webm

are pretty close to optimal. Lantern must use the standard I/O streams for communication, so that's how they're designed here.
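The stdin/stdout plumbing on the Rust side is just piped child I/O. A sketch of the pattern (`pipe_through` is a hypothetical helper; `cat` stands in for ffmpeg so the example runs anywhere, and the real call would pass the argument lists above with the raw upload bytes):

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Feed `input` to a child process over stdin and collect its stdout.
fn pipe_through(program: &str, args: &[&str], input: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write from another thread so a full pipe buffer can't deadlock us.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let input = input.to_vec();
    let writer = thread::spawn(move || {
        let _ = stdin.write_all(&input);
        // `stdin` drops here, closing the pipe and signalling EOF.
    });

    let mut output = Vec::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_end(&mut output)?;
    writer.join().expect("writer thread panicked");
    child.wait()?;
    Ok(output)
}

fn main() {
    let out = pipe_through("cat", &[], b"hello").unwrap();
    assert_eq!(out, b"hello".to_vec());
}
```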

VP9 encoding is very slow, but quite efficient, and supports transparency. GIFs suck all-around, but --lossy=2 or perhaps 2-5 (maybe based on input size?) helps to trim out the low-hanging fruit and shave off a few hundred KB.

The denoising and deflickering are to remove artifacts from previous dithering as well as a small attempt to avoid seizure-warning GIFs. My initial testing showed rather nice file size reduction once the jittering from previous dithering was removed.

These will have to be tuned for banners as well.

novacrazy commented 1 year ago
ffmpeg -t 15s -i - -fpsmax 24 -filter_complex "crop=iw:ow*9/16:0:0,scale=1280:720:flags=bicubic:force_original_aspect_ratio=increase,crop=1280:720,atadenoise,deflicker=size=3,mpdecimate" \
    -c:v libvpx-vp9 -pix_fmt yuva420p -cpu-used 2 -row-mt 1 -quality good -crf 22 -b:v 384k -auto-alt-ref 0 -f webm pipe:1 < input_file > out.webm

ffmpeg -t 15s -i - -fpsmax 24 -filter_complex \
    "crop=iw:ow*9/16:0:0,scale=640:360:flags=bicubic:force_original_aspect_ratio=increase,crop=640:360,atadenoise,deflicker=size=3,mpdecimate,format=rgba,split[a][b];\
    [a]palettegen=reserve_transparent=on:transparency_color=ffffff:stats_mode=diff[p];\
    [b][p]paletteuse=dither=bayer:bayer_scale=5:diff_mode=rectangle:alpha_threshold=128" \
    -f gif pipe:1 < input_file | gifsicle -O3 --lossy=15 > out.gif

these can be used on banners.

Since GIFs will always be the fallback, we can avoid minutes of processing by generating the banner at half or perhaps even a quarter of the full resolution. Perhaps some kind of heuristic could be used based on the incoming file type and size.
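Such a heuristic could key off the source size alone. A sketch (the 8 MiB threshold and divisors are illustrative guesses, not tuned values):

```rust
/// Pick the GIF fallback banner resolution from the source file size.
/// The full 1280x720 comes from the banner command above; larger sources
/// cost more to re-encode, so they get a smaller fallback.
fn banner_gif_resolution(source_bytes: u64) -> (u32, u32) {
    const FULL: (u32, u32) = (1280, 720);
    let divisor: u32 = if source_bytes > 8 * 1024 * 1024 { 4 } else { 2 };
    (FULL.0 / divisor, FULL.1 / divisor)
}

fn main() {
    assert_eq!(banner_gif_resolution(1024 * 1024), (640, 360));
    assert_eq!(banner_gif_resolution(20 * 1024 * 1024), (320, 180));
}
```

The chosen resolution would then be substituted into the scale/crop filters of the GIF command.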

Might also be better to increase the crf of avatars. It's constrained by the bitrate anyway.

Both of these banner commands take 1-2.5x media duration to encode on my PC. Will be even slower on the server. This will likely mean animated banners will be a premium offering out of sheer necessity.

novacrazy commented 1 year ago

An additional step to detect transparency in the source media might be a good idea, probably through ffprobe.

novacrazy commented 1 year ago

As it turns out, ffmpeg requires lseek on the input for some file formats, most notably GIFs. In the above snippets, Bash was not actually using stdin to feed the input file, but rather was exposing a file descriptor. Near-identical to just opening the file normally, and not what I had intended.

So my plan of just using stdin/stdout without any intermediate file is out the window.

One solution would be to make use of tmpfs, somewhat like the inverse of mmapping: store the file in tmpfs and ffmpeg can seek around to its heart's content. tmpfs does not preallocate, and Docker has native support for it for easy deployment.
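Staging the upload is a short write-then-hand-off. A sketch (the helper name and the `/dev/shm` path are assumptions; `/dev/shm` is a common tmpfs mount on Linux, and a Docker deployment would point at the container's declared tmpfs target instead):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Stage upload bytes in a tmpfs-backed file so ffmpeg can seek the input.
fn stage_in_tmpfs(name: &str, data: &[u8]) -> std::io::Result<PathBuf> {
    let dir = if PathBuf::from("/dev/shm").is_dir() {
        PathBuf::from("/dev/shm")
    } else {
        std::env::temp_dir() // fallback; may not actually be RAM-backed
    };
    let path = dir.join(name);
    File::create(&path)?.write_all(data)?;
    Ok(path)
}

fn main() -> std::io::Result<()> {
    let path = stage_in_tmpfs("probe_me.gif", b"GIF89a...")?;

    // Unlike a pipe, the staged file supports lseek, which is what ffmpeg
    // needs for GIF input; demonstrate by seeking past the 6 magic bytes.
    let mut f = File::open(&path)?;
    f.seek(SeekFrom::Start(6))?;
    let mut rest = String::new();
    f.read_to_string(&mut rest)?;
    assert_eq!(rest, "...");

    std::fs::remove_file(path)
}
```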

With tmpfs, I can also use ffprobe to make more intelligent decisions about how to encode the asset. For example, very small GIFs can suffer from re-encoding, so we can either change the dither pattern to sierra2_4a or even reuse the original file if it's the right aspect ratio. Perhaps if the only operation is to crop, then we can (probably) reuse the original palette for lossless cropping. We would have to forego deflickering on small GIFs, though, which may be acceptable. It's mostly banners that are a seizure risk, given their size.

novacrazy commented 1 year ago

Can also generate low-quality banner GIFs using something like:

ffmpeg -t 15s -i - -fpsmax 12 -filter_complex \
    "crop=iw:ow*9/16:0:0,scale=640:360:flags=bicubic:force_original_aspect_ratio=increase,crop=640:360,\
        atadenoise,deflicker=size=8,mpdecimate,split[a][b];\
    [a]palettegen=max_colors=96:reserve_transparent=off:stats_mode=diff[p];\
    [b][p]paletteuse=dither=none:diff_mode=rectangle" \
    -f gif pipe:1 < input_file | gifsicle -O3 --lossy=300 > out.gif

Fewer colors, more deflicker, lower fps, no dithering, and very lossy post-processing.

novacrazy commented 1 year ago

After more thought and experimentation, my current plans are to do the following:

If the source file's mime type is detected to be a GIF:

  1. Probe the GIF with the new gif_probe tool, instead of ffprobe. gif_probe is much more lightweight and will detect transparent pixels.
  2. Using ffmpeg, denoise/filter the GIF and encode it to VP9 WebM (2-pass), h264 MP4, and WebP
     a. Only target WebP if the GIF had transparent pixels, detected by gif_probe
     b. Use yuva420p on WebM only if the GIF had transparent pixels
  3. If the original GIF can be cropped to fit the desired dimensions and aspect ratio without scaling, so as to look identical to the video formats, use gifsicle to perform the crop and optimize the GIF. This should be effectively lossless.
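The lossless-crop step reduces to computing a centered crop box in gifsicle's `--crop X,Y+WIDTHxHEIGHT` syntax. A sketch (the helper name is hypothetical):

```rust
/// Compute a centered crop for gifsicle's `--crop X,Y+WIDTHxHEIGHT` option,
/// so the GIF frames the same region as the video renditions. Returns None
/// when the source is smaller than the target, since that would require
/// scaling and this path is crop-only.
fn gifsicle_crop_arg(src_w: u32, src_h: u32, dst_w: u32, dst_h: u32) -> Option<String> {
    if src_w < dst_w || src_h < dst_h {
        return None;
    }
    let x = (src_w - dst_w) / 2;
    let y = (src_h - dst_h) / 2;
    Some(format!("{},{}+{}x{}", x, y, dst_w, dst_h))
}

fn main() {
    // A 300x300 source center-cropped to a 256x256 avatar.
    assert_eq!(
        gifsicle_crop_arg(300, 300, 256, 256),
        Some("22,22+256x256".to_string())
    );
    // Too small to crop without scaling.
    assert_eq!(gifsicle_crop_arg(200, 300, 256, 256), None);
}
```

The resulting value would be passed along the lines of `gifsicle --crop 22,22+256x256 -O3 ...`.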

If the source file's mime type was incorrect, but ffprobe detects it as a GIF, the system will treat it as an opaque GIF. Less work for a bad file. If the source file is not a GIF, but still has an alpha channel, ffprobe will detect that and we can choose the appropriate formats. Only GIFs require the heavy denoising filters, though.

Knowing if the GIF is actually transparent on any frame is important for the selection algorithm, as formats with alpha channel are given priority.

VP9 WebM support may be made optional, though, as it takes significantly longer to encode than all the other formats combined. It's even tempting to not use WebM if the animation is opaque.

Furthermore, it will likely be necessary to create a "job" system within the API to track the progress of encoding jobs. Waiting on a single HTTP POST request may take too long, so instead the frontend can poll the job system for a specific job ID (provided by the API) and display a progress bar in the GUI. We can pass the -progress pipe:1 parameter to ffmpeg to print out a series of key-value entries that can be parsed and fed into the job system for more exact values.
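Those -progress entries arrive as newline-delimited key=value lines, so feeding the job system is mostly a parsing loop. A minimal sketch (the function name is an assumption):

```rust
use std::collections::HashMap;

/// Parse one block of `ffmpeg -progress pipe:1` output into key/value pairs.
/// ffmpeg emits lines like `frame=42` and `out_time_ms=1400000`, terminating
/// each block with `progress=continue` (or `progress=end` when finished).
fn parse_progress_block(block: &str) -> HashMap<&str, &str> {
    block
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim(), v.trim()))
        .collect()
}

fn main() {
    let block = "frame=42\nfps=24.0\nout_time_ms=1400000\nprogress=continue\n";
    let kv = parse_progress_block(block);
    assert_eq!(kv["frame"], "42");
    assert_eq!(kv["progress"], "continue");
}
```

Each parsed block would be pushed into the job record so the frontend's poll can report an exact position rather than a spinner.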