SixLabors / ImageSharp

:camera: A modern, cross-platform, 2D Graphics library for .NET
https://sixlabors.com/products/imagesharp/

Poor performance creating animated gifs on Android #757

Closed dmanning23 closed 4 years ago

dmanning23 commented 5 years ago

Prerequisites

Description

I'm getting really poor performance using ImageSharp to create animated gifs on Android devices. A rundown of the use case here:

I've tried a few things including bumping up the Java heap size to 2G, which was a massive improvement. Going higher doesn't improve performance though. I also updated the Supported Architectures to build for 64-bit devices because the documentation mentioned it. It offered a little bit of improvement. Also switching from the "beta" to "dev" build of ImageSharp was significantly faster.

Depending on the horsepower of the device, the ImageSharp image processing takes between 10 and 30 seconds. My test devices here are a Samsung S7 Edge, which takes between 10 and 15 seconds, and a Nexus 9, which takes 20 to 30 seconds.

All this stuff is open-source if yall want to check my work. The ImageSharp gif creation starts at: https://github.com/dmanning23/MonogameScreenTools/blob/ebc1357f3940dac25f33d0b26096e1ef34b78919/MonogameScreenTools/MonogameScreenTools.SharedProject/GifHelper.cs#L84

The sample app I'm using to benchmark this code is at https://github.com/dmanning23/MonogameScreenToolsExample (If all yall would rather test from an APK I can upload that too.)

I've got no problem making the user wait 30 seconds to create a gif... I feel like people are used to waiting a little bit while uploading a video to Facebook or YouTube, so I'm not too worried about it. Also this is an update to an app that has like 10 total downloads, so yeah zero fs given ;) I'd like to reuse this code in future apps though, with (hopefully!) more players, and obviously they will be more inclined to share if they only have to wait a few seconds for the gif to be created.

I've uploaded a sample of one of these animated gifs if that will help

monogamescreentoolstest_131852110485108910

Thanks again yall! Cheers!

System Configuration

The two test devices I have:

JimBobSquarePants commented 5 years ago

Hi @dmanning23

Thanks for filling in so much data when opening the issue. We appreciate that!

Your performance problems will be due to a lack of SIMD support on Android devices.

https://stackoverflow.com/questions/50433924/xamarin-no-hardware-acceleration-when-deployin-to-device/50434306#50434306

That's a real pain for us at the moment because our code is really starting to perform well on devices where hardware acceleration is supported and this puts us in a bad light.

However, the future is starting to look a little brighter. Mono are adding the System.Runtime.Intrinsics namespace to their framework which as I understand it comes with hardware acceleration for ARM devices.

https://github.com/mono/mono/issues/7711

antonfirsov commented 5 years ago

@dmanning23 if you experienced better performance in earlier versions, it's most likely because we used a different quantizer as default. (@JimBobSquarePants: was it WuQuantizer?)

You can try setting a different quantizer. By tuning GifEncoder and quantizer parameters it might be possible to find a reasonable tradeoff between image quality and encoder speed.

Lack of SIMD support on Xamarin is a big blocker for performance improvements on mobile, but gif is somewhat special, because the current OctreeQuantizer implementation is sub-optimal in general. We should improve this, but this is a huge task, and it is unlikely we can manage it before 1.0.

JimBobSquarePants commented 5 years ago

@antonfirsov I think you're reading it backwards. Dev is faster.

> Also switching from the "beta" to "dev" build of ImageSharp was significantly faster.

Octree won't be the bottleneck there, even on an unsupported platform. I think the speed issue is simply more noticeable since there are multiple frames to process. I would actually always recommend Octree as the default gif quantizer unless you have a dedicated palette you would like to use, though we use the palette generated by the first frame for subsequent frames by default in dev.

The Wu Quantizer supports multiple transparency values which is great for png. It's also much more memory hungry and intensive than Octree. It's also actually a little slower in dev now than in the beta as it uses half the memory but is much better at reducing the color palette.

antonfirsov commented 5 years ago

@JimBobSquarePants there was a statement in #752 about an old alpha being faster than later releases. It might be of course inaccurate, but I reacted to that in my comment.

OctreeQuantizer is good stuff but there are big optimization opportunities for it (based on benchmarks I did earlier + my understanding of the code). I'm not sure how it compares to other quantizers; it was just a hint to try them, but I may be wrong.

JimBobSquarePants commented 5 years ago

@antonfirsov I doubt there's anything you couldn't speed up! 😄

dmanning23 commented 5 years ago

Yeah that was me in #752, the old alpha was quite a bit faster creating animated gifs on Android. I'll look into swapping out quantizers and see if that speeds things up for now.

JimBobSquarePants commented 5 years ago

@dmanning23 That'll be because there was no dithering in the alphas. Dithering slows things down a lot but also dramatically increases quality. If you don't need it, though, turn it off by passing a custom IQuantizer instance.

dmanning23 commented 5 years ago

Ok yeah it is definitely the dithering that kills performance on Android. If I encode the gif with new OctreeQuantizer(false) to turn off dithering, it only takes 2 seconds to encode the gif. The same gif takes over a minute with the default new OctreeQuantizer().

The quality is complete pants, but with that kind of performance I can render at a higher resolution and it actually doesn't look too bad.
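
To illustrate why encoding without dithering is so much cheaper, here is a minimal Python sketch of undithered palette mapping. It is not ImageSharp's code: `nearest_index` and `quantize_no_dither` are hypothetical names, and the two-color palette is an assumption chosen for brevity. The point is that each pixel is mapped on its own, with no state carried from one pixel to the next.

```python
def nearest_index(pixel, palette):
    """Return the index of the palette entry closest to pixel (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )


def quantize_no_dither(frame, palette):
    """Map every pixel to its nearest palette index; no pixel depends on another."""
    return [[nearest_index(px, palette) for px in row] for row in frame]


# Hypothetical two-color palette and a 2x2 RGB frame.
palette = [(0, 0, 0), (255, 255, 255)]
frame = [
    [(10, 10, 10), (120, 120, 120)],
    [(130, 130, 130), (250, 250, 250)],
]
indices = quantize_no_dither(frame, palette)
```

Mid-gray pixels snap straight to black or white here, which is the banding that makes undithered output look rough at low resolution.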

JimBobSquarePants commented 5 years ago

I managed to improve performance of the dithering algorithm previously, I'll have another look to see if there is more low hanging fruit I can trim.

jackmott commented 5 years ago

I noticed animated gifs were a bit slow to generate on Windows as well. Given the frame-by-frame nature of the task, I wonder if having an option to parallelize animated gifs would make sense?

JimBobSquarePants commented 5 years ago

@jackmott Is this with the beta5 build or the nightlies? They should be a lot faster.

Trouble I'm having here is that I have nothing to compare against since System.Drawing doesn't natively handle animated gifs.

Dithering is the bottleneck. I've optimized it a lot but the complexity of the algorithm slows things down.

jackmott commented 5 years ago

@JimBobSquarePants beta5. It would be pretty snappy if it used all 8 logical cores. I'll see if I can try the nightlies.

JimBobSquarePants commented 5 years ago

@jackmott It would have to be per-frame parallelization since I cannot figure out how to parallelize the sequential error diffusion algorithm. Here's the only non-firewalled article I can find on the subject.

https://community.arm.com/graphics/b/blog/posts/when-parallelism-gets-tricky-accelerating-floyd-steinberg-on-the-mali-gpu

If you could figure that out it would no longer be a bottleneck, since without dithering the output performance is pretty quick.

Defo use the dev builds btw. They're solid and much faster.
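
Per-frame parallelization, as discussed above, could be sketched roughly as follows. This is an illustration in Python, not an ImageSharp feature: `process_frame` is a hypothetical stand-in for the real per-frame quantize-and-dither step, and the worker count of 8 just mirrors the "8 logical cores" mentioned earlier. It assumes every frame can be processed independently, for example against a palette that has already been built.

```python
from concurrent.futures import ThreadPoolExecutor


def process_frame(frame):
    """Hypothetical stand-in for per-frame quantize + dither: threshold to 0/255."""
    return [[255 if v >= 128 else 0 for v in row] for row in frame]


def process_frames_parallel(frames, workers=8):
    """Process frames concurrently; each frame is still handled sequentially inside."""
    # Executor.map() yields results in input order, so animation order is preserved.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_frame, frames))


# Five tiny 1x2 grayscale frames as sample input.
frames = [[[i * 40, 200 - i * 40]] for i in range(5)]
result = process_frames_parallel(frames)
```

The sketch only shows the structure. In CPython, threads do not run pure-Python code on several cores at once, so a process pool would be the usual choice for CPU-bound work there. Note also that if later frames reuse the palette generated from the first frame, that palette has to exist before the parallel part starts.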

JimBobSquarePants commented 5 years ago

Think I might have just figured out a way to reduce the amount of dithering we do yet keep good quality.

JimBobSquarePants commented 4 years ago

@dmanning23 Finally managed to have a proper look at this. Roughly a 10x speedup in error diffusion coming soon.

dmanning23 commented 4 years ago

Awesome! Yeah I saw your tweet this morning, looks cool. I'll check out the changes soon.

antonfirsov commented 4 years ago

Are we sure error diffusion is the only bottleneck here? (I will profile the encoding in the next few days.)

JimBobSquarePants commented 4 years ago

It’s the difference between minutes and seconds so big enough for me to focus on.

JimBobSquarePants commented 4 years ago

@dmanning23 Spoke too soon. I wasn't pushing the error to the pixels below properly, so pixels were getting skipped.

I've managed a minor speedup and halved the memory usage.

I don't think I can make it any faster: error diffusion is a cache mess because, for every pixel, it touches offset and previous pixels, and it cannot be made parallel.
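
The data dependency can be seen in a minimal Python sketch of error diffusion. It assumes the Floyd-Steinberg kernel (the one covered by the ARM article linked earlier in the thread), a grayscale image, and a two-level palette. It is an illustration of the technique, not ImageSharp's implementation, and `floyd_steinberg` is a hypothetical name.

```python
def floyd_steinberg(gray, levels=(0, 255)):
    """Dither a grayscale image (list of rows) to the given levels."""
    height, width = len(gray), len(gray[0])
    # Working copy that accumulates diffused error as floats.
    work = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]  # already includes error pushed from earlier pixels
            new = min(levels, key=lambda lv: abs(lv - old))
            out[y][x] = new
            err = old - new
            # Push the error to the right neighbour and to three pixels on the
            # next row, so those pixels cannot be finalised before this one.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


# A flat mid-gray 4x4 image comes out as a mix of black and white pixels.
flat = [[128] * 4 for _ in range(4)]
dithered = floyd_steinberg(flat)
```

Every pixel reads a value that earlier pixels have modified and writes into both the current row and the next one. That is why the loop resists straightforward parallelization and why its memory access pattern is unfriendly to caches.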

If anyone reading this does fancy profiling gif encoding with and without error diffusion and has some ideas please let me know.