SixLabors / ImageSharp

A modern, cross-platform, 2D Graphics library for .NET
https://sixlabors.com/products/imagesharp/

Loading a GIF results in a bigger file #2198

Closed sescandell closed 1 year ago

sescandell commented 2 years ago

Prerequisites

ImageSharp version

2.1.3

Other ImageSharp packages and versions

None; I made a simple single-file sample repo targeting .NET 6

Environment (Operating system, version and so on)

Windows 11, .NET 6 (but not limited to this framework)

.NET Framework version

.NET Framework 4.8, .NET Core 3.1, .NET 6

Description

Hi,

I made a dummy test: load a GIF file, then save it somewhere else without processing it. The final file is larger than the original (more than twice its size here, though the ratio varies with the source; I observed a 5x increase on another image).

Am I missing an option on save?

Thanks,

Steps to Reproduce

Here is a full working sample test (.NET 6, dependency: SixLabors.ImageSharp v2.1.3)

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="SixLabors.ImageSharp" Version="2.1.3" />
  </ItemGroup>

  <ItemGroup>
    <None Update="sample.gif">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
  </ItemGroup>

</Project>
// program.cs
using System;
using System.IO;
using SixLabors.ImageSharp;

Console.WriteLine("Cloning...");

Directory.CreateDirectory("results");
using var src = File.OpenRead("sample.gif");
File.Delete("results/output.gif");
using var fs = File.OpenWrite("results/output.gif");

using var image = await Image.LoadAsync(src);
await image.SaveAsGifAsync(fs);

await fs.FlushAsync();

Console.WriteLine("Done");
Console.WriteLine($"SrcSize: {src.Length / 1024}");
Console.WriteLine($"OutputSize: {fs.Length / 1024}");
Console.ReadKey();

Results sample:

Cloning...
Done
SrcSize: 1819
OutputSize: 3886

Images

I'm attaching a zip archive to make sure GitHub doesn't transform the file itself: sample.zip

JimBobSquarePants commented 2 years ago

Do you have the source of those gifs or are they just random? A very high level comparison suggests that our LZW compression isn't as good as it should be. The uncompressed image data is the same size for both input and output, but the compressed data is much larger in ours.

Input: compressed size 88.46 KiB, uncompressed size 284.77 KiB

Output: compressed size 126.25 KiB, uncompressed size 284.77 KiB

sescandell commented 2 years ago

Hi @JimBobSquarePants

It's a mix of random GIFs (from Giphy, for example) and some made with the tool Canva.

JimBobSquarePants commented 2 years ago

Thanks. I'll have a dig into our LZW encoder

brianpopow commented 2 years ago

This is only a guess, but maybe the issue is not the LZW encoder but rather how the quantization is done? I have checked the provided test image: it has only 253 colors. Maybe in those cases (< 256 colors) we should not quantize the image again and instead go with the colors the image already has.
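
A caller-side sketch of that idea, assuming the image decodes to Rgba32 and genuinely has fewer than 256 distinct colors; the color-collection loop and output path below are purely illustrative, not the library's internal approach:

using System.Collections.Generic;
using System.Linq;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Formats.Gif;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing.Processors.Quantization;

// Sketch only: re-encode a GIF with a palette built from its own colors,
// so the encoder does not run a fresh quantization pass.
using Image<Rgba32> image = Image.Load<Rgba32>("sample.gif");

// Collect the distinct colors of the root frame. Illustrative; a real
// implementation would sample every frame.
var distinct = new HashSet<Rgba32>();
for (int y = 0; y < image.Height; y++)
{
    for (int x = 0; x < image.Width; x++)
    {
        distinct.Add(image[x, y]);
    }
}

if (distinct.Count <= 256)
{
    Color[] palette = distinct.Select(p => Color.FromRgba(p.R, p.G, p.B, p.A)).ToArray();

    var encoder = new GifEncoder
    {
        // Reuse the collected colors instead of letting the encoder quantize again.
        Quantizer = new PaletteQuantizer(palette),
        ColorTableMode = GifColorTableMode.Global
    };

    image.Save("results/output-palette.gif", encoder);
}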

JimBobSquarePants commented 2 years ago

Interesting… I thought both palettes were full.

I can do a test by encoding using the input palette to confirm

JimBobSquarePants commented 2 years ago

Looks like there is an issue with compression.

I'm using a cool little app called gifiddle to compare the output, having hacked our code to always preserve the global palette.

As you can see from the metrics, the palette is the same, but we are failing to compress it as well as the input.

[image: gifiddle metrics comparison]

Our lookups also appear to be matching, as the frame colors match exactly.

[image: frame color comparison]

JimBobSquarePants commented 2 years ago

I rewrote the LzwEncoder based on https://github.com/deanm/omggif and was able to make a couple of very minor savings (62 bytes) due to the fact that we max out our subblock sizes at 254 instead of 255 bytes. I may actually switch it out because it's about half the code of our existing implementation.
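
For context, GIF stores the compressed image data as a sequence of data sub-blocks, each prefixed by a length byte of at most 255, so capping at 254 only costs roughly one extra length byte per 64 KB of compressed data. A minimal sketch of that chunking (not the actual encoder code):

using System;
using System.IO;

// Sketch: write an LZW-compressed payload as GIF data sub-blocks.
// Each sub-block is a length byte (1..255) followed by that many data bytes;
// the sequence ends with a zero-length block terminator.
static void WriteSubBlocks(Stream stream, byte[] compressed, int maxBlockSize = 255)
{
    int offset = 0;
    while (offset < compressed.Length)
    {
        int count = Math.Min(maxBlockSize, compressed.Length - offset);
        stream.WriteByte((byte)count);           // sub-block length
        stream.Write(compressed, offset, count); // sub-block payload
        offset += count;
    }

    stream.WriteByte(0); // block terminator
}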

However, that doesn't actually begin to touch the difference. I can only surmise that it must be getting encoded using some dark magic like lossy LZW.

I'm not sure we can actually do anything about the difference now.

JimBobSquarePants commented 2 years ago

I had another look at this using https://ezgif.com/ to experiment. The original is definitely using lossy encoding. It's the only way to make those savings.

JimBobSquarePants commented 1 year ago

We were chasing a red herring with the lossy encoding. When splitting the gif frames I was including the data from previous frames, so I couldn't see the obvious.

The gif in question is heavily optimized by the application of a de-duplication algorithm which strips duplicate indexing detail out of a frame by including only the changes between frames. This can lead to really good compression since you now have larger areas of equal values.
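
In effect each frame becomes a delta against the previous one: indices that did not change are replaced by a single transparent index, leaving long runs of one value for LZW to compress. A rough sketch of that idea (a hypothetical helper, not the code from the PR, and it assumes the previous frame is left in place by the frame disposal method):

// Sketch: de-duplicate a frame's palette indices against the previous frame.
// Unchanged pixels collapse to a shared transparent index, producing long runs
// of a single value that LZW compresses very well.
static byte[] Deduplicate(byte[] previousIndices, byte[] currentIndices, byte transparentIndex)
{
    var result = new byte[currentIndices.Length];
    for (int i = 0; i < currentIndices.Length; i++)
    {
        result[i] = currentIndices[i] == previousIndices[i]
            ? transparentIndex    // unchanged: let the previous frame show through
            : currentIndices[i];  // changed: keep the new index
    }

    return result;
}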

With the changes introduced in #2455 we go a very long way toward improving the output size, introducing our own de-duplication algorithm and dropping 1.6MB off our encoded size.

Now... we don't end up with a result that exactly matches, because we have to cater for images using millions of colors, so we use a memory- (and accuracy-) limited lookup table to match colors via a palette quantizer. Mileage may vary, but we are capable of beating previously optimized efforts: the image in #2450, for example, will be encoded with a saving of 0.37MB compared to the original.
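
To illustrate why the match isn't always exact: a bounded lookup table has to key on something smaller than the full 32-bit color, so distinct but similar colors can land on the same entry. A toy sketch of that trade-off (purely illustrative, not ImageSharp's actual cache):

using System;

// Toy sketch: a fixed-size color -> palette-index cache keyed on 5 bits per
// channel. Nearby colors share an entry, trading accuracy for bounded memory.
class CoarsePaletteCache
{
    private readonly short[] table = new short[1 << 15]; // 32K entries, ~64 KB

    public CoarsePaletteCache() => Array.Fill(table, (short)-1);

    private static int Key(byte r, byte g, byte b)
        => ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3);

    public bool TryGet(byte r, byte g, byte b, out int index)
    {
        index = table[Key(r, g, b)];
        return index >= 0;
    }

    public void Set(byte r, byte g, byte b, int index)
        => table[Key(r, g, b)] = (short)index;
}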

You can see a comparison of encoded frames in this image.

Left original, right ImageSharp.

[image: encoded frame comparison (left: original, right: ImageSharp)]