CollapseLauncher / Collapse

An Advanced Launcher for miHoYo/HoYoverse Games
MIT License
1.4k stars 64 forks source link

Optimize the lib ColorThief #152

Closed Scighost closed 1 year ago

Scighost commented 1 year ago

Collapse will use the method in ColorThief to calculate the theme color of the background image when switching game region. But the performance of this third-party library is extremely bad. It takes 169 ms when parsing the official launcher image (1280x730), and only more when parsing custom larger images.

Get palette list with colorCount=255 and quality=3, Same parameters as collapse.

Ignore the time consumption and memory allocation for image decoding

image

ColorThief can be optimized as follows:

After I optimized the code, the time consumed and the amount of memory allocated were greatly reduced. To make it simpler to write the code, each pixel of the image will be included, so the parameters: colorCount=255 and quality=1.

image image

Could I remove the submodule ColorThief and commit code in this project?

muscularcandy67 commented 1 year ago

Maybe instead of removing and committing code to Collapse project you could directly do a Pull Request here (that is if neon-nyan is okay with it)

Scighost commented 1 year ago

Maybe instead of removing and committing code to Collapse project you could directly do a Pull Request here (that is if neon-nyan is okay with it)

Oh, I didn't see this.

neon-nyan commented 1 year ago

Hi @Scighost,

Thank you for the contribution and this is a really impressive improvement. While looking at the code, I noticed that the quality argument doesn't affect the palette result which in this case, the quality should've been used for stepping/reducing the color buffer to be generated.

Here I use an image with 3762x2000 pixels in resolution for the sample (Click here for the sample image).

On benchmark with 3 different values and each one of them runs on quality=1, quality=4 and quality=8. As we can see here that those 3 different values don't seem to have any differences.

Benchmark Result (Before Fix)

Method Mean Error StdDev Ratio RatioSD Gen0 Gen1 Gen2 Allocated Alloc Ratio
GetPalleteOldQuality8 58.367 ms 0.8977 ms 1.7720 ms 1.01 0.04 571.4286 571.4286 571.4286 21.65 MB 1.00
GetPalleteOldQuality4 57.969 ms 0.4876 ms 0.4561 ms 0.99 0.01 625.0000 625.0000 625.0000 21.65 MB 1.00
GetPalleteOldQuality1 58.592 ms 0.5750 ms 0.5097 ms 1.00 0.00 625.0000 625.0000 625.0000 21.65 MB 1.00

In this case, I found that the quality argument in GetPixels() method is unused for stepping. For this, I added some changes into the method as follows:

Code

private unsafe ReadOnlySpan<byte> GetPixels(Bitmap sourceImage, int quality, bool ignoreWhite)
{
    int step = 0;
    int numUsedPixels = 0;

    var data = sourceImage.LockBits(new Rectangle(new Point(), sourceImage.Size), ImageLockMode.ReadOnly, sourceImage.PixelFormat);
    bool isHasAlpha = data.PixelFormat is PixelFormat.Format32bppArgb;
    int chanCount = isHasAlpha ? 4 : 3;

    var span = new Span<byte>((void*)data.Scan0, data.Stride * data.Height * chanCount);
    byte[] pixelArray = new byte[sourceImage.Width * sourceImage.Height * chanCount / quality];

    for (int i = 0; i < data.Height; i++)
    {
        for (int j = step; j < data.Width; j += quality)
        {
            byte b = span[data.Stride * i + chanCount * j + 0];
            byte g = span[data.Stride * i + chanCount * j + 1];
            byte r = span[data.Stride * i + chanCount * j + 2];
            if (!(ignoreWhite && r > 250 && g > 250 && b > 250))
            {
                pixelArray[numUsedPixels * 3 + 0] = r;
                pixelArray[numUsedPixels * 3 + 1] = g;
                pixelArray[numUsedPixels * 3 + 2] = b;
                numUsedPixels++;
            }

            if (data.Width - j < quality) step = quality - (data.Width - j);
        }
    }

    sourceImage.UnlockBits(data);
    return pixelArray.AsSpan().Slice(0, numUsedPixels);
}

After the changes above, here we can see that the "Mean" and "MemoryAllocation" reduced while using quality=4 and quality=8.

Benchmark Result (After Fix)

Method Mean Error StdDev Ratio RatioSD Gen0 Gen1 Gen2 Allocated Alloc Ratio
GetPalleteOldQuality8 58.367 ms 0.8977 ms 1.7720 ms 1.01 0.04 571.4286 571.4286 571.4286 21.65 MB 1.00
GetPalleteOldQuality4 57.969 ms 0.4876 ms 0.4561 ms 0.99 0.01 625.0000 625.0000 625.0000 21.65 MB 1.00
GetPalleteOldQuality1 58.592 ms 0.5750 ms 0.5097 ms 1.00 0.00 625.0000 625.0000 625.0000 21.65 MB 1.00
GetPalleteNewAndFixQuality8 6.025 ms 0.0509 ms 0.0476 ms 0.10 0.00 500.0000 500.0000 500.0000 2.82 MB 0.13
GetPalleteNewAndFixQuality4 11.544 ms 0.0974 ms 0.0863 ms 0.20 0.00 953.1250 953.1250 953.1250 5.51 MB 0.25
GetPalleteNewAndFixQuality1 45.371 ms 0.7706 ms 0.7208 ms 0.77 0.01 750.0000 750.0000 750.0000 21.65 MB 1.00

And as you might notice that there are others changes being made, including:

Speaking of the removal of the ColorThief and to move the code into Collapse repo, I think we should consider to keep it as is and instead create a new PR to the ColorThief repo (since most of the changes are at ColorThief code-base after all).

Update 1 - 2023/05/29 - 00:37 +7

I've made some changes into the code, including the addition of using pointer for the Span and pixelArray

Code

private unsafe ReadOnlySpan<byte> GetPixels(Bitmap sourceImage, int quality, bool ignoreWhite)
{
    int step = 0;
    int numUsedPixels = 0;

    var data = sourceImage.LockBits(new Rectangle(new Point(), sourceImage.Size), ImageLockMode.ReadOnly, sourceImage.PixelFormat);
    bool isHasAlpha = data.PixelFormat is PixelFormat.Format32bppArgb;
    int chanCount = isHasAlpha ? 4 : 3;

    var span = new Span<byte>((void*)data.Scan0, data.Stride * data.Height * chanCount);
    byte[] pixelArray = new byte[sourceImage.Width * sourceImage.Height * chanCount / quality];

    fixed (byte* spanPtr = span)
    fixed (byte* pixelPtr = pixelArray)
    {
        for (int i = 0; i < data.Height; i++)
        {
            for (int j = step; j < data.Width; j += quality)
            {
                byte* bP = spanPtr + (data.Stride * i + chanCount * j + 0);
                byte* gP = spanPtr + (data.Stride * i + chanCount * j + 1);
                byte* rP = spanPtr + (data.Stride * i + chanCount * j + 2);

                if (!(ignoreWhite && *rP > 250 && *gP > 250 && *bP > 250))
                {
                    pixelArray[numUsedPixels * 3 + 0] = *rP;
                    pixelArray[numUsedPixels * 3 + 1] = *gP;
                    pixelArray[numUsedPixels * 3 + 2] = *bP;
                    numUsedPixels++;
                }

                if (data.Width - j < quality) step = quality - (data.Width - j);
            }
        }
    }

    sourceImage.UnlockBits(data);
    return pixelArray.AsSpan().Slice(0, numUsedPixels);
}

Benchmark Result (As per update: 2023/05/29 - 00:37 +7)

Method Mean Error StdDev Ratio RatioSD Gen0 Gen1 Gen2 Allocated Alloc Ratio
GetPalleteOldQuality8 58.592 ms 0.6526 ms 0.6104 ms 1.01 0.02 625.0000 625.0000 625.0000 21.65 MB 1.00
GetPalleteOldQuality4 57.431 ms 0.5185 ms 0.4850 ms 0.99 0.01 571.4286 571.4286 571.4286 21.65 MB 1.00
GetPalleteOldQuality1 58.049 ms 0.5307 ms 0.4964 ms 1.00 0.00 625.0000 625.0000 625.0000 21.65 MB 1.00
GetPalleteNewAndFixQuality8 5.103 ms 0.0343 ms 0.0320 ms 0.09 0.00 500.0000 500.0000 500.0000 2.82 MB 0.13
GetPalleteNewAndFixQuality4 9.391 ms 0.0735 ms 0.0651 ms 0.16 0.00 953.1250 953.1250 953.1250 5.51 MB 0.25
GetPalleteNewAndFixQuality1 35.386 ms 0.2457 ms 0.1919 ms 0.61 0.01 800.0000 800.0000 800.0000 21.65 MB 1.00
neon-nyan commented 1 year ago

Oh shoot, I forgot to use a pointer on pixelArray. Gonna fix that later 😅

Scighost commented 1 year ago

Good job!

There are two more small areas that can be optimized.

Of course, it is very good now, and I will not pull request for ColorThief again.

neon-nyan commented 1 year ago

Hi @Scighost,

Thank you for the information. As per your suggestion and some additional changes including:

By enumerating the pixel directly instead of pre-allocating it first with this "v2" code, we got a really significant reduction memory allocation from initially 352 MB to 21.65 MB at "v1" changes and finally 130 KB at "v2" changes with quality=1

Method Mean Error StdDev Ratio Allocated Alloc Ratio
GetPalleteOldQuality8 458.494 ms 7.2605 ms 6.7915 ms 0.121 44205.29 KB 0.125
GetPalleteOldQuality4 918.912 ms 7.1911 ms 6.7266 ms 0.243 88262.94 KB 0.250
GetPalleteOldQuality2 1,870.017 ms 11.8865 ms 11.1187 ms 0.495 176398.48 KB 0.500
GetPalleteOldQuality1 3,781.903 ms 13.0441 ms 10.8924 ms 1.000 352658.34 KB 1.000
GetPalleteNewAndFixQuality8 1.593 ms 0.0163 ms 0.0153 ms 0.000 130.38 KB 0.000
GetPalleteNewAndFixQuality4 4.799 ms 0.0285 ms 0.0253 ms 0.001 130.38 KB 0.000
GetPalleteNewAndFixQuality2 17.884 ms 0.2137 ms 0.1999 ms 0.005 130.4 KB 0.000
GetPalleteNewAndFixQuality1 71.567 ms 0.8822 ms 0.8252 ms 0.019 130.45 KB 0.000

Note

The quality=1 on this new "v2" code has a bit performance degradation compared to the last modification due to enumeration overhead. But the rest with quality > 1 got a bit performance improvement. Hope this can be improved in the future.

That being said, thank you for your amazing contribution on this optimization! Please allow me to merge the changes into ColorThief Submodule repo and close this issue.

YattaBoom

Scighost commented 1 year ago

image