Scighost commented 1 year ago

Collapse will use the method in ColorThief to calculate the theme color of the background image when switching game region. But the performance of this third-party library is extremely bad. It takes 169 ms when parsing the official launcher image (1280x730), and only more when parsing custom larger images.

Get palette list with colorCount=255 and quality=3, Same parameters as collapse.

Ignore the time consumption and memory allocation for image decoding

ColorThief can be optimized as follows:

Use pointer to get pixels with color instead of Bitmap.GetPixel
Use one-dimensional arrays byte[] to store pixels instead of interleaved arrays byte[][]
Directly specify the number of pixels when splitting the vbox, instead of repeated calculations

After I optimized the code, the time consumed and the amount of memory allocated were greatly reduced. To make it simpler to write the code, each pixel of the image will be included, so the parameters: colorCount=255 and quality=1.

Could I remove the submodule ColorThief and commit code in this project?

muscularcandy67 commented 1 year ago

Maybe instead of removing and committing code to Collapse project you could directly do a Pull Request here (that is if neon-nyan is okay with it)

Scighost commented 1 year ago

Maybe instead of removing and committing code to Collapse project you could directly do a Pull Request here (that is if neon-nyan is okay with it)

Oh, I didn't see this.

neon-nyan commented 1 year ago

Hi @Scighost,

Thank you for the contribution and this is a really impressive improvement. While looking at the code, I noticed that the quality argument doesn't affect the palette result which in this case, the quality should've been used for stepping/reducing the color buffer to be generated.

Here I use an image with 3762x2000 pixels in resolution for the sample (Click here for the sample image).

On benchmark with 3 different values and each one of them runs on quality=1, quality=4 and quality=8. As we can see here that those 3 different values don't seem to have any differences.

Benchmark Result (Before Fix)

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Gen2	Allocated	Alloc Ratio
GetPalleteOldQuality8	58.367 ms	0.8977 ms	1.7720 ms	1.01	0.04	571.4286	571.4286	571.4286	21.65 MB	1.00
GetPalleteOldQuality4	57.969 ms	0.4876 ms	0.4561 ms	0.99	0.01	625.0000	625.0000	625.0000	21.65 MB	1.00
GetPalleteOldQuality1	58.592 ms	0.5750 ms	0.5097 ms	1.00	0.00	625.0000	625.0000	625.0000	21.65 MB	1.00

In this case, I found that the quality argument in GetPixels() method is unused for stepping. For this, I added some changes into the method as follows:

Increment/stepping the index by quality
Reduce the initial pixelArray allocation based on quality value

Code

private unsafe ReadOnlySpan<byte> GetPixels(Bitmap sourceImage, int quality, bool ignoreWhite)
{
    int step = 0;
    int numUsedPixels = 0;

    var data = sourceImage.LockBits(new Rectangle(new Point(), sourceImage.Size), ImageLockMode.ReadOnly, sourceImage.PixelFormat);
    bool isHasAlpha = data.PixelFormat is PixelFormat.Format32bppArgb;
    int chanCount = isHasAlpha ? 4 : 3;

    var span = new Span<byte>((void*)data.Scan0, data.Stride * data.Height * chanCount);
    byte[] pixelArray = new byte[sourceImage.Width * sourceImage.Height * chanCount / quality];

    for (int i = 0; i < data.Height; i++)
    {
        for (int j = step; j < data.Width; j += quality)
        {
            byte b = span[data.Stride * i + chanCount * j + 0];
            byte g = span[data.Stride * i + chanCount * j + 1];
            byte r = span[data.Stride * i + chanCount * j + 2];
            if (!(ignoreWhite && r > 250 && g > 250 && b > 250))
            {
                pixelArray[numUsedPixels * 3 + 0] = r;
                pixelArray[numUsedPixels * 3 + 1] = g;
                pixelArray[numUsedPixels * 3 + 2] = b;
                numUsedPixels++;
            }

            if (data.Width - j < quality) step = quality - (data.Width - j);
        }
    }

    sourceImage.UnlockBits(data);
    return pixelArray.AsSpan().Slice(0, numUsedPixels);
}

After the changes above, here we can see that the "Mean" and "MemoryAllocation" reduced while using quality=4 and quality=8.

Benchmark Result (After Fix)

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Gen2	Allocated	Alloc Ratio
GetPalleteOldQuality8	58.367 ms	0.8977 ms	1.7720 ms	1.01	0.04	571.4286	571.4286	571.4286	21.65 MB	1.00
GetPalleteOldQuality4	57.969 ms	0.4876 ms	0.4561 ms	0.99	0.01	625.0000	625.0000	625.0000	21.65 MB	1.00
GetPalleteOldQuality1	58.592 ms	0.5750 ms	0.5097 ms	1.00	0.00	625.0000	625.0000	625.0000	21.65 MB	1.00
GetPalleteNewAndFixQuality8	6.025 ms	0.0509 ms	0.0476 ms	0.10	0.00	500.0000	500.0000	500.0000	2.82 MB	0.13
GetPalleteNewAndFixQuality4	11.544 ms	0.0974 ms	0.0863 ms	0.20	0.00	953.1250	953.1250	953.1250	5.51 MB	0.25
GetPalleteNewAndFixQuality1	45.371 ms	0.7706 ms	0.7208 ms	0.77	0.01	750.0000	750.0000	750.0000	21.65 MB	1.00

And as you might notice that there are others changes being made, including:

Logic simplification
Explicitly defined the channel count if the PixelFormat has an Alpha Channel (4) or not (3)
Change Span<T> to ReadOnlySpan<T> for return type (not really much changed)

Speaking of the removal of the ColorThief and to move the code into Collapse repo, I think we should consider to keep it as is and instead create a new PR to the ColorThief repo (since most of the changes are at ColorThief code-base after all).

Update 1 - 2023/05/29 - 00:37 +7

I've made some changes into the code, including the addition of using pointer for the Span and pixelArray

Code

private unsafe ReadOnlySpan<byte> GetPixels(Bitmap sourceImage, int quality, bool ignoreWhite)
{
    int step = 0;
    int numUsedPixels = 0;

    var data = sourceImage.LockBits(new Rectangle(new Point(), sourceImage.Size), ImageLockMode.ReadOnly, sourceImage.PixelFormat);
    bool isHasAlpha = data.PixelFormat is PixelFormat.Format32bppArgb;
    int chanCount = isHasAlpha ? 4 : 3;

    var span = new Span<byte>((void*)data.Scan0, data.Stride * data.Height * chanCount);
    byte[] pixelArray = new byte[sourceImage.Width * sourceImage.Height * chanCount / quality];

    fixed (byte* spanPtr = span)
    fixed (byte* pixelPtr = pixelArray)
    {
        for (int i = 0; i < data.Height; i++)
        {
            for (int j = step; j < data.Width; j += quality)
            {
                byte* bP = spanPtr + (data.Stride * i + chanCount * j + 0);
                byte* gP = spanPtr + (data.Stride * i + chanCount * j + 1);
                byte* rP = spanPtr + (data.Stride * i + chanCount * j + 2);

                if (!(ignoreWhite && *rP > 250 && *gP > 250 && *bP > 250))
                {
                    pixelArray[numUsedPixels * 3 + 0] = *rP;
                    pixelArray[numUsedPixels * 3 + 1] = *gP;
                    pixelArray[numUsedPixels * 3 + 2] = *bP;
                    numUsedPixels++;
                }

                if (data.Width - j < quality) step = quality - (data.Width - j);
            }
        }
    }

    sourceImage.UnlockBits(data);
    return pixelArray.AsSpan().Slice(0, numUsedPixels);
}

Benchmark Result (As per update: 2023/05/29 - 00:37 +7)

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Gen2	Allocated	Alloc Ratio
GetPalleteOldQuality8	58.592 ms	0.6526 ms	0.6104 ms	1.01	0.02	625.0000	625.0000	625.0000	21.65 MB	1.00
GetPalleteOldQuality4	57.431 ms	0.5185 ms	0.4850 ms	0.99	0.01	571.4286	571.4286	571.4286	21.65 MB	1.00
GetPalleteOldQuality1	58.049 ms	0.5307 ms	0.4964 ms	1.00	0.00	625.0000	625.0000	625.0000	21.65 MB	1.00
GetPalleteNewAndFixQuality8	5.103 ms	0.0343 ms	0.0320 ms	0.09	0.00	500.0000	500.0000	500.0000	2.82 MB	0.13
GetPalleteNewAndFixQuality4	9.391 ms	0.0735 ms	0.0651 ms	0.16	0.00	953.1250	953.1250	953.1250	5.51 MB	0.25
GetPalleteNewAndFixQuality1	35.386 ms	0.2457 ms	0.1919 ms	0.61	0.01	800.0000	800.0000	800.0000	21.65 MB	1.00

neon-nyan commented 1 year ago

Oh shoot, I forgot to use a pointer on pixelArray. Gonna fix that later 😅

Scighost commented 1 year ago

Good job!

There are two more small areas that can be optimized.

Initialize the color vector space directly from the image's memory area, skipping the step of passing pixel values using byte arrays.
After setting quality, only pixels in the x-direction are currently sampled, and pixels in the y-direction should be sampled as well.

Of course, it is very good now, and I will not pull request for ColorThief again.

neon-nyan commented 1 year ago

Hi @Scighost,

Thank you for the information. As per your suggestion and some additional changes including:

Enumerate the pixel (only took R, G and B channel) into a single int.
Pass the IEnumerable<int> into Mmcq.Quantize() and processing the pixels on-the-fly.

By enumerating the pixel directly instead of pre-allocating it first with this "v2" code, we got a really significant reduction memory allocation from initially 352 MB to 21.65 MB at "v1" changes and finally 130 KB at "v2" changes with quality=1

Method	Mean	Error	StdDev	Ratio	Allocated	Alloc Ratio
GetPalleteOldQuality8	458.494 ms	7.2605 ms	6.7915 ms	0.121	44205.29 KB	0.125
GetPalleteOldQuality4	918.912 ms	7.1911 ms	6.7266 ms	0.243	88262.94 KB	0.250
GetPalleteOldQuality2	1,870.017 ms	11.8865 ms	11.1187 ms	0.495	176398.48 KB	0.500
GetPalleteOldQuality1	3,781.903 ms	13.0441 ms	10.8924 ms	1.000	352658.34 KB	1.000
GetPalleteNewAndFixQuality8	1.593 ms	0.0163 ms	0.0153 ms	0.000	130.38 KB	0.000
GetPalleteNewAndFixQuality4	4.799 ms	0.0285 ms	0.0253 ms	0.001	130.38 KB	0.000
GetPalleteNewAndFixQuality2	17.884 ms	0.2137 ms	0.1999 ms	0.005	130.4 KB	0.000
GetPalleteNewAndFixQuality1	71.567 ms	0.8822 ms	0.8252 ms	0.019	130.45 KB	0.000

Note

The quality=1 on this new "v2" code has a bit performance degradation compared to the last modification due to enumeration overhead. But the rest with quality > 1 got a bit performance improvement. Hope this can be improved in the future.

That being said, thank you for your amazing contribution on this optimization! Please allow me to merge the changes into ColorThief Submodule repo and close this issue.

YattaBoom

Scighost commented 1 year ago

CollapseLauncher / Collapse

Optimize the lib ColorThief #152

Benchmark Result (Before Fix)

Code

Benchmark Result (After Fix)

Update 1 - 2023/05/29 - 00:37 +7

Code

Benchmark Result (As per update: 2023/05/29 - 00:37 +7)

Note