Closed Scighost closed 1 year ago
Maybe instead of removing and committing code to Collapse project you could directly do a Pull Request here (that is if neon-nyan is okay with it)
Maybe instead of removing and committing code to Collapse project you could directly do a Pull Request here (that is if neon-nyan is okay with it)
Oh, I didn't see this.
Hi @Scighost,
Thank you for the contribution and this is a really impressive improvement.
While looking at the code, I noticed that the quality
argument doesn't affect the palette result which in this case, the quality
should've been used for stepping/reducing the color buffer to be generated.
Here I use an image with 3762x2000 pixels in resolution for the sample (Click here for the sample image).
On benchmark with 3 different values and each one of them runs on quality=1
, quality=4
and quality=8
. As we can see here that those 3 different values don't seem to have any differences.
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|
GetPalleteOldQuality8 | 58.367 ms | 0.8977 ms | 1.7720 ms | 1.01 | 0.04 | 571.4286 | 571.4286 | 571.4286 | 21.65 MB | 1.00 |
GetPalleteOldQuality4 | 57.969 ms | 0.4876 ms | 0.4561 ms | 0.99 | 0.01 | 625.0000 | 625.0000 | 625.0000 | 21.65 MB | 1.00 |
GetPalleteOldQuality1 | 58.592 ms | 0.5750 ms | 0.5097 ms | 1.00 | 0.00 | 625.0000 | 625.0000 | 625.0000 | 21.65 MB | 1.00 |
In this case, I found that the quality
argument in GetPixels()
method is unused for stepping.
For this, I added some changes into the method as follows:
quality
pixelArray
allocation based on quality
valueprivate unsafe ReadOnlySpan<byte> GetPixels(Bitmap sourceImage, int quality, bool ignoreWhite)
{
int step = 0;
int numUsedPixels = 0;
var data = sourceImage.LockBits(new Rectangle(new Point(), sourceImage.Size), ImageLockMode.ReadOnly, sourceImage.PixelFormat);
bool isHasAlpha = data.PixelFormat is PixelFormat.Format32bppArgb;
int chanCount = isHasAlpha ? 4 : 3;
var span = new Span<byte>((void*)data.Scan0, data.Stride * data.Height * chanCount);
byte[] pixelArray = new byte[sourceImage.Width * sourceImage.Height * chanCount / quality];
for (int i = 0; i < data.Height; i++)
{
for (int j = step; j < data.Width; j += quality)
{
byte b = span[data.Stride * i + chanCount * j + 0];
byte g = span[data.Stride * i + chanCount * j + 1];
byte r = span[data.Stride * i + chanCount * j + 2];
if (!(ignoreWhite && r > 250 && g > 250 && b > 250))
{
pixelArray[numUsedPixels * 3 + 0] = r;
pixelArray[numUsedPixels * 3 + 1] = g;
pixelArray[numUsedPixels * 3 + 2] = b;
numUsedPixels++;
}
if (data.Width - j < quality) step = quality - (data.Width - j);
}
}
sourceImage.UnlockBits(data);
return pixelArray.AsSpan().Slice(0, numUsedPixels);
}
After the changes above, here we can see that the "Mean" and "MemoryAllocation" reduced while using quality=4
and quality=8
.
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|
GetPalleteOldQuality8 | 58.367 ms | 0.8977 ms | 1.7720 ms | 1.01 | 0.04 | 571.4286 | 571.4286 | 571.4286 | 21.65 MB | 1.00 |
GetPalleteOldQuality4 | 57.969 ms | 0.4876 ms | 0.4561 ms | 0.99 | 0.01 | 625.0000 | 625.0000 | 625.0000 | 21.65 MB | 1.00 |
GetPalleteOldQuality1 | 58.592 ms | 0.5750 ms | 0.5097 ms | 1.00 | 0.00 | 625.0000 | 625.0000 | 625.0000 | 21.65 MB | 1.00 |
GetPalleteNewAndFixQuality8 | 6.025 ms | 0.0509 ms | 0.0476 ms | 0.10 | 0.00 | 500.0000 | 500.0000 | 500.0000 | 2.82 MB | 0.13 |
GetPalleteNewAndFixQuality4 | 11.544 ms | 0.0974 ms | 0.0863 ms | 0.20 | 0.00 | 953.1250 | 953.1250 | 953.1250 | 5.51 MB | 0.25 |
GetPalleteNewAndFixQuality1 | 45.371 ms | 0.7706 ms | 0.7208 ms | 0.77 | 0.01 | 750.0000 | 750.0000 | 750.0000 | 21.65 MB | 1.00 |
And as you might notice that there are others changes being made, including:
PixelFormat
has an Alpha Channel (4) or not (3)Span<T>
to ReadOnlySpan<T>
for return type (not really much changed)Speaking of the removal of the ColorThief
and to move the code into Collapse
repo, I think we should consider to keep it as is and instead create a new PR to the ColorThief
repo (since most of the changes are at ColorThief
code-base after all).
I've made some changes into the code, including the addition of using pointer for the Span
and pixelArray
private unsafe ReadOnlySpan<byte> GetPixels(Bitmap sourceImage, int quality, bool ignoreWhite)
{
int step = 0;
int numUsedPixels = 0;
var data = sourceImage.LockBits(new Rectangle(new Point(), sourceImage.Size), ImageLockMode.ReadOnly, sourceImage.PixelFormat);
bool isHasAlpha = data.PixelFormat is PixelFormat.Format32bppArgb;
int chanCount = isHasAlpha ? 4 : 3;
var span = new Span<byte>((void*)data.Scan0, data.Stride * data.Height * chanCount);
byte[] pixelArray = new byte[sourceImage.Width * sourceImage.Height * chanCount / quality];
fixed (byte* spanPtr = span)
fixed (byte* pixelPtr = pixelArray)
{
for (int i = 0; i < data.Height; i++)
{
for (int j = step; j < data.Width; j += quality)
{
byte* bP = spanPtr + (data.Stride * i + chanCount * j + 0);
byte* gP = spanPtr + (data.Stride * i + chanCount * j + 1);
byte* rP = spanPtr + (data.Stride * i + chanCount * j + 2);
if (!(ignoreWhite && *rP > 250 && *gP > 250 && *bP > 250))
{
pixelArray[numUsedPixels * 3 + 0] = *rP;
pixelArray[numUsedPixels * 3 + 1] = *gP;
pixelArray[numUsedPixels * 3 + 2] = *bP;
numUsedPixels++;
}
if (data.Width - j < quality) step = quality - (data.Width - j);
}
}
}
sourceImage.UnlockBits(data);
return pixelArray.AsSpan().Slice(0, numUsedPixels);
}
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|
GetPalleteOldQuality8 | 58.592 ms | 0.6526 ms | 0.6104 ms | 1.01 | 0.02 | 625.0000 | 625.0000 | 625.0000 | 21.65 MB | 1.00 |
GetPalleteOldQuality4 | 57.431 ms | 0.5185 ms | 0.4850 ms | 0.99 | 0.01 | 571.4286 | 571.4286 | 571.4286 | 21.65 MB | 1.00 |
GetPalleteOldQuality1 | 58.049 ms | 0.5307 ms | 0.4964 ms | 1.00 | 0.00 | 625.0000 | 625.0000 | 625.0000 | 21.65 MB | 1.00 |
GetPalleteNewAndFixQuality8 | 5.103 ms | 0.0343 ms | 0.0320 ms | 0.09 | 0.00 | 500.0000 | 500.0000 | 500.0000 | 2.82 MB | 0.13 |
GetPalleteNewAndFixQuality4 | 9.391 ms | 0.0735 ms | 0.0651 ms | 0.16 | 0.00 | 953.1250 | 953.1250 | 953.1250 | 5.51 MB | 0.25 |
GetPalleteNewAndFixQuality1 | 35.386 ms | 0.2457 ms | 0.1919 ms | 0.61 | 0.01 | 800.0000 | 800.0000 | 800.0000 | 21.65 MB | 1.00 |
Oh shoot, I forgot to use a pointer on pixelArray
. Gonna fix that later 😅
Good job!
There are two more small areas that can be optimized.
Of course, it is very good now, and I will not pull request for ColorThief again.
Hi @Scighost,
Thank you for the information. As per your suggestion and some additional changes including:
int
.IEnumerable<int>
into Mmcq.Quantize()
and processing the pixels on-the-fly.By enumerating the pixel directly instead of pre-allocating it first with this "v2" code, we got a really significant reduction memory allocation from initially 352 MB
to 21.65 MB
at "v1" changes and finally 130 KB
at "v2" changes with quality=1
Method | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|
GetPalleteOldQuality8 | 458.494 ms | 7.2605 ms | 6.7915 ms | 0.121 | 44205.29 KB | 0.125 |
GetPalleteOldQuality4 | 918.912 ms | 7.1911 ms | 6.7266 ms | 0.243 | 88262.94 KB | 0.250 |
GetPalleteOldQuality2 | 1,870.017 ms | 11.8865 ms | 11.1187 ms | 0.495 | 176398.48 KB | 0.500 |
GetPalleteOldQuality1 | 3,781.903 ms | 13.0441 ms | 10.8924 ms | 1.000 | 352658.34 KB | 1.000 |
GetPalleteNewAndFixQuality8 | 1.593 ms | 0.0163 ms | 0.0153 ms | 0.000 | 130.38 KB | 0.000 |
GetPalleteNewAndFixQuality4 | 4.799 ms | 0.0285 ms | 0.0253 ms | 0.001 | 130.38 KB | 0.000 |
GetPalleteNewAndFixQuality2 | 17.884 ms | 0.2137 ms | 0.1999 ms | 0.005 | 130.4 KB | 0.000 |
GetPalleteNewAndFixQuality1 | 71.567 ms | 0.8822 ms | 0.8252 ms | 0.019 | 130.45 KB | 0.000 |
The quality=1
on this new "v2" code has a bit performance degradation compared to the last modification due to enumeration overhead. But the rest with quality > 1
got a bit performance improvement. Hope this can be improved in the future.
That being said, thank you for your amazing contribution on this optimization!
Please allow me to merge the changes into ColorThief
Submodule repo and close this issue.
Collapse will use the method in ColorThief to calculate the theme color of the background image when switching game region. But the performance of this third-party library is extremely bad. It takes 169 ms when parsing the official launcher image (1280x730), and only more when parsing custom larger images.
Get palette list with
colorCount=255
andquality=3
, Same parameters as collapse.ColorThief can be optimized as follows:
Bitmap.GetPixel
byte[]
to store pixels instead of interleaved arraysbyte[][]
After I optimized the code, the time consumed and the amount of memory allocated were greatly reduced. To make it simpler to write the code, each pixel of the image will be included, so the parameters:
colorCount=255
andquality=1
.Could I remove the submodule ColorThief and commit code in this project?