SixLabors / ImageSharp

:camera: A modern, cross-platform, 2D Graphics library for .NET
https://sixlabors.com/products/imagesharp/
Other
7.35k stars 850 forks source link

Vectorize TrimTransparentPixels in GifEncoderCore #2500

Closed gfoidl closed 1 year ago

gfoidl commented 1 year ago

Prerequisites

Description

See https://github.com/SixLabors/ImageSharp/pull/2455#issuecomment-1613901985

A simple benchmark -- just for the inner loop -- yields:

|     Method |      Mean |    Error |   StdDev | Ratio |
|----------- |----------:|---------:|---------:|------:|
|    Default | 102.33 ns | 2.073 ns | 2.973 ns |  1.00 |
| Vectorized |  16.53 ns | 0.065 ns | 0.055 ns |  0.16 |

This is measured with .NET 7, but the codegen for .NET 6 is very similar.

benchmark code ```c# using System.Runtime.CompilerServices; using System.Runtime.InteropServices; using System.Runtime.Intrinsics; using BenchmarkDotNet.Attributes; Bench bench = new(); bench.Setup(); Console.WriteLine(bench.Default()); Console.WriteLine(bench.Vectorized()); #if !DEBUG BenchmarkDotNet.Running.BenchmarkRunner.Run(); #endif public class Bench { private byte[] _rowSpan = null!; private byte _trimmableIndex; [GlobalSetup] public void Setup() { _rowSpan = new byte[100]; _rowSpan.AsSpan().Fill(42); _rowSpan.AsSpan(25, 9).Clear(); _trimmableIndex = 42; } [Benchmark(Baseline = true)] public (int left, int right, bool isTransparentRow) Default() { Span rowSpan = _rowSpan; byte trimmableIndex = _trimmableIndex; int left = int.MaxValue; int right = int.MinValue; bool isTransparentRow = true; for (int x = 0; x < rowSpan.Length; ++x) { if (rowSpan[x] != trimmableIndex) { isTransparentRow = false; left = Math.Min(left, x); right = Math.Max(right, x); } } if (left == int.MaxValue) { left = 0; } if (right == int.MinValue) { right = rowSpan.Length; } return (left, right, isTransparentRow); } [Benchmark] public (int left, int right, bool isTransparentRow) Vectorized() { Span rowSpan = _rowSpan; byte trimmableIndex = _trimmableIndex; int left = int.MaxValue; int right = int.MinValue; bool isTransparentRow = true; ref byte rowPtr = ref MemoryMarshal.GetReference(rowSpan); nint rowLength = (nint)(uint)rowSpan.Length; nint x = 0; if (Vector128.IsHardwareAccelerated && rowLength >= Vector128.Count) { Vector256 trimmableVec256 = Vector256.Create(trimmableIndex); if (Vector256.IsHardwareAccelerated && rowLength >= Vector256.Count) { do { Vector256 vec = Vector256.LoadUnsafe(ref rowPtr, (nuint)x); Vector256 notEquals = ~Vector256.Equals(vec, trimmableVec256); if (notEquals != Vector256.Zero) { isTransparentRow = false; uint mask = notEquals.ExtractMostSignificantBits(); nint start = x + (nint)uint.TrailingZeroCount(mask); nint end = (nint)uint.LeadingZeroCount(mask); // end is from the end, but we need the index from the beginning end = x + Vector256.Count - 1 - end; left = Math.Min(left, (int)start); right = Math.Max(right, (int)end); } x += Vector256.Count; } while (x <= rowLength - Vector256.Count); } Vector128 trimmableVec = Vector256.IsHardwareAccelerated ? trimmableVec256.GetLower() : Vector128.Create(trimmableIndex); while (x <= rowLength - Vector128.Count) { Vector128 vec = Vector128.LoadUnsafe(ref rowPtr, (nuint)x); Vector128 notEquals = ~Vector128.Equals(vec, trimmableVec); if (notEquals != Vector128.Zero) { isTransparentRow = false; uint mask = notEquals.ExtractMostSignificantBits(); nint start = x + (nint)uint.TrailingZeroCount(mask); nint end = (nint)uint.LeadingZeroCount(mask) - Vector128.Count; // end is from the end, but we need the index from the beginning end = x + Vector128.Count - 1 - end; left = Math.Min(left, (int)start); right = Math.Max(right, (int)end); } x += Vector128.Count; } } for (; x < rowLength; ++x) { if (Unsafe.Add(ref rowPtr, x) != trimmableIndex) { isTransparentRow = false; left = Math.Min(left, (int)x); right = Math.Max(right, (int)x); } } if (left == int.MaxValue) { left = 0; } if (right == int.MinValue) { right = (int)rowLength; } return (left, right, isTransparentRow); } } ```
JimBobSquarePants commented 1 year ago

Oof! Look at those numbers!! Thanks so much for looking at this. I'm going have a good dig through it to wrap my head round what you have done.

gfoidl commented 1 year ago

😃 I think the easiest way to understand / check is to use the benchmark-code (see top-comment) in a simple console app and step with the debugger through it (maybe change the size of the _rowSpan to 20 or that like. Calculation of the correct index for end is the strangest part IMO.

PS: I'm back on Tuesday, so maybe slow to respond in the meantime.

JimBobSquarePants commented 1 year ago

This is fantastic stuff. I figured theoretically after reading some of the source for Span.IndexOf that masking with bit counting would be the vectorized solution I just had no idea how I'd actually implement it.

Tip of the cap to you sir.

gfoidl commented 1 year ago

Thanks for the kind words ❤️