Using vectorised APIs to do basic drawing instead of calling out to GDI+.

Zintom commented 6 months ago

Background and motivation

All calls to draw any kind of graphics, even just clearing a Bitmap to a certain colour, will result in a GDI/GDI+ invocation.

Anecdotal testing shows that using vectorised APIs like AVX/SSE to modify the underlying Bitmap data directly improves performance significantly. I imagine this is because the GDI libraries are old and are not vectorised themselves.

For example, to clear a Bitmap, instead of calling Graphics.Clear(color) you can locate the Bitmap in memory using LockBits, pack the color into an integer on the stack, and call Span.Fill with that integer. The fill itself doesn't call out to GDI+, and Span.Fill uses vectorised instructions to accelerate clearing the pixels.

Code example:

    /// <summary>
    ///  Fills the entire drawing surface with the specified color.
    /// </summary>
    public unsafe void Clear(Color color)
    {
        // Store the height so that we don't make two calls to Bitmap.Height, saves 1 P/Invoke.
        int height = bitmap.Height;

        // Note: the int-based fill below assumes a 32 bits-per-pixel format such as Format32bppArgb.
        var bmpData = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, height), ImageLockMode.WriteOnly, bitmap.PixelFormat);

        // This is the length of the underlying memory which backs this bitmap.
        int lengthInBytes = bmpData.Stride * height;

        // Get our color as an integer - There might be a quicker way to do this, although this is already fast using the stack and an 'unsafe' type conversion.
        byte* colorData = stackalloc byte[4] { color.B, color.G, color.R, color.A };
        int colorInt = Unsafe.As<byte, int>(ref *colorData);

        // Get the backing memory of the Bitmap as a Span<int>.
        Span<int> imgBytes = new Span<int>((void*)bmpData.Scan0, lengthInBytes / sizeof(int));

        // Use the accelerated Span.Fill function to clear the memory with the requested color.
        imgBytes.Fill(colorInt);

        // Release the memory.
        bitmap.UnlockBits(bmpData);
    }

This is just one example, the basic Clear functionality. I have also experimented with drawing another bitmap directly into the memory of the target bitmap using AVX/SSE. This drastically improves draw performance (the gain grows as the number of draw calls increases). You can also choose whether to draw with transparency on each draw call, rather than dictating transparency support through the PixelFormat.

If this kind of proposal has support, I am more than willing to provide PRs to add this to Bitmap.cs.
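To make the idea concrete, here is a minimal sketch of what such a draw routine might look like. It is not the implementation benchmarked in this thread. `PixelOps`, `Draw` and `BlendOver` are hypothetical names, the buffers are assumed to be tightly packed 32-bit 0xAARRGGBB pixels (for example, obtained via LockBits), the destination is assumed to be opaque, and the blend is written as a scalar loop where a real version would use AVX/SSE.

```csharp
using System;

public static class PixelOps
{
    // Hypothetical helper: draws src (srcWidth x srcHeight) into dst (dstWidth x dstHeight)
    // with its top-left corner at (x, y). Pixels are 32-bit 0xAARRGGBB values with
    // straight alpha; both buffers are assumed to be tightly packed (no row padding).
    public static void Draw(Span<int> dst, int dstWidth, int dstHeight,
                            ReadOnlySpan<int> src, int srcWidth, int srcHeight,
                            int x, int y, bool blendPixels)
    {
        // Clip the source rectangle against the destination bounds.
        int startX = Math.Max(0, -x), startY = Math.Max(0, -y);
        int endX = Math.Min(srcWidth, dstWidth - x);
        int endY = Math.Min(srcHeight, dstHeight - y);
        if (startX >= endX || startY >= endY) return;

        int count = endX - startX;
        for (int row = startY; row < endY; row++)
        {
            ReadOnlySpan<int> srcRow = src.Slice(row * srcWidth + startX, count);
            Span<int> dstRow = dst.Slice((y + row) * dstWidth + x + startX, count);

            if (!blendPixels)
            {
                // Opaque draw: a plain block copy of the row.
                srcRow.CopyTo(dstRow);
                continue;
            }

            // Blended draw: scalar here; a SIMD version would process several pixels at once.
            for (int i = 0; i < count; i++)
                dstRow[i] = BlendOver(srcRow[i], dstRow[i]);
        }
    }

    // "Source over" blend of one pixel, assuming the destination pixel is opaque.
    private static int BlendOver(int src, int dst)
    {
        uint s = (uint)src, d = (uint)dst;
        uint a = s >> 24;
        if (a == 255) return src;
        if (a == 0) return dst;

        uint inv = 255 - a;
        uint r = (((s >> 16) & 0xFF) * a + ((d >> 16) & 0xFF) * inv) / 255;
        uint g = (((s >> 8) & 0xFF) * a + ((d >> 8) & 0xFF) * inv) / 255;
        uint b = ((s & 0xFF) * a + (d & 0xFF) * inv) / 255;

        // Keep the destination's alpha channel (opaque under the assumption above).
        return (int)((d & 0xFF000000u) | (r << 16) | (g << 8) | b);
    }
}
```

Passing `blendPixels` per call is what lets the caller decide about transparency on each draw, independently of the bitmap's PixelFormat.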

API Proposal

public sealed unsafe class Bitmap : Image, IPointer<GpBitmap>
{
    public unsafe void Clear(Color color);
    public unsafe void Draw(Bitmap otherBitmap, Rectangle bounds, bool blendPixels);
}

API Usage

Bitmap bitmap = new Bitmap(1000, 1000);
bitmap.Clear(Color.Red);

Alternative Designs

No response

Risks

No response

Will this feature affect UI controls?

No

Zintom commented 6 months ago

The API surface is up for debate; I have just used that as an example.

Zintom commented 6 months ago

Below is a benchmark which draws an image to a Bitmap, comparing my C# implementation with the GDI+ implementation. As you can see, the speed-up holds across image sizes and iteration counts, with the C# implementation being around 16 times faster. The Graphics object is initialized in the class constructor so that its overhead is excluded; this is purely draw call vs draw call.

paul1956 commented 6 months ago

Doesn't this say more about the implementation of GDI+ needing improvement? Or are you recommending that WinForms replace its drawing functions, or something else?

elachlan commented 6 months ago

@JeremyKuhne has been making a lot of changes to System.Drawing and should be able to let you know if this is likely to get accepted.

Zintom commented 6 months ago

Doesn't this say more about the implementation of GDI+ needing improvement? Or are you recommending that WinForms replace its drawing functions, or something else?

Since GDI+ ships with Windows and is a "legacy" product, I imagine it does indicate a poor implementation, but it also means we wouldn't get a fix from the Windows team.

It would be nice, though, for the Windows team to refactor and optimize the GDI+ library, as there are still lots of parts of Windows that use it to display menus.

Zintom commented 6 months ago

Doesn't this say more about the implementation of GDI+ needing improvement? Or are you recommending that WinForms replace its drawing functions, or something else?

Also, I think this demonstrates that .NET Core is capable of holding its own when it comes to CPU-bound drawing tasks, so the need for GDI+ diminishes. I imagine a time when all controls use runtime-level rendering and are drawn to a single Direct2D "canvas".

KlausLoeffelmann commented 6 months ago

Certainly something to take a look at, although it is much more likely that we'll end up using something we have started to look into for A11Y reasons: Direct2D/DirectWrite. @JeremyKuhne needs to chime in here, but expect a couple of weeks' delay.

Zintom commented 6 months ago

It'll be nice to see WinForms move to Direct2D in the future. I built my own lib a while ago which mimicked WinForms and was based on OpenGL, and performance was very good.

In terms of my proposal, I'm still going to tinker with my approach for fun; I might even put it into a public lib.

elachlan commented 6 months ago

Related: #10740

kirsan31 commented 6 months ago

I am a bit worried about Direct2D/DirectWrite and other similar GDI optimizations in terms of using WinForms apps over RDP 🤔 RDP has its own optimizations for GDI, and this is the main reason that we still stick to WinForms. Performance over RDP is much more critical/sensitive than local performance.

Zintom commented 6 months ago

I am a bit worried about Direct2D/DirectWrite and other similar GDI optimizations in terms of using WinForms apps over RDP 🤔 RDP has its own optimizations for GDI, and this is the main reason that we still stick to WinForms. Performance over RDP is much more critical/sensitive than local performance.

My proposal here does not deviate from GDI+; it just works directly on the Bitmap instead of calling out to GDI+ to do the work.

JeremyKuhne commented 6 months ago

I am a bit worried about Direct2D/DirectWrite and other similar GDI optimizations in terms of using WinForms apps over RDP 🤔 RDP has its own optimizations for GDI, and this is the main reason that we still stick to WinForms. Performance over RDP is much more critical/sensitive than local performance.

@kirsan31 interesting feedback. I'm not sure how we can track performance here. We'll definitely keep it in mind and keep a sharp eye out for feedback about performance regressing in this scenario.

kirsan31 commented 6 months ago

@JeremyKuhne

I'm not sure how we can track performance here.

Measure the amount of data transmitted over the network during an RDP session with a test application before and after optimization... 👀 I think I can participate in testing if needed...

JeremyKuhne commented 6 months ago

@Zintom I'm happy to take vectorization changes for performance here. As called out already, there is almost no chance of GDI+ changes happening so we can entertain and take these sorts of functionality improvements on our end.

Note that locking the bitmap appears to always make a full copy of the data (I'm reasonably confident, but I could be misreading the code). Keep that in mind when measuring perf.

Things that make it easier to take changes:

Code comparing the way to accomplish the same thing before and after.
Comparable APIs in other MS graphics APIs (such as WPF, for naming if we're not modifying an existing API).
Any backing docs showing the need for the API (such as StackOverflow data).
Perf numbers.
Comprehensive unit tests ensuring we get identical results to using existing code and validating "negative" tests (like trying to set unavailable colors in an indexed bitmap).

When changing existing APIs, if we can't be confident of getting identical results, we need to address this, and we can do so in a number of ways. Possibilities are overloads with option flags, AppContext switches, docs, etc.

As for Direct2D-related functionality, I'm currently inclined to make a new set of APIs in System.Drawing for that, with conversion methods. The functionality is just too different. For example, something like System.Drawing.WindowsImaging for WIC.

Please tag me directly with anything in this space to draw my attention. :)

kirsan31 commented 6 months ago

@Zintom

it just works directly on the Bitmap instead of calling out to GDI+ to do the work.

Yes, and I am afraid that this will decrease performance over RDP. RDP can transfer some GDI instructions/primitives instead of pictures, and in your case we will definitely transfer the picture itself.

JeremyKuhne commented 6 months ago

@kirsan31 are you aware of any GDI+ calls that are remoted via RDP? I'll try to keep an eye out for that sort of functionality when I'm looking at the code.

elachlan commented 6 months ago

Could we just gate the functionality around SystemInformation.TerminalServerSession if it does cause a perf regression for RDP?

JeremyKuhne commented 6 months ago

Could we just gate the functionality around SystemInformation.TerminalServerSession if it does cause a perf regression for RDP?

Yep. Another option to consider.
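As a rough illustration of that gating idea, combined with the AppContext switch option mentioned earlier, a sketch could look like the following. `RenderPathSelector`, `UseVectorizedPath` and the switch name are all hypothetical and exist only for this example; the session flag is passed in as a parameter so the logic stays independent of WinForms.

```csharp
using System;

public static class RenderPathSelector
{
    // Hypothetical opt-out switch name, used only for illustration.
    public const string DisableSwitchName = "System.Drawing.DisableVectorizedBitmapOps";

    // Returns true when the direct (vectorised) bitmap path should be used,
    // and false when the existing GDI+ call should be kept.
    public static bool UseVectorizedPath(bool isTerminalServerSession)
    {
        // An explicit opt-out always wins.
        if (AppContext.TryGetSwitch(DisableSwitchName, out bool disabled) && disabled)
            return false;

        // Keep GDI+ when running in a remote (RDP) session.
        return !isTerminalServerSession;
    }
}
```

In a WinForms app the caller would pass `SystemInformation.TerminalServerSession` as the argument.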

dotnet / winforms