Performance improvements

gao-artur commented 5 months ago

Hi. We are using this library to create PDF reports with large tables. I ran some benchmarks, and the memory allocations didn't look good, so I did some profiling to find where they could be reduced. My benchmark creates a table with 15 columns and 2500 rows. Here are the initial results:

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4291/22H2/2022Update)
11th Gen Intel Core i7-11800H 2.30GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.204
  [Host]     : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2

Method	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
RenderPdfTable	1.129 s	0.0150 s	0.0141 s	44000.0000	16000.0000	1000.0000	589.81 MB

And here are the dotMemory allocation analysis results:

Then, I did a few simple optimizations and was able to reduce the allocations from 589.81 MB to 528.62 MB. Not huge, but it was just a POC to see how difficult it is to handle different cases.

Method	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
RenderPdfTable	1.111 s	0.0107 s	0.0100 s	43000.0000	14000.0000	1000.0000	528.62 MB

The changes I applied here:

Used a StringBuilder pool in a few places (PdfFlattenVisitor.VisitDocumentObjectCollection, XGraphicsPdfRenderer._content, PdfEncoders.FormatStringLiteral)

Used StringBuilder.GetChunks in a few places to avoid intermediate string allocation.

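public byte[] GetBytes(StringBuilder stringBuilder)
{
#if NETCOREAPP3_0_OR_GREATER
    // Walk the builder's chunks directly instead of materializing a string first.
    // Each char is narrowed to one byte, so chars above 0xFF are truncated.
    var bytes = new byte[stringBuilder.Length];
    var i = 0;
    foreach (var chunk in stringBuilder.GetChunks())
    {
        foreach (var ch in chunk.Span)
        {
            bytes[i++] = (byte)ch;
        }
    }

    return bytes;
#else
    // GetChunks is not available on this target, so fall back to the string overload.
    return GetBytes(stringBuilder.ToString());
#endif
}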
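
A StringBuilder pool of the kind mentioned in the first item could be sketched roughly as follows. This is only an illustration: the class name StringBuilderPool, its method names, and the two size limits are assumptions, not code from PDFsharp or from the POC.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;

// Hypothetical helper (not part of PDFsharp): a minimal thread-safe StringBuilder pool.
public static class StringBuilderPool
{
    // Assumed limits: builders that grew past this capacity are not pooled,
    // so a single huge document does not keep a large buffer alive.
    private const int MaxRetainedCapacity = 4 * 1024;
    private const int MaxPooledBuilders = 16;

    private static readonly ConcurrentBag<StringBuilder> Pool = new ConcurrentBag<StringBuilder>();

    // Take a builder from the pool, or allocate a new one if the pool is empty.
    public static StringBuilder Rent()
    {
        return Pool.TryTake(out var sb) ? sb : new StringBuilder();
    }

    // Copy the content out, then hand the builder back to the pool.
    public static string ToStringAndReturn(StringBuilder sb)
    {
        var result = sb.ToString();
        Return(sb);
        return result;
    }

    // Clear the builder and keep it for reuse, unless it is too large or the pool is full.
    public static void Return(StringBuilder sb)
    {
        if (sb.Capacity > MaxRetainedCapacity || Pool.Count >= MaxPooledBuilders)
        {
            return; // let the GC collect it
        }

        sb.Clear();
        Pool.Add(sb);
    }
}
```

A caller would replace new StringBuilder() with StringBuilderPool.Rent() and the final ToString() with StringBuilderPool.ToStringAndReturn(sb). The builder must not be touched after it has been returned.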

Many other places can be optimized with varying levels of effort. For example, you can save about 3 MB more by passing StringComparer.InvariantCultureIgnoreCase into the FontDescriptorCache constructor and removing name = name.ToLowerInvariant(); from the FontDescriptor.ComputeFdKey method. You can save even more by introducing a struct key that includes the isBold and isItalic booleans, which avoids creating a new string at all.
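
The struct key idea could look something like the sketch below. The type name FontDescriptorKey and its members are hypothetical, and the actual signature of FontDescriptor.ComputeFdKey is not shown here. The sketch also uses StringComparer.OrdinalIgnoreCase rather than InvariantCultureIgnoreCase, on the assumption that font names only need a culture-independent, case-insensitive match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type (not PDFsharp's actual API): replaces a concatenated,
// lower-cased string key with a value type that allocates nothing per lookup.
public readonly struct FontDescriptorKey : IEquatable<FontDescriptorKey>
{
    public FontDescriptorKey(string name, bool isBold, bool isItalic)
    {
        Name = name;
        IsBold = isBold;
        IsItalic = isItalic;
    }

    public string Name { get; }
    public bool IsBold { get; }
    public bool IsItalic { get; }

    // The case-insensitive comparison takes the place of name.ToLowerInvariant().
    public bool Equals(FontDescriptorKey other) =>
        IsBold == other.IsBold
        && IsItalic == other.IsItalic
        && StringComparer.OrdinalIgnoreCase.Equals(Name, other.Name);

    public override bool Equals(object obj) =>
        obj is FontDescriptorKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            // Guard against default(FontDescriptorKey), where Name is null.
            int hash = Name is null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(Name);
            hash = (hash * 397) ^ (IsBold ? 1 : 0);
            hash = (hash * 397) ^ (IsItalic ? 2 : 0);
            return hash;
        }
    }
}
```

Because the struct implements IEquatable<FontDescriptorKey>, a Dictionary<FontDescriptorKey, TValue> can use it as a key without boxing and without a custom comparer.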

Also, from what I have seen, most of the renderers are single-use, so making them reusable would eliminate a lot of short-lived object allocations.

Let me know if you are interested in accepting PRs with these and other changes around memory allocations.

gao-artur commented 5 months ago

Note that StringBuilder.GetChunks can be added to older frameworks with Polyfill, if you are open to adopting that library. It can bring the newest and fastest APIs to older TFMs and improve code maintainability (by removing all the #if NET6_0_OR_GREATER etc.). The only drawback is increased assembly size, because Polyfill adds all these APIs as source code to your code base.

gao-artur commented 5 months ago

@ThomasHoevel any feedback on this? I'm just waiting for a green light from you to create PRs.

ThomasHoevel commented 4 months ago

Since you asked me...

Where to start?

I am not the decision maker.
From Polyfill: "Some polyfills are implemented in a way that will not have the equivalent performance to the actual implementations." Not very promising. But we might check Polyfill.
One of our table tests took around 115 s or 123 s respectively with version 1.32. With version 6.0 we measured 1.1 s or 1.3 s respectively. Saving another 10% would be a significant improvement, but there are known issues with table handling, and IMHO we should not make optimizations that make code hard to read and hard to maintain before we resolve the fundamental issues. But I am not the decision maker.

GitHub is not our development repository and we never accept PRs here. But creating PRs here is one way to propose code changes.

gao-artur commented 4 months ago

I tagged you because I saw you are actively answering issues. Sorry if I tagged the wrong person. The proposed changes concern memory allocation, not execution time. But of course, reducing the amount of garbage produced will also improve time. I just don't want to invest time in changes that won't be accepted for any reason. Once I get a sign of interest in this optimization I'll be happy to create a PR.

empira / PDFsharp

Performance improvements #106