Eliminate the use of Linq

corliss commented 7 years ago

These are all the uses of Linq I could find in SharpFont:

TrueType\Header.cs: return rec.Created.Select(x => (int)x).ToArray(); TrueType\Header.cs: return rec.Modified.Select(x => (int)x).ToArray(); FaceInfo.cs: return rec.xuid.Select(x => (uint)x).ToArray(); Library.cs: ParameterRec[] paramRecs = parameters.Select(x => x.Record).ToArray(); Fnt\Header.cs: return rec.reserved1.Select(x => (uint) x).ToArray();

These are trivial uses of Linq, that do the same thing: cast an array to a different type. It is easier, cleaner, and much more efficient, to do this without Linq.

Linq adds a lot of overhead. This is one of the main learnings of the C# community over the last many years, and even Microsoft ran into this when creating Roslyn - see https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/DEV-B333. The relevant portion of the talk is around 36:30, but I recommend watching the entire talk - it can change everything about programming in .net.

In addition linq is hard to debug. Try walking the code above for each element. Compare that to debugging a simple for or foreach loop.

Note: I'm providing this as data and information, but don't wish to be drawn into an argument, don't have time for it. If you really believe that these lines are important enough to warrant a separate project with LinqBridge, as mentioned in #104, that's your choice. But a decision like this is about the level of programming competence.

tl;dr

Replacing this Linq code with plain C# makes it simpler, faster, with less dependencies.

Pop quiz

return rec.Select(x => (int)x).ToArray();

The plain C# implementation causes just one allocation - the output array.

var result = new int[rec.Length];
for (var i = 0; i < rec.Length; i++)
    result.Add((int)rec[i]);

How many allocations occur with the Linq implementation, on an input of, e.g., 7 items? (Hint - the CLR and libraries are open source.) It is really important to know this, to appreciate why Linq is such an issue.

Edit

Turns out to be even worse than I thought. A call to SharpFont.TrueType.Header.Created results in:

public int[] Created { get {return rec.Created.Select(x => (int)x).ToArray(); } }

which calls SharpFont.TrueType.Internal.HeaderRec.Created:

internal FT_Long[] Created { get { return new[] {created1, created2}; } }

which creates a 2-element FT_Long array on each invocation, and then the calling property uses Linq to convert that into an int array!!! This happens on every invocation.

These arrays are not even needed. Just return a struct with the two values from the inner property and you have zero allocations and zero method calls. Here's the same thing rewritten using tuples (you can use any struct if you don't want to use tuples.)

public (int, int) Created { get { return rec.Created; } }
internal (int, int) Created { get { return ((int)created1, (int)created2); } }

Contrast this to the present situation where you have a bewildering number of allocations and method calls just to return Created.

I don't mean to sound harsh - this code was probably put together in haste without a good review. But it's worth understanding just how easy it is to do the wrong thing in C# and to compound code smells with band-aids that only seem attractive, but just mask the real problem.

Edit 2

Looks like there isn't any need to wrap the struct SharpFont.TrueType.Internal.HeaderRec with a class SharpFont.TrueType.Header. Just expose the struct, and save yet another allocation, and a whole load of code duplication by eliminating the class entirely. This also reduces the above to just

public (int, int) Created { get { return ((int)created1, (int)created2); } }

HinTak commented 7 years ago

Also see https://github.com/Robmaister/SharpFont/issues/95

Robmaister commented 7 years ago

Pop quiz:

Allocations:

Func allocation for the lambda function
new SelectArrayIterator - https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Select.cs#L39
new TResult[] - https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Select.cs#L248

Anything I'm missing? I watched the relevant section of the video, but that LINQ call is a much larger beast than ours. There's no captured variable in the lambda, so no helper class must be created, System.Array is not a value type (so no allocation from boxing), and no Enumerator instances are created because the underlying code is a for loop. As far as I know, the as operator on reference types causes no heap allocations.

I will watch the rest of the video, as I love watching these kinds of videos, but I'm also aware of performance issues in general. The code I wrote for this project is largely ~5 years old. I would definitely consider myself a better programmer now than I was then. Of course there will be some bad code in there that needs fixing.

I am generally against LINQ because (in this case, AFAIK) the performance hit of using LINQ is really not that bad and it's not used anywhere in common "hot paths" for font rendering, just auxiliary things that are most likely going to be called once or twice during initalization.

The main reason that I'd prefer to keep LINQ in is that LINQ is available in 3.5, and 3.5 is the earliest version of .NET that isn't completely end-of-life (no security updates, even if a massive flaw is found). Supporting a 15 year old unsupported runtime with minimal usage nowadays seems to me like it should be an edge case rather than a promise.

Edit 1:

Yeah, that new[] is bad, get rid of it. git blame shows this came from #80. This was a really large change that brought on a lot of good stuff and I didn't really review it that closely.

If we are going to drop LINQ support, I would prefer to not use tuples either and stick to only .NET 2.0 language features with a .NET 2.0 build and a PCL build. The actual logical return type for Header.Created and Header.Modified is a DateTime, so I'd prefer to write up the logic to do that - https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6head.html

Edit 2:

All of the TrueType table structs are good candidates for conversion to structs similar to FTMatrix (private fields for data storage, public properties for logical fields)

Robmaister commented 7 years ago

Ah that makes sense
I guess language features was the wrong term. I'd like to support .NET 2.0 without any external dependencies, as it was before #80.
Yes, these values are defined according to the spec https://www.microsoft.com/typography/otspec/head.htm

Number of seconds since 12:00 midnight that started January 1st 1904 in GMT/UTC time zone. 64-bit integer

This should be something different, perhaps not a Guid since the data doesn't seem to match that well. https://ghostscript.com/doc/9.19/Fonts.htm#Unique_IDs

The XUID is a Level 2 PostScript feature that serves the same function as the UniqueID, but is not limited to a single 24-bit integer. The bdftops program creates XUIDs of the form "[-X- 0 -U-]" where "-X-" is the organization XUID and "-U-" is the UniqueID. (Aladdin Enterprises' organization XUID, which appears in a few places in various font-related files distributed with Ghostscript, is 107; do not use this for your own fonts that you distribute.)

I built the API largely looking at FreeType docs, which gets a bit sparse when you get to things like TrueType and PostScript headers, so I pretty much stuck with the native types.

That's a very different use-case than most people who use SharpFont currently (from what I've seen, games and game engines mostly), where fonts are typically cached ahead of time in texture atlases. Working towards higher performance demands is a goal. At the same time, will you be pulling the timestamps of millions of TTF fonts at the same time? Or will you be rendering millions of glyphs per second?
Yes, both of those goals make sense - right now we're clocking in at roughly double the memory usage of FreeType called from C/C++. See #47. To summarize, FreeType maintains an internal list of pointers and expects the exact memory address back. Duplicating a face and sending it back that pointer results in an error. The solution I came up with was to store both the pointer and a managed instance of the equivalent struct (that would only get updated when a function call mutates the native copy).
Yeah, things are a little weird for me right now - , hence the flakiness. I've merged in your PR, done some manual MSBuild work to allow for a new ReleaseNET20 configuration alongside PCL (mostly working), and I'm getting rid of the existing LINQ code at the moment. I moved the ToGdipBitmap function out into a new SharpFont.GDI library as well.

Robmaister commented 7 years ago

Microsoft discontinued support for all frameworks besides .NET 3.5 and .NET 4.5.2.

Robmaister commented 7 years ago

Of the versions of .NET that are currently supported, LINQ is available in all of them. I guess it's less about LINQ itself and more that I'd prefer the minimum targeted version to be 3.5 instead of 2.0. LINQ just happened to be the only we're using that won't work on 2.0. Looking at it now, there's very little to gain from that bump up, and we're not gaining anything from LINQ, so I changed my mind.

I guess an analogy might help - if you were doing a major overhaul on a piece of software today that runs on Windows 98, would you keep supporting 98 or move the minimum requirements up to at least XP?

And yeah, no worries - I have generally been neglecting my open source work for the past year or so and I'm aware of that. I have a full time job lined up to start eventually (hoping within the next month? HR is painfully slow) and my freelance work will be scaled back a lot, the work I end up doing won't be super-top-priority like it is now

corliss commented 7 years ago

Cheers. Would you mind merging the repos at the same time as your current changes? That will create a permanent landing place for Harfbuzz pull requests going forward.

Robmaister commented 7 years ago

yeah, I'll commit what I have so far on the .NET 2.0 build and merge in HarfBuzz.

Robmaister commented 7 years ago

HarfBuzz merged in

corliss commented 7 years ago

Super, but think you missed the nuspec files. Also I think the current convention is to have the nuspec files live alongside the .csproj files rather than NuGet/Build - this allows properties from the .csproj to be referenced from the .nuspec.

mqudsi commented 6 years ago

@corliss do you know of any issues tracking the linq performance overhead in roslyn?

(Our https://github.com/neosmart/linqplus attempts to boost linq performance by specializing extension methods by data structure to provide a more efficient option than a naive IEnumerable<T> would allow for, but it's like trying to drain an ocean with a pipet.)

wstaelens commented 2 years ago

great!

Robmaister / SharpFont