Open corliss opened 7 years ago
Pop quiz:
Allocations:
Func
new SelectArrayIterator
- https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Select.cs#L39
new TResult[]
- https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Select.cs#L248
Anything I'm missing? I watched the relevant section of the video, but that LINQ call is a much larger beast than ours. There's no captured variable in the lambda, so no helper class must be created, System.Array is not a value type (so no allocation from boxing), and no Enumerator
instances are created because the underlying code is a for loop. As far as I know, the as
operator on reference types causes no heap allocations.
I will watch the rest of the video, as I love watching these kinds of videos, but I'm also aware of performance issues in general. The code I wrote for this project is largely ~5 years old. I would definitely consider myself a better programmer now than I was then. Of course there will be some bad code in there that needs fixing.
I am generally against LINQ because (in this case, AFAIK) the performance hit of using LINQ is really not that bad and it's not used anywhere in common "hot paths" for font rendering, just auxiliary things that are most likely going to be called once or twice during initalization.
The main reason that I'd prefer to keep LINQ in is that LINQ is available in 3.5, and 3.5 is the earliest version of .NET that isn't completely end-of-life (no security updates, even if a massive flaw is found). Supporting a 15 year old unsupported runtime with minimal usage nowadays seems to me like it should be an edge case rather than a promise.
Edit 1:
Yeah, that new[]
is bad, get rid of it. git blame shows this came from #80. This was a really large change that brought on a lot of good stuff and I didn't really review it that closely.
If we are going to drop LINQ support, I would prefer to not use tuples either and stick to only .NET 2.0 language features with a .NET 2.0 build and a PCL build. The actual logical return type for Header.Created
and Header.Modified
is a DateTime
, so I'd prefer to write up the logic to do that - https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6head.html
Edit 2:
All of the TrueType table structs are good candidates for conversion to structs similar to FTMatrix
(private fields for data storage, public properties for logical fields)
Number of seconds since 12:00 midnight that started January 1st 1904 in GMT/UTC time zone. 64-bit integer
Guid
since the data doesn't seem to match that well. https://ghostscript.com/doc/9.19/Fonts.htm#Unique_IDsThe XUID is a Level 2 PostScript feature that serves the same function as the UniqueID, but is not limited to a single 24-bit integer. The bdftops program creates XUIDs of the form "[-X- 0 -U-]" where "-X-" is the organization XUID and "-U-" is the UniqueID. (Aladdin Enterprises' organization XUID, which appears in a few places in various font-related files distributed with Ghostscript, is 107; do not use this for your own fonts that you distribute.)
I built the API largely looking at FreeType docs, which gets a bit sparse when you get to things like TrueType and PostScript headers, so I pretty much stuck with the native types.
That's a very different use-case than most people who use SharpFont currently (from what I've seen, games and game engines mostly), where fonts are typically cached ahead of time in texture atlases. Working towards higher performance demands is a goal. At the same time, will you be pulling the timestamps of millions of TTF fonts at the same time? Or will you be rendering millions of glyphs per second?
Yes, both of those goals make sense - right now we're clocking in at roughly double the memory usage of FreeType called from C/C++. See #47. To summarize, FreeType maintains an internal list of pointers and expects the exact memory address back. Duplicating a face and sending it back that pointer results in an error. The solution I came up with was to store both the pointer and a managed instance of the equivalent struct (that would only get updated when a function call mutates the native copy).
Yeah, things are a little weird for me right now - ReleaseNET20
configuration alongside PCL (mostly working), and I'm getting rid of the existing LINQ code at the moment. I moved the ToGdipBitmap function out into a new SharpFont.GDI
library as well.
Microsoft discontinued support for all frameworks besides .NET 3.5 and .NET 4.5.2.
Of the versions of .NET that are currently supported, LINQ is available in all of them. I guess it's less about LINQ itself and more that I'd prefer the minimum targeted version to be 3.5 instead of 2.0. LINQ just happened to be the only we're using that won't work on 2.0. Looking at it now, there's very little to gain from that bump up, and we're not gaining anything from LINQ, so I changed my mind.
I guess an analogy might help - if you were doing a major overhaul on a piece of software today that runs on Windows 98, would you keep supporting 98 or move the minimum requirements up to at least XP?
And yeah, no worries - I have generally been neglecting my open source work for the past year or so and I'm aware of that. I have a full time job lined up to start eventually (hoping within the next month? HR is painfully slow) and my freelance work will be scaled back a lot, the work I end up doing won't be super-top-priority like it is now
Cheers. Would you mind merging the repos at the same time as your current changes? That will create a permanent landing place for Harfbuzz pull requests going forward.
yeah, I'll commit what I have so far on the .NET 2.0 build and merge in HarfBuzz.
HarfBuzz merged in
Super, but think you missed the nuspec files. Also I think the current convention is to have the nuspec files live alongside the .csproj files rather than NuGet/Build - this allows properties from the .csproj to be referenced from the .nuspec.
@corliss do you know of any issues tracking the linq performance overhead in roslyn?
(Our https://github.com/neosmart/linqplus attempts to boost linq performance by specializing extension methods by data structure to provide a more efficient option than a naive IEnumerable<T>
would allow for, but it's like trying to drain an ocean with a pipet.)
great!
These are all the uses of Linq I could find in SharpFont:
TrueType\Header.cs:
return rec.Created.Select(x => (int)x).ToArray();
TrueType\Header.cs:return rec.Modified.Select(x => (int)x).ToArray();
FaceInfo.cs:return rec.xuid.Select(x => (uint)x).ToArray();
Library.cs:ParameterRec[] paramRecs = parameters.Select(x => x.Record).ToArray();
Fnt\Header.cs:return rec.reserved1.Select(x => (uint) x).ToArray();
These are trivial uses of Linq, that do the same thing: cast an array to a different type. It is easier, cleaner, and much more efficient, to do this without Linq.
Linq adds a lot of overhead. This is one of the main learnings of the C# community over the last many years, and even Microsoft ran into this when creating Roslyn - see https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/DEV-B333. The relevant portion of the talk is around 36:30, but I recommend watching the entire talk - it can change everything about programming in .net.
In addition linq is hard to debug. Try walking the code above for each element. Compare that to debugging a simple
for
orforeach
loop.Note: I'm providing this as data and information, but don't wish to be drawn into an argument, don't have time for it. If you really believe that these lines are important enough to warrant a separate project with LinqBridge, as mentioned in #104, that's your choice. But a decision like this is about the level of programming competence.
tl;dr
Replacing this Linq code with plain C# makes it simpler, faster, with less dependencies.
Pop quiz
return rec.Select(x => (int)x).ToArray();
The plain C# implementation causes just one allocation - the output array.
How many allocations occur with the Linq implementation, on an input of, e.g., 7 items? (Hint - the CLR and libraries are open source.) It is really important to know this, to appreciate why Linq is such an issue.
Edit
Turns out to be even worse than I thought. A call to
SharpFont.TrueType.Header.Created
results in:which calls
SharpFont.TrueType.Internal.HeaderRec.Created
:which creates a 2-element
FT_Long
array on each invocation, and then the calling property uses Linq to convert that into anint
array!!! This happens on every invocation.These arrays are not even needed. Just return a struct with the two values from the inner property and you have zero allocations and zero method calls. Here's the same thing rewritten using tuples (you can use any struct if you don't want to use tuples.)
Contrast this to the present situation where you have a bewildering number of allocations and method calls just to return
Created
.I don't mean to sound harsh - this code was probably put together in haste without a good review. But it's worth understanding just how easy it is to do the wrong thing in C# and to compound code smells with band-aids that only seem attractive, but just mask the real problem.
Edit 2
Looks like there isn't any need to wrap the struct
SharpFont.TrueType.Internal.HeaderRec
with a classSharpFont.TrueType.Header
. Just expose the struct, and save yet another allocation, and a whole load of code duplication by eliminating the class entirely. This also reduces the above to just