Closed JimBobSquarePants closed 2 years ago
Merging #281 (7438b74) into main (1b80769) will increase coverage by
1%
. The diff coverage is99%
.
@@ Coverage Diff @@
## main #281 +/- ##
======================================
+ Coverage 81% 83% +1%
======================================
Files 215 222 +7
Lines 11278 12200 +922
Branches 1757 1756 -1
======================================
+ Hits 9244 10167 +923
+ Misses 1607 1606 -1
Partials 427 427
Flag | Coverage Δ | |
---|---|---|
unittests | 83% <99%> (+1%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
...bors.Fonts/Unicode/Resources/BidiTrie.Generated.cs | 100% <ø> (ø) |
|
....Fonts/Unicode/Resources/GraphemeTrie.Generated.cs | 100% <ø> (ø) |
|
...Fonts/Unicode/Resources/LineBreakTrie.Generated.cs | 100% <ø> (ø) |
|
...rs.Fonts/Unicode/Resources/ScriptTrie.Generated.cs | 100% <ø> (ø) |
|
...Unicode/Resources/UnicodeCategoryTrie.Generated.cs | 100% <ø> (ø) |
|
src/SixLabors.Fonts/Unicode/UnicodeTrieBuilder.cs | 85% <50%> (ø) |
|
src/SixLabors.Fonts/Unicode/BidiClass.cs | 62% <100%> (ø) |
|
src/SixLabors.Fonts/Unicode/CodePoint.cs | 91% <100%> (+<1%) |
:arrow_up: |
src/SixLabors.Fonts/Unicode/JoiningClass.cs | 100% <100%> (ø) |
|
...s/Unicode/Resources/ArabicShapingTrie.Generated.cs | 100% <100%> (ø) |
|
... and 4 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 1b80769...7438b74. Read the comment docs.
Nice work @JimBobSquarePants
I'm not sure how do you get trie files, but I think it's better to keep source trie files in source control and document generation process. I understand that nobody want run file generation process on the build, but that process easy to forget, and maybe somebody would like to improve it, so better to have source files around. Or maybe I miss something in the changes ?
Thanks @Gillibald your idea to create classes was a great one!
@kant2002 The trie files are created via the Generator project and simply contained a header and the parsed bytes. That project reads a bunch of Unicode spec data files and creates the output.
Here's the project in the main
branch.
https://github.com/SixLabors/Fonts/tree/main/src/UnicodeTrieGenerator
If you look at Generator.cs
in this PR we're simply switching out the file generation, instead opting to create classes containing the trie data. All the source files required to create those classes are in the solution.
We definitely do not need to build them every time, though the process is pretty quick since the spec files are stored locally. We actually won't need to build these again until Unicode 15 is released and that will require a dedicated PR to do so there's no opportunity to forget.
Very nice! I assume this will also be easier to maintain than a locally referenced source generator, so we can shelf that idea for now. I think this approach makes sense given you're now leveraging this to also do some more custom preprocessing when generating the blobs, and not just serializing an already existing file. Very nice work James! 😄
@JimBobSquarePants thanks for explanation. Yeah, now it make sense to me. Thanks you for caring about such niche workloads :smile:
I will adapt your changes for Avalonia. Really appreciate your work.
Imma gonna merge this.
I just thought it'd be worth pointing out that the UnicodeTrie
-related code won't work on big endian systems (and will potentially fail or throw DataMisalignedException
on systems where aligned access is required). Something more along the lines of this would solve both of those issues:
struct UnicodeTrieHeader
{
public int HighStart;
public uint ErrorValue;
public int DataLength;
}
// ...
readonly uint[] data;
readonly int highStart;
readonly uint errorValue;
public UnicodeTrie(ReadOnlySpan<byte> rawData)
{
// MemoryMarshal.Read instead of MemoryMarshal.Cast/Unsafe.As
// because it performs an unaligned read and a bounds check.
var header = MemoryMarshal.Read<UnicodeTrieHeader>(rawData);
if (!BitConverter.IsLittleEndian)
{
header.HighStart = BinaryPrimitives.ReverseEndianness(header.HighStart);
header.ErrorValue = BinaryPrimitives.ReverseEndianness(header.ErrorValue);
header.DataLength = BinaryPrimitives.ReverseEndianness(header.DataLength);
}
highStart = header.HighStart;
errorValue = header.ErrorValue;
data = new uint[header.DataLength / sizeof(uint)];
// Avoid MemoryMarshal.Cast because we don't know if the rawData buffer is aligned
// to uint, which would potentially throw DataMisalignedException if it weren't.
rawData[Unsafe.SizeOf<UnicodeTrieHeader>()..].CopyTo(MemoryMarshal.AsBytes(data.AsSpan()));
if (!BitConverter.IsLittleEndian)
{
for (int i = 0; i < data.Length; i++)
{
data[i] = BinaryPrimitives.ReverseEndianness(data[i]);
}
}
}
We know that the buffer is aligned and we also know the endianness though? We create the data.
I wasn't able to find any actual usages of the ReadOnlySpan<byte>
constructor, so that wasn't immediately clear to me and it seemed like a potential oversight. If that's the case then you can disregard my points above.
The data is coming from the ReadOnlySpan<byte>
constants isn't it? That means that casting the bytes (which are most likely little-endian) into integers will break on big-endian.
Ooh. You're right! The endianness definitely matters since it's a compiled constant. I'll update unless you want to create a PR @DaZombieKiller ?
@JimBobSquarePants Sure! Opened one here: https://github.com/SixLabors/Fonts/pull/284
Prerequisites
Description
This PR does a couple of things.
I chose to remove the compression from the trie packing for performance since the binaries do not grow out of hand and the benefit is great. @Gillibald maybe this is something to consider for Avalonia?
There's no before/after benchmarks unfortunately because I added them as an afterthought, however I took a sampling snapshot early in the process. As you can see, loading Unicode data is no longer a hotspot.
Before
After
I don't want to go crazy optimizing much further in this PR but will have a good scan through the sampling data to see if there's some low hanging fruit at another time.
CC @Sergio0694 @kant2002 @0xced