dompdf / php-font-lib

A library to read, parse, export and make subsets of different types of font files.
GNU Lesser General Public License v2.1
1.73k stars 255 forks source link

Performance improvements suggestion #32

Closed indreka closed 8 years ago

indreka commented 8 years ago

Loading in bigger ttf files is noticeably slow. After using xdebug profiler it was evident that most common function being called is BinaryStream->readUInt16()

When viewing the places that call it, there seem to be many loops which just call the function multiple times in a row. The function itself uses unpack, which allocates new array for each invocation and only the single value in it is used.

Proposal is to add a few helper functions to BinaryReader like

public function readUInt16Many($count) { $str = $this->read(2 * $count); return arrayvalues(unpack("n", $str)); } public function readInt16Many($count) { $str = $this->read(2 * $count); $valsuint16 = unpack("n", $str); $vals = array(); foreach ($vals_uint16 as $v) $vals[] = $v >= 0x8000 ? $v - 0x10000 : $v; return $vals; }

array_values is needed because when using * for unpack, the array indexes start from 1, instead of 0.

Then the helpers could be used for example in cmap, hmtx, kern, post parsers: $endCode = $font->readUInt16Many($segCount); $idRangeOffset = $font->readUInt16Many($segCount);

In addition there is a generic r function in BinaryStream that accepts array with type + count, that one should also be optimized to call readUInt16Many. Plus, the ->r(array()) cases could be even more optimized, since many of the calling places know to ask for uint16 array, so calling ->readUInt16Many($count); should also give a little boost, because it doesn't have to pass through the long switch list anymore.

When testing FreeSerif font with original code, the readUInt16() is called 76580 times, consuming 33% of the total font load time. After optimizations, the UInt16Many was called 10 times consuming 0.64% of time and readUInt16 11025 times consuming 13.21% of total time. Also ->r() calls were reduced from ~2800 to ~500.

bsweeney commented 8 years ago

Just curious, any metrics on how this affects the memory profile?

indreka commented 8 years ago

The total memory usage should remain the same because the end result in memory will remain the same, but it reduces the memory churn a lot by avoiding approximately 65 000 single-element array create+destroy cycles (for the FreeSerif font).

bsweeney commented 8 years ago

:+1:

Of course! Where's my head.

indreka commented 8 years ago

But, to reduce memory usage, I do have an approach that I have already tested. To the font file definition (in my test TrueType/File.php) I added presetSubset() function and getPresetSubset. I then modified glyf and hmtx classes, to only read in the needed values from the raw file. This sped up the processing of those tables over 10x and it also reduced memory load.

Main use case was Dompdf speedup, where the final subset of needed characters is known so there is no need to load in all the glyf and hmtx values just to output a subset font with 50 characters.

But I also had to modify the glyf a bit more to load dependency glyfs, since OutlineComposite objects depend on OutlineSimple objects. Those have to be loaded into the data object also, otherwise getGlyphIDs will fail because OutlineComposite->components[]->glyphIdex will point to missing entry later on.

indreka commented 8 years ago

Also remembered that the partial loading of glyf/hmtx required resorting of the processing of the entries in TrueType/File->parse to make sure cmap table is loaded at the beginning to know correctly which glyf/hmtx element corresponds to which character (for TrueType/File->getUnicodeCharMap), since I made getPresetSubset behave similar to setSubset, so that it maps unicode characters to the indexes and then returns those indexes.

bsweeney commented 8 years ago

Are the performance improvements covered by PRs #34, #35, and #36?

indreka commented 8 years ago

Yes, all the main improvements are sent in now. The partial loading approach I described is still raw and hackish (although it does give decent addiional performance boost) but I have to test it out a bit more.

In addition, to even use that approach, Dompdf needs a fix for subset string registration (strings modified by text-transform on output are initially registered by their raw values, not transformed ones). Subsetting works now because during final text frame rendering, the final (transformed) string is again registered for subset processing.

bsweeney commented 8 years ago

There are still some general issues with the way font subsetting is handled in dompdf (see dompdf/dompdf#750). I think it's more appropriately handled by the PDF rendering backend, in which case the transformed text subset would be correct.

Also relevenat to dompdf/dompdf#177.