Closed be5invis closed 8 years ago
compreffor is a mix of Python and C++, written by one of Behdad's interns at Google. https://groups.google.com/forum/#!searchin/fonttools/compreffor/fonttools/UhoRPlfkOPk/jr_esUoEr2UJ
It uses fonttools to extract the relevant parts of the CFF table for subroutinization, and then for storing the output back into the sfnt font, but the core algorithm is written in C++.
@anthrotype If there is a pure C/C++ library I will integrate it into otfcc. But currently I will focus on simple peephole optimizations.
There's a cffCompressor.h that exports a compreff function, and a Makefile to compile a shared library.
The input data must be a string containing the CharStrings and FDSelect from the CFF's TopDict (cf. cxxCompressor.py#L87-L98).
The function returns an array containing the compressed subroutines and glyph encodings (cf. cxxCompressor.py#L150-L191)
I understand this is not in a state that you could readily use, but it would be nice if you could help improve it. ;)
@anthrotype You know, maintaining multiple local subroutine tables is painful (for CID), so does this library support exporting all subroutines into gsubrs only?
Currently otfcc computes the charstrings and FDSelect into two cff_blobs, each of which is a simple buffer data structure with a byte array and a length.
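For readers unfamiliar with this kind of buffer, here is a minimal sketch of a byte array paired with a length. The names blob_sketch and blob_append are hypothetical and are not otfcc's actual cff_blob definition or API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cff_blob-style buffer: a byte array plus its length. */
typedef struct {
    uint8_t *data;
    uint32_t size;
} blob_sketch;

/* Append n bytes to the blob, growing the array as needed.
   Returns 0 on success and -1 if the allocation fails (the blob is left unchanged). */
static int blob_append(blob_sketch *b, const uint8_t *bytes, uint32_t n) {
    assert(b != NULL);
    if (n == 0) return 0;
    uint8_t *grown = realloc(b->data, (size_t)b->size + n);
    if (grown == NULL) return -1;
    memcpy(grown + b->size, bytes, n);
    b->data = grown;
    b->size += n;
    return 0;
}
```

A blob starts out as {NULL, 0}, and the caller releases it with free(b.data).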
I'm afraid I don't know enough about CFF internals to understand what you mean. Sorry 😔
@anthrotype Subroutines can be stored either locally or globally. For CID fonts, each subfont (in FDArray) contains its own local subroutine table. Exporting all subroutines to the single global subroutine table is much simpler to maintain.
@anthrotype What’s “Glyph Encoding”... Do you mean the encoding vector? It is currently unused in otfcc.
I actually have no idea 😁 BTW, there seems to be a lot of processing going on the Python end, even after the C++ library has returned its data. Integrating compreffor in otfcc might not be as simple as I initially thought.
Maybe it would be useful if there would be an option to store the original subroutines?
@schriftgestalt Well otfcc’s dumps store only outlines or glyph references. My main idea is that designers should not care about the storage details: otfcc will automatically create an optimized result for them.
But it should round-trip as well as possible. Otherwise, someone who just wants to change some vertical metrics ends up with a much bigger file because of the missing subroutines.
@schriftgestalt That’s the purpose of the optimizer. The proposed -O3 optimization level will turn on subroutinization and compress the CFF as much as possible.
If you manage to do that in a manner similar to (or better than) makeOTF, I’m fine with it.
@schriftgestalt I am not sure how compreffor performs, but it may be better than makeOTF.
@be5invis Let me summarize your requirement on optimization.
Given a list of charstring_il, find the longest common substring of some of the charstring_ils, such that s (the length of the common substring) times c (its number of occurrences) is maximum.
How about I implement a function with the following signature: charstring_il* lcs(uint32_t *out_length, charstring_il *in, uint32_t *in_length)?
I think there are a lot of details I am missing...
@huntzhan
Objective: Extract one subroutine from the existing charstrings or subroutine definitions, defined as the most "valuable" common substring among them. The value is calculated as (length - 2) × ((non-overlapping occurrences) - 1).
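As a rough illustration of the value formula, the sketch below counts non-overlapping occurrences of a candidate with a greedy left-to-right scan and then applies (length - 2) × (occurrences - 1). It assumes programs are flattened to arrays of integer tokens, which is a simplification; the function names are hypothetical and the real charstring_il layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count non-overlapping occurrences of pat in seq with a greedy left-to-right scan:
   after a match, the scan resumes right behind the matched range. */
static uint32_t count_nonoverlapping(const int32_t *seq, size_t seq_len,
                                     const int32_t *pat, size_t pat_len) {
    assert(pat_len > 0);
    uint32_t count = 0;
    size_t i = 0;
    while (i + pat_len <= seq_len) {
        size_t k = 0;
        while (k < pat_len && seq[i + k] == pat[k]) k++;
        if (k == pat_len) {
            count++;
            i += pat_len; /* skip the whole match so occurrences never overlap */
        } else {
            i++;
        }
    }
    return count;
}

/* value = (length - 2) * (occurrences - 1).
   Signed, so that candidates that are too short or occur only once score <= 0. */
static int64_t subroutine_value(size_t length, uint32_t occurrences) {
    assert(occurrences >= 1); /* a candidate must occur at least once */
    return ((int64_t)length - 2) * ((int64_t)occurrences - 1);
}
```

For example, a 3-token candidate that occurs three times scores (3 - 2) × (3 - 1) = 2, while any candidate that occurs only once scores 0 and is not worth extracting.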
A program may be either a charstring or a subroutine. A charstring follows:
any* (rmoveto_operator | hmoveto_operator | vmoveto_operator) ((operand * | progid) operator special*)+ endchar_operator
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
while a subroutine follows
((operand * | progid) operator special*)+ return_operator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When analyzing a program, you should only analyze the range with wavy underline above, using this pattern:
(operand * | progid) operator special*
as its basic unit.
Your extracted subroutine should follow this pattern as well.
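One way to read the basic-unit pattern is as a small scanner over token kinds. The sketch below is only an illustration: tok_kind and both functions are hypothetical names, and the caller is assumed to pass only the analyzable range (without the leading moveto or the trailing endchar/return).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the real IL types in otfcc may be named and shaped differently. */
typedef enum { TOK_OPERAND, TOK_PROGID, TOK_OPERATOR, TOK_SPECIAL } tok_kind;

/* Match one unit "(operand* | progid) operator special*" beginning at `start`.
   Returns the index one past the unit, or `start` if no unit begins there. */
static size_t unit_end(const tok_kind *k, size_t len, size_t start) {
    assert(start <= len);
    size_t i = start;
    if (i < len && k[i] == TOK_PROGID) {
        i++; /* a single progid argument */
    } else {
        while (i < len && k[i] == TOK_OPERAND) i++; /* zero or more operands */
    }
    if (i >= len || k[i] != TOK_OPERATOR) return start; /* a unit needs its operator */
    i++;
    while (i < len && k[i] == TOK_SPECIAL) i++; /* trailing specials belong to the unit */
    return i;
}

/* Count how many consecutive units begin at `start`. */
static size_t count_units(const tok_kind *k, size_t len, size_t start) {
    size_t n = 0, i = start;
    for (;;) {
        size_t e = unit_end(k, len, i);
        if (e == i) break;
        n++;
        i = e;
    }
    return n;
}
```

Candidate subroutines would then begin and end only at the indices this scanner reports, so that an extracted subroutine never splits an operator from its arguments.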
Interface: charstring_il extract_subroutine(charstring_il *in, uint32_t cs_length, uint32_t sr_length, uint16_t progid);
in: The array of input programs. The programs will be modified.
cs_length: The quantity of charstrings in in.
sr_length: The quantity of existing subroutines in in. cs_length + sr_length == the length of in.
progid: The subroutine ID of the extracted subroutine.
If no common substring in in has a positive value, return NULL. Otherwise, return a new charstring_il containing its content and a return command (IL_TYPE_OPERATOR with i equal to op_return), and replace all its occurrences in in with a call stub (which is an IL_TYPE_PROGID argument and an IL_TYPE_OPERATOR with i = op_callgsubr).
Limitations:
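To illustrate the replacement step of the interface above, here is a sketch that rewrites one program in place, swapping every non-overlapping occurrence of a pattern for a two-item call stub. The il_item type, the IL_*_SKETCH constants, and replace_with_stub are hypothetical simplifications (operands are stored as integers here), not otfcc's real IL. The sketch also ignores unit boundaries, which a real implementation would have to respect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IL item: a kind tag plus an integer payload. */
typedef enum { IL_OPERAND_SKETCH, IL_PROGID_SKETCH, IL_OPERATOR_SKETCH } il_kind_sketch;
typedef struct {
    il_kind_sketch type;
    int32_t i;
} il_item;

/* callgsubr is operator 29 in Type 2 charstrings; the value of otfcc's op_callgsubr is assumed to match. */
enum { OP_CALLGSUBR_SKETCH = 29 };

static int same_item(const il_item *a, const il_item *b) {
    return a->type == b->type && a->i == b->i;
}

/* Replace each non-overlapping occurrence of pat in prog with the stub
   [progid argument, callgsubr operator], compacting prog in place.
   Returns the new program length; *replaced (if non-NULL) receives the occurrence count. */
static size_t replace_with_stub(il_item *prog, size_t len, const il_item *pat,
                                size_t pat_len, uint16_t progid, uint32_t *replaced) {
    assert(pat_len >= 2); /* the stub is 2 items long, so the program never grows */
    size_t r = 0, w = 0;  /* read and write cursors; w <= r always holds */
    uint32_t n = 0;
    while (r < len) {
        size_t k = 0;
        if (r + pat_len <= len) {
            while (k < pat_len && same_item(&prog[r + k], &pat[k])) k++;
        }
        if (k == pat_len) {
            prog[w++] = (il_item){IL_PROGID_SKETCH, progid};
            prog[w++] = (il_item){IL_OPERATOR_SKETCH, OP_CALLGSUBR_SKETCH};
            r += pat_len;
            n++;
        } else {
            prog[w++] = prog[r++];
        }
    }
    if (replaced != NULL) *replaced = n;
    return w;
}
```

Because the stub is never longer than the pattern it replaces, the write cursor cannot overtake the read cursor, so no second buffer is needed.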
Close as complete/postponed
In this phase:
cff_blob and caryll_buffer
rmoveto, rlineto and rrcurveto only, which makes @kenlunde’s SHS increased about 4MB after a dump-build cycle.)
--ignore-glyph-order will produce a 17285KB result.)
-O level to control optimization force.
VORG LTSH BASE.