caryll / otfcc

Optimized OpenType builder and inspector.
Apache License 2.0

Phase 4: CFF Continued #16

Closed be5invis closed 8 years ago

be5invis commented 8 years ago

In this phase:

anthrotype commented 8 years ago

compreffor is a mix of Python and C++, written by one of Behdad's interns at Google. https://groups.google.com/forum/#!searchin/fonttools/compreffor/fonttools/UhoRPlfkOPk/jr_esUoEr2UJ

It uses fonttools to extract the relevant parts of the CFF table for subroutinization, and then for storing the output back into the sfnt font, but the core algorithm is written in C++.

be5invis commented 8 years ago

@anthrotype If there is a pure C/C++ library I will integrate it into otfcc. But currently I will focus on simple peephole optimizations.

anthrotype commented 8 years ago

There's a cffCompressor.h that exports a compreff function, and a Makefile to compile a shared library. The input data must be a string containing the CharStrings and FDSelect from the CFF's TopDict (cf. cxxCompressor.py#L87-L98). The function returns an array containing the compressed subroutines and glyph encodings (cf. cxxCompressor.py#L150-L191).

I understand this is not in a state that you could readily use, but it would be nice if you could help improve it. ;)

be5invis commented 8 years ago

@anthrotype You know, maintaining multiple local subroutine tables is painful (for CID), so does this library support exporting all subroutines into gsubrs only? Currently otfcc compiles the charstrings and FDSelect into two cff_blobs; a cff_blob is a simple buffer data structure with a byte array and a length.
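For readers unfamiliar with the "byte array and a length" buffer described above, here is a minimal sketch of such a structure in C. The names `blob_sketch`, `blob_append`, and `blob_free` are hypothetical and chosen for illustration only; otfcc's actual `cff_blob` definition may differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: an owned byte array plus its length.
   This is an assumption, not otfcc's real cff_blob. */
typedef struct {
    uint8_t *bytes;  /* heap-allocated data, or NULL when empty */
    size_t length;   /* number of valid bytes in `bytes` */
} blob_sketch;

/* Append n bytes to the blob, growing the array as needed.
   Returns 0 on success, -1 if the allocation fails. */
static int blob_append(blob_sketch *b, const uint8_t *src, size_t n) {
    if (n == 0) return 0;
    uint8_t *grown = realloc(b->bytes, b->length + n);
    if (!grown) return -1;
    memcpy(grown + b->length, src, n);
    b->bytes = grown;
    b->length += n;
    return 0;
}

/* Release the storage and reset the blob to the empty state. */
static void blob_free(blob_sketch *b) {
    free(b->bytes);
    b->bytes = NULL;
    b->length = 0;
}
```

A blob starts out as `{NULL, 0}`; each compiled charstring would then be appended in order, and the result handed to whatever consumes the raw bytes.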

anthrotype commented 8 years ago

I'm afraid I don't know enough about CFF internals to understand what you mean. Sorry 😔

be5invis commented 8 years ago

@anthrotype Subroutines can be stored either locally or globally. For CID fonts, each subfont (in FDArray) contains its own local subroutine table. Exporting all subroutines to the single global subroutine table is much simpler to maintain.

be5invis commented 8 years ago

@anthrotype What’s “Glyph Encoding”... Do you mean the encoding vector? It is currently unused in otfcc.

anthrotype commented 8 years ago

I actually have no idea 😁 BTW, there seems to be a lot of processing going on at the Python end, even after the C++ library has returned its data. Integrating compreffor in otfcc might not be as simple as I initially thought.

schriftgestalt commented 8 years ago

Maybe it would be useful if there were an option to store the original subroutines?

be5invis commented 8 years ago

@schriftgestalt Well, otfcc’s dumps store only outlines or glyph references. My main idea is that designers should not have to care about the storage details: otfcc will automatically create an optimized result for them.

schriftgestalt commented 8 years ago

But it should be possible to round-trip as well as possible. Otherwise, if someone just wants to change some vertical metrics, the file gets much bigger because of the missing subroutines.

be5invis commented 8 years ago

@schriftgestalt That’s the purpose of the optimizer. The proposed -O3 optimization level will turn on subroutinization and compress the CFF as much as possible.

schriftgestalt commented 8 years ago

If you manage to do that as well as (or better than) makeOTF, I’m fine with it.

be5invis commented 8 years ago

@schriftgestalt I am not sure how compreffor performs, but it may be better than makeOTF.

huntzhan commented 8 years ago

@be5invis Let me summarize your requirements for the optimization.

Given a list of charstring_il, find the longest common substring of some of the charstring_ils, such that s (length of the common substring) times c (number of occurrences) is maximal.

How about I implement a function with the following signature: charstring_il* lcs(uint32_t *out_length, charstring_il *in, uint32_t *in_length):

I think there are a lot of details I am missing...

be5invis commented 8 years ago

@huntzhan

Objective: Extract one subroutine from the existing charstrings or subroutine definitions, defined as the most "valuable" common substring among them. The value is calculated as (length - 2) × ((non-overlapping occurrences) - 1).

A program may be either a charstring or a subroutine. A charstring follows this pattern:
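The value formula above can be sketched in C as follows. This is only an illustration under stated assumptions: programs are modelled as arrays of `uint32_t` tokens, "length" is taken to be the number of tokens in the candidate (the thread does not say whether it is measured in tokens or bytes), and the helper names `count_nonoverlapping` and `candidate_value` are hypothetical. Occurrences are counted greedily from left to right within one sequence; for several programs the per-program counts would be summed.

```c
#include <stddef.h>
#include <stdint.h>

/* Count non-overlapping occurrences of pat (m tokens) in text (n tokens).
   Scans left to right and jumps past each match, so matches never overlap. */
static size_t count_nonoverlapping(const uint32_t *text, size_t n,
                                   const uint32_t *pat, size_t m) {
    size_t count = 0, i = 0;
    if (m == 0 || m > n) return 0;
    while (i + m <= n) {
        size_t k = 0;
        while (k < m && text[i + k] == pat[k]) k++;
        if (k == m) {
            count++;
            i += m;  /* skip the whole match */
        } else {
            i++;
        }
    }
    return count;
}

/* value = (length - 2) * (occurrences - 1).
   Clamped to 0 when the candidate cannot save anything
   (assumption: such candidates are simply worthless). */
static long candidate_value(size_t length, size_t occurrences) {
    if (length <= 2 || occurrences <= 1) return 0;
    return (long)(length - 2) * (long)(occurrences - 1);
}
```

Under these assumptions, a 5-token candidate that occurs 4 times has value (5 - 2) × (4 - 1) = 9, while any candidate of 2 tokens or fewer, or one that occurs only once, has value 0.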

any* (rmoveto_operator | hmoveto_operator | vmoveto_operator) ((operand * | progid) operator special*)+ endchar_operator
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

while a subroutine follows this pattern:

 ((operand * | progid) operator special*)+ return_operator
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When analyzing a program, you should only analyze the underlined range above, using this pattern:

(operand * | progid) operator special*

as its basic unit.

Your extracted subroutine should also follow this pattern.
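To make the basic unit concrete, here is a rough sketch in C that splits a token stream into units of the form `(operand* | progid) operator special*`. The `tok_kind` enum and the functions `unit_end` and `count_units` are hypothetical; the real charstring_il representation in otfcc is not shown in this thread and may look quite different.

```c
#include <stddef.h>

/* Hypothetical token kinds for the analyzable range of a program. */
typedef enum { TOK_OPERAND, TOK_PROGID, TOK_OPERATOR, TOK_SPECIAL } tok_kind;

/* Return the index one past the unit that starts at `start`,
   or 0 if no well-formed unit starts there. A unit always contains
   at least one token, so 0 is never a valid end index. */
static size_t unit_end(const tok_kind *t, size_t n, size_t start) {
    size_t i = start;
    if (i >= n) return 0;
    if (t[i] == TOK_PROGID) {
        i++;                                        /* a single progid */
    } else {
        while (i < n && t[i] == TOK_OPERAND) i++;   /* zero or more operands */
    }
    if (i >= n || t[i] != TOK_OPERATOR) return 0;   /* exactly one operator */
    i++;
    while (i < n && t[i] == TOK_SPECIAL) i++;       /* trailing specials */
    return i;
}

/* Count the units in t[0..n), or return (size_t)-1 if the range
   is not a clean sequence of units. */
static size_t count_units(const tok_kind *t, size_t n) {
    size_t i = 0, units = 0;
    while (i < n) {
        size_t e = unit_end(t, n, i);
        if (e == 0) return (size_t)-1;
        i = e;
        units++;
    }
    return units;
}
```

Candidate substrings would then start and end only on the boundaries that `unit_end` reports, so an extracted subroutine never cuts an operator off from its operands.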

Interface: charstring_il extract_subroutine(charstring_il *in, uint32_t cs_length, uint32_t sr_length, uint16_t progid);

Limitations:

be5invis commented 8 years ago

Closing as complete/postponed.