Adding character maps - Getting the available characters.

phoenixenero commented 8 years ago

One of the difficulties of displaying character maps is that you have to read the font files themselves to find out which (accessible) glyphs they contain. Of course you can do the "dumb" solution and loop across every Unicode value, but that would take a really long time and would leave a lot of gaps.

I finished writing two Ruby scripts that will automate that function. They depend on the woff2sfnt command line tool, and the ttfunk gem.

get-available-chars.rb outputs a file containing all the available Unicode values with glyphs encoded onto them. It works by following these steps:

First, we create a temporary copy of the .WOFF file and convert it to OpenType/TrueType using woff2sfnt. This is necessary for ttfunk to access the OpenType/TrueType tables.
Using ttfunk, we access the cmap (character map) table and get the Unicode keys.
We then write them to a file as a comma-delimited list. Once the Unicode values are written, we delete the temporary font copy.
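
The three steps above could be sketched roughly as follows. This is not the original script, only an illustration under a few assumptions: that woff2sfnt writes the converted font to stdout, that the first Unicode subtable ttfunk reports is the one of interest, and that code points are written as decimal numbers. The method names are made up for this example.

```ruby
require 'tempfile'

# Return the code points in a ttfunk-style code map that point at a real glyph.
# Accepts a Hash (code point => glyph id) or an Array indexed by code point.
def mapped_codepoints(code_map)
  pairs = if code_map.is_a?(Hash)
            code_map.to_a
          else
            code_map.each_with_index.map { |gid, code| [code, gid] }
          end
  # Glyph id 0 is the "missing glyph", so those entries are skipped.
  pairs.select { |_code, gid| gid && gid != 0 }.map(&:first).sort
end

# Format code points as a comma-delimited list (decimal is an assumption).
def to_comma_list(codepoints)
  codepoints.join(',')
end

# Steps 1-3 end to end. Assumes woff2sfnt prints the converted font to stdout
# and that the ttfunk gem is installed.
def write_available_chars(woff_path, ext = '.unc')
  require 'ttfunk' # third-party gem, loaded only when this method runs
  Tempfile.create(['font', '.sfnt']) do |tmp|
    tmp.binmode
    tmp.write(IO.popen(['woff2sfnt', woff_path], 'rb', &:read))
    tmp.flush
    subtable = TTFunk::File.open(tmp.path).cmap.unicode.first
    out_path = woff_path.delete_suffix(File.extname(woff_path)) + ext
    File.write(out_path, to_comma_list(mapped_codepoints(subtable.code_map)))
  end # Tempfile.create deletes the temporary copy when the block exits
end
```

Keeping the filtering and formatting in separate small methods means they can be checked without a font file or the external tools on hand.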

get-available-chars.rb takes two options:

The file name of the font
The extension of the output files. By default, it's set to ".unc"

glob-get-available-chars.rb is a wrapper for get-available-chars.rb. This script takes two options:

The glob pattern (ex: "*.woff")
The extension of the output files.

It then loops through all the files that match the pattern and runs get-available-chars.rb on each one with the options above.
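
A wrapper like this can be very small. The sketch below only shows the globbing and output naming; `output_path_for` and `each_font` are hypothetical names, and the exact way get-available-chars.rb is invoked is an assumption (shown in the trailing comment).

```ruby
# Build the output file name by swapping the font's extension for `ext`.
def output_path_for(font_path, ext = '.unc')
  font_path.delete_suffix(File.extname(font_path)) + ext
end

# Walk every file matching `pattern`, yielding the font path and its output
# path, and return the list of [font_path, output_path] pairs.
def each_font(pattern, ext = '.unc')
  Dir.glob(pattern).sort.map do |font_path|
    out = output_path_for(font_path, ext)
    yield font_path, out if block_given?
    [font_path, out]
  end
end

# Possible usage (the argument order of the inner script is assumed):
#   each_font('*.woff') do |font, _out|
#     system('ruby', 'get-available-chars.rb', font, '.unc')
#   end
```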

Here's the sample output (I dropped the files from my VM into Windows):

[Screenshot, 2015-12-06: sample output files]

From left-to-right: Heuristica Regular, Montserrat Regular, Charter Regular

We can now use these values to generate a table containing each font's available characters.
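
As a rough idea of what that could look like, here is a sketch that turns a comma-delimited list into table markup. It assumes the list holds decimal code points and that 16 cells per row is acceptable; both are choices made for this example, as are the method names.

```ruby
# Parse a comma-delimited list of decimal code points (decimal is assumed).
def parse_codepoints(text)
  text.split(',').map(&:strip).reject(&:empty?).map { |s| Integer(s, 10) }
end

# Render an HTML table with `columns` characters per row.
# Control characters below U+0020 are skipped because they have no visible
# glyph and are not valid as numeric character references in HTML.
def charmap_table(codepoints, columns = 16)
  printable = codepoints.reject { |cp| cp < 0x20 }.sort
  rows = printable.each_slice(columns).map do |slice|
    cells = slice.map { |cp| format('<td title="U+%04X">&#x%X;</td>', cp, cp) }
    "  <tr>#{cells.join}</tr>"
  end
  (['<table>'] + rows + ['</table>']).join("\n")
end

# Possible usage:
#   puts charmap_table(parse_codepoints(File.read('Montserrat-Regular.unc')))
```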

The script can be run after each update through GitHub Pages. To keep the repository from bloating, we can add the Unicode character map file's extension (say, .unc) to .gitignore.

I had some difficulties writing this due to my inexperience with Ruby, and ttfunk's glaring lack of documentation, but eventually it all worked out. I plan on making a PR which outputs the character map (if it exists) on the font catalog page.

Anyway, that's all for today!

Note: if you're hosting on Ubuntu, you can get the woff2sfnt tool through launchpad.ubuntu.org

Edit: Subsetting fonts might actually be possible thanks to ttfunk, though I imagine that'll be a bit difficult to implement :)

alfredxing commented 8 years ago

This has definitely been on my to-do list for quite a while. I've been doing a bit of searching for tools to programmatically grab info from fonts, and came across https://github.com/behdad/fonttools/, which seems pretty robust and feature-filled. I think it's a part of the Google Fonts toolchain.

phoenixenero commented 8 years ago

Now that I think about it, that is better. That is what Google uses for subsets.

phoenixenero commented 8 years ago

So I've been trying out the fonttools package. This command dumps the cmap tables of a font into a .ttx (XML) file.

ttx -t cmap Aileron-Black.woff

This seems to be a lot simpler than my previous workings, lol...

Though I haven't tested it yet, we can read the table data with the Nokogiri Ruby gem. Of course we could just use a regex, but that wouldn't be future-proof.
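
For illustration, here is a minimal sketch of reading code points out of a ttx dump. It uses REXML, which ships with Ruby, in place of Nokogiri so that it runs without extra gems; with Nokogiri the same XPath should work via `Nokogiri::XML(xml).xpath('//cmap//map')`. It relies on ttx writing each mapping as a `<map code="0x..." name="..."/>` element inside the `<cmap>` element. The method name is made up for this example.

```ruby
require 'rexml/document'

# Collect every code point listed under <cmap> in a ttx XML dump.
# A font can carry several cmap subtables, so duplicates are dropped.
def codepoints_from_ttx(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//cmap//map')
              .map { |el| el.attributes['code'].hex } # "0x41" -> 65
              .uniq
              .sort
end

# Possible usage, after running `ttx -t cmap Aileron-Black.woff`:
#   codepoints_from_ttx(File.read('Aileron-Black.ttx'))
```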

I will test this further. Will post results.

phoenixenero commented 8 years ago

Here's a new version of my script, now condensed into 1 .rb file: https://gist.github.com/phoenixenero/c8d40a390bb1acabcf9c

It supports a -f flag, which takes a file glob pattern (ex: *.otf, fonts/*.woff) as input and outputs the Unicode character map of every file matched.
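
A flag like that can be handled with Ruby's standard OptionParser. The sketch below is not taken from the gist; the long form `--files`, the script name in the banner, and the `parse_args` helper are assumptions for illustration.

```ruby
require 'optparse'

# Parse command-line arguments and return the glob pattern given with -f.
def parse_args(argv)
  options = { pattern: nil }
  OptionParser.new do |opts|
    opts.banner = 'Usage: charmap.rb -f PATTERN' # hypothetical script name
    opts.on('-f', '--files PATTERN', 'Glob pattern of fonts to process') do |pattern|
      options[:pattern] = pattern
    end
  end.parse(argv)
  options
end

# Possible usage:
#   pattern = parse_args(ARGV)[:pattern]
#   Dir.glob(pattern).each { |font| puts font } if pattern
```

Quote the pattern on the command line (for example `-f "fonts/*.woff"`) so the shell passes it through unexpanded and the script does the globbing itself.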

phoenixenero commented 8 years ago

[Screenshot, 2015-12-07: generated character table]

Alright, seems like my implementation is working! It outputs table markup with the codes! I might need to sort these though, since the .charmap generation seems to be unsorted.

Edit: Apparently it's not uncommon for fonts to have multiple cmap tables, will correct.

alfredxing / brick

Adding character maps - Getting the available characters. #132