alfredxing / brick

Open-source webfont service
http://brick.im

Adding character maps - Getting the available characters. #132

phoenixenero opened this issue 8 years ago (status: Open)

phoenixenero commented 8 years ago

One of the difficulties of displaying character maps is that you have to access the font files themselves to find out how many (accessible) glyphs there are. Of course, you could take the "dumb" approach and loop over Unicode values, but that would take a really long time and leave a lot of gaps.
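For comparison, the font's cmap table already records which code points map to glyphs, so it can be read directly instead of probing every code point. The sketch below is a minimal pure-Ruby illustration (not what ttfunk or fonttools do internally) that walks the sfnt table directory and one format 4 cmap subtable. It assumes an uncompressed .ttf/.otf (a WOFF file would need converting back to sfnt first, which is what woff2sfnt is for) with a Unicode BMP subtable, platform 0 or (3, 1). Characters outside the BMP live in format 12 subtables, which it ignores. The method names are hypothetical.

```ruby
# Minimal sketch: list the code points that a format 4 cmap subtable maps to a
# real glyph, reading an uncompressed TrueType/OpenType (sfnt) file directly.

# Read a big-endian unsigned 16-bit integer at byte offset `off`.
def u16(data, off)
  data.byteslice(off, 2).unpack1("n")
end

# Read a big-endian unsigned 32-bit integer at byte offset `off`.
def u32(data, off)
  data.byteslice(off, 4).unpack1("N")
end

# Return the byte offset of the table with the given 4-byte tag, or nil.
def find_table(data, tag)
  u16(data, 4).times do |i|
    record = 12 + 16 * i # table records follow the 12-byte sfnt header
    return u32(data, record + 8) if data.byteslice(record, 4) == tag
  end
  nil
end

# Code points with a non-zero glyph ID in the first Unicode format 4 subtable.
def format4_code_points(data)
  cmap = find_table(data, "cmap") or raise ArgumentError, "no cmap table"
  sub = nil
  u16(data, cmap + 2).times do |i|
    record = cmap + 4 + 8 * i
    platform, encoding = u16(data, record), u16(data, record + 2)
    offset = cmap + u32(data, record + 4)
    unicode = platform.zero? || (platform == 3 && encoding == 1)
    if unicode && u16(data, offset) == 4
      sub = offset
      break
    end
  end
  raise ArgumentError, "no Unicode format 4 subtable" unless sub

  seg_x2 = u16(data, sub + 6) # segCountX2
  ends = sub + 14
  starts = ends + seg_x2 + 2 # skip the reservedPad word
  deltas = starts + seg_x2
  ranges = deltas + seg_x2
  codes = []
  (seg_x2 / 2).times do |i|
    first, last = u16(data, starts + 2 * i), u16(data, ends + 2 * i)
    delta = u16(data, deltas + 2 * i) # read unsigned; arithmetic is mod 65536
    range_offset = u16(data, ranges + 2 * i)
    (first..last).each do |code|
      next if code == 0xFFFF # end-of-table sentinel segment
      glyph =
        if range_offset.zero?
          (code + delta) & 0xFFFF
        else
          # idRangeOffset is relative to its own position in the table.
          g = u16(data, ranges + 2 * i + range_offset + 2 * (code - first))
          g.zero? ? 0 : (g + delta) & 0xFFFF
        end
      codes << code unless glyph.zero?
    end
  end
  codes
end
```

Usage would look something like `format4_code_points(File.binread("SomeFont.ttf"))`, where the filename is a placeholder.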

I finished writing two Ruby scripts that automate this. They depend on the woff2sfnt command-line tool and the ttfunk gem.

get-available-chars.rb outputs a file listing every Unicode value that has a glyph mapped to it. It works by following these steps:

get-available-chars.rb takes two options:

glob-get-available-chars.rb is a wrapper for get-available-chars.rb. This script takes two options:

It then loops through all the files that match the pattern and runs get-available-chars.rb on each one with the options above.
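The option lists for these scripts didn't survive here, so this is only a rough sketch of the wrapper idea: expand the pattern with Dir.glob and run the per-file script once per match. The argument layout (font path as the first positional argument, extra options appended after it) and the method names are assumptions.

```ruby
# Hypothetical glob wrapper: build one command per matching font file.
# The positional font path and extra arguments are assumed, not the real CLI.
def commands_for(pattern, script: "get-available-chars.rb", extra_args: [])
  Dir.glob(pattern).sort.map { |font| ["ruby", script, font, *extra_args] }
end

# Run every command (multi-argument system, so no shell and no quoting issues)
# and return true only if all of them exited successfully.
def run_all(commands)
  commands.map { |cmd| system(*cmd) }.all?
end
```

Usage would be along the lines of `run_all(commands_for("fonts/*.woff"))`.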

Here's the sample output (I dropped the files from my VM into Windows):

[Screenshot, 2015-12-06: sample output]

From left to right: Heuristica Regular, Montserrat Regular, Charter Regular

We can now use these values to generate a table containing each font's available characters.
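A hypothetical sketch of that table step: given a list of code points, emit one HTML row per character, de-duplicated and sorted. The markup (column names, no CSS classes) and the method names are assumptions, not taken from the scripts above.

```ruby
# Hypothetical sketch: render code points as an HTML table, one row per
# character, showing a U+XXXX label and the character itself.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape the characters that are special in HTML text and attributes.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

def charmap_table(code_points)
  rows = code_points.uniq.sort.map do |cp|
    label = format("U+%04X", cp)
    char = escape_html([cp].pack("U")) # pack("U") encodes the code point as UTF-8
    "  <tr><td>#{label}</td><td>#{char}</td></tr>"
  end
  ["<table>", "  <tr><th>Code</th><th>Character</th></tr>", *rows, "</table>"].join("\n")
end
```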

The script can be run after each update through GitHub Pages. To keep the repository from getting bloated, we can add the character map file's extension (say, .unc) to .gitignore.

I had some difficulties writing this due to my inexperience with Ruby and ttfunk's glaring lack of documentation, but eventually it all worked out. I plan on making a PR that outputs the character map (if one exists) on the font catalog page.

Anyway, that's all for today!

Note: if you're hosting on Ubuntu, you can get the woff2sfnt tool through launchpad.ubuntu.org

Edit: Subsetting fonts might actually be possible thanks to ttfunk, though I imagine that'd be a bit difficult to implement :)

alfredxing commented 8 years ago

This has definitely been on my to-do list for quite a while. I've done a bit of searching for tools to programmatically grab info from fonts and came across https://github.com/behdad/fonttools/, which seems pretty robust and feature-rich. I think it's part of the Google Fonts toolchain.

phoenixenero commented 8 years ago

Now that I think about it, that's better. It's what Google uses for subsetting.

phoenixenero commented 8 years ago

So I've been trying out the fonttools package. This command dumps a font's cmap tables into a .ttx (XML) file:

ttx -t cmap Aileron-Black.woff

This seems a lot simpler than my previous approach, lol...

I haven't tested it yet, but we should be able to access the table data with the Nokogiri Ruby gem. We could just use a regex, of course, but that wouldn't be future-proof.

I will test this further. Will post results.
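Assuming the ttx dump has the usual layout (a cmap element containing cmap_format_N subtables, each holding map entries with a hex code attribute), parsing it with a real XML parser might look like the sketch below. It uses REXML, which ships with Ruby, as a stand-in for Nokogiri. The method name and the [platformID, platEncID, format] key are hypothetical choices.

```ruby
require "rexml/document"

# Sketch: read a `ttx -t cmap` dump and return the code points of each cmap
# subtable, keyed by [platformID, platEncID, format].
def ttx_cmap_subtables(xml)
  doc = REXML::Document.new(xml)
  cmap = doc.root && doc.root.elements["cmap"]
  return {} unless cmap

  subtables = {}
  cmap.each_element do |sub|
    next unless sub.name.start_with?("cmap_format_")
    key = [sub.attributes["platformID"].to_i,
           sub.attributes["platEncID"].to_i,
           sub.name.delete_prefix("cmap_format_").to_i]
    # Each map entry with a code attribute (e.g. code="0x41") is one mapped
    # code point; entries without one (e.g. variation sequences) are skipped.
    subtables[key] = sub.get_elements("map").filter_map do |entry|
      code = entry.attributes["code"]
      Integer(code) if code
    end
  end
  subtables
end
```

Usage would be something like `ttx_cmap_subtables(File.read("Aileron-Black.ttx"))`, assuming ttx wrote its dump next to the input with a .ttx extension, which is its normal behavior.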

phoenixenero commented 8 years ago

Here's a new version of my script, now condensed into a single .rb file: https://gist.github.com/phoenixenero/c8d40a390bb1acabcf9c

It supports a -f flag, which takes a file glob pattern (e.g. *.otf, fonts/*.woff) as input and outputs the Unicode character map of every matched file.
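Here is a sketch of how a -f flag like this could be wired up with Ruby's standard OptionParser. The --files long form, the banner, and the script name charmap.rb are made up for illustration, and the gist may handle arguments differently. The pattern should be quoted on the command line (e.g. `ruby charmap.rb -f 'fonts/*.woff'`) so the shell passes the glob through instead of expanding it.

```ruby
require "optparse"

# Hypothetical sketch: parse a required -f/--files PATTERN option.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: charmap.rb -f PATTERN"
    opts.on("-f", "--files PATTERN", "Glob of font files, e.g. 'fonts/*.woff'") do |pattern|
      options[:pattern] = pattern
    end
  end
  parser.parse!(argv) # removes recognized options from argv
  raise OptionParser::MissingArgument, "-f" unless options[:pattern]
  options
end
```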

phoenixenero commented 8 years ago

[Screenshot, 2015-12-07]

Alright, it seems like my implementation is working! It outputs table markup with the codes. I might need to sort them, though; it seems like the .charmap generation is unsorted.

Edit: Apparently it's not uncommon for fonts to have multiple cmap tables; will correct.
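Since a font can carry several cmap subtables, one hedged way to handle them is to keep only the subtables whose codes are Unicode (platform 0, or platform 3 with encoding 1 or 10), merge their code points, then sort and de-duplicate the result. That would also take care of the unsorted output. The sketch assumes a hash keyed by [platformID, platEncID, format]. That layout and the method names are hypothetical. Legacy subtables such as (1, 0) Mac Roman are skipped because their codes aren't Unicode code points.

```ruby
# Platform/encoding pairs whose codes are Unicode code points: platform 0
# (Unicode), and platform 3 (Windows) with encoding 1 (BMP) or 10 (full range).
def unicode_subtable?(platform_id, encoding_id)
  platform_id.zero? || (platform_id == 3 && [1, 10].include?(encoding_id))
end

# Merge code points from all Unicode subtables into one sorted, unique list.
# `subtables` maps [platformID, platEncID, format] => array of code points.
def merged_code_points(subtables)
  subtables
    .select { |(platform_id, encoding_id, _format), _codes| unicode_subtable?(platform_id, encoding_id) }
    .values.flatten.uniq.sort
end
```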