Open phoenixenero opened 8 years ago
This has definitely been on my to-do list for quite a while. I've been doing a bit of searching for tools to programmatically grab info from fonts, and came across https://github.com/behdad/fonttools/, which seems pretty robust and feature-filled. I think it's a part of the Google Fonts toolchain.
Now that I think about it, that is better. That is what Google uses for subsets.
So I was using the `fonttools` package. This command dumps the `cmap` tables of a font into a .ttx (XML) file:
ttx -t cmap Aileron-Black.woff
This seems to be a lot simpler than my previous workings, lol...
Though I haven't tested it yet, we can access the table data with the Nokogiri Ruby gem. Of course we can just use Regex, but that wouldn't be future-proof.
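For what it's worth, here is a rough, untested-against-real-fonts sketch of the XML route. It uses Ruby's bundled REXML instead of Nokogiri purely so it runs without installing a gem; the sample XML is a hand-written stand-in for real `ttx` output, and `codes_from_ttx` is a made-up helper name.

```ruby
require 'rexml/document'

# Hand-written stand-in for a `ttx -t cmap` dump; real files are much longer.
SAMPLE_TTX = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <ttFont>
    <cmap>
      <tableVersion version="0"/>
      <cmap_format_4 platformID="3" platEncID="1" language="0">
        <map code="0x41" name="A"/>
        <map code="0x20" name="space"/>
      </cmap_format_4>
    </cmap>
  </ttFont>
XML

# Hypothetical helper: collect the `code` attribute of every <map> element
# under <cmap> and return the values as sorted integers.
def codes_from_ttx(xml)
  doc = REXML::Document.new(xml)
  codes = []
  REXML::XPath.each(doc, '//cmap//map') do |element|
    codes << element.attributes['code'].hex
  end
  codes.sort
end

p codes_from_ttx(SAMPLE_TTX).map { |cp| format('U+%04X', cp) }
# => ["U+0020", "U+0041"]
```

Walking the parsed tree this way should keep working if attribute order or whitespace in the dump changes, which is the part a regex would be fragile about.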
I will test this further. Will post results.
Here's a new version of my script, now condensed into 1 .rb file: https://gist.github.com/phoenixenero/c8d40a390bb1acabcf9c
It supports a `-f` flag, which takes a file glob pattern (e.g. `*.otf`, `fonts/*.woff`) as input and outputs the Unicode character map of every matched file.
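A minimal sketch of how a flag like that could be handled, assuming plain `OptionParser` plus `Dir.glob`. This is not lifted from the gist; the `--files` long name and the `files_from_args` helper are invented for illustration.

```ruby
require 'optparse'

# Hypothetical helper: read a `-f PATTERN` option from argv and expand it
# into a sorted list of matching paths. Returns [] when no pattern is given.
def files_from_args(argv)
  pattern = nil
  parser = OptionParser.new do |opts|
    # `--files` is an assumed long name, not taken from the actual script.
    opts.on('-f', '--files PATTERN', 'Glob of font files to read') do |value|
      pattern = value
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  return [] if pattern.nil?

  Dir.glob(pattern).sort
end

# Possible usage from a script:
#   files_from_args(ARGV).each { |path| puts path }
```

When calling such a script from a shell, the pattern probably needs quoting (`-f 'fonts/*.woff'`), otherwise the shell expands it before Ruby ever sees it.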
Alright, seems like my implementation is working! It outputs table markup with the codes! I might need to sort these though; seems like the .charmap generation is unsorted.
Edit: Apparently it's not uncommon for fonts to have multiple `cmap` tables; will correct.
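One way the multiple-table case (and the sorting) could be handled is to merge every table into a single mapping first. The sketch below models each table as a plain Hash of code point to glyph name, which is an assumption about the parsed data rather than what the script actually holds; `merge_cmaps` is a hypothetical name.

```ruby
# Hypothetical helper: merge several cmap tables into one Hash sorted by
# code point. When two tables map the same code point, the first one wins.
def merge_cmaps(subtables)
  merged = subtables.each_with_object({}) do |table, acc|
    table.each { |code, glyph| acc[code] ||= glyph }
  end
  merged.sort.to_h
end

tables = [
  { 0x41 => 'A', 0x20 => 'space' },
  { 0x20 => 'space', 0x42 => 'B' }
]

p merge_cmaps(tables).keys.map { |cp| format('U+%04X', cp) }
# => ["U+0020", "U+0041", "U+0042"]
```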
One of the difficulties of displaying character maps is that you have to access the font files themselves to find out how many (accessible) glyphs there are. Of course you can do the "dumb" solution and loop across all Unicode values, but that would take a really long time and would leave a lot of gaps.
I finished writing two Ruby scripts that automate this. They depend on the `woff2sfnt` command line tool and the `ttfunk` gem.
`get-available-chars.rb` outputs a file containing all the available Unicode values with glyphs encoded onto them. It works by following these steps:
1. Convert the font with `woff2sfnt`. This is necessary in order for `ttfunk` to access the OpenType/TrueType tables.
2. Using `ttfunk`, we access the `cmap` (character map) table and get the Unicode keys.
`get-available-chars.rb` takes two options.
`glob-get-available-chars.rb` is a wrapper for `get-available-chars.rb`. This script takes two options. It subsequently loops through all the files that match the pattern and runs `get-available-chars.rb` with the options above.
Here's the sample output (I dropped the files from my VM into Windows):
From left-to-right: Heuristica Regular, Montserrat Regular, Charter Regular
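The `cmap` lookup described in the steps earlier might look roughly like the sketch below. Since `ttfunk` is thinly documented, treat the API shape as an assumption: the helper expects an object that behaves like `TTFunk::File`, where `font.cmap.unicode` returns the Unicode subtables and each subtable exposes a `code_map` Hash. `unicode_keys` is a made-up name, not the function in the script.

```ruby
# Hypothetical helper: gather the code points of every Unicode cmap subtable.
# Assumes `font.cmap.unicode` yields subtables whose `code_map` is a Hash of
# code point => glyph id (an assumption about ttfunk's interface).
def unicode_keys(font)
  font.cmap.unicode.flat_map { |subtable| subtable.code_map.keys }.uniq.sort
end

# With the gem installed and an sfnt file at hand, usage would presumably be:
#   require 'ttfunk'
#   font = TTFunk::File.open('Aileron-Black.ttf')
#   puts unicode_keys(font).map { |cp| format('U+%04X', cp) }
```

Keeping the helper duck-typed means it can be tried with simple stand-in objects before pointing it at a real font.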
We can now use these values to generate a table containing each font's available characters.
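As a rough idea of that table generation, here is a small sketch that turns a list of code points into HTML rows. The two-column layout and the `charmap_table` name are assumptions; the actual markup for the catalog page is not decided anywhere above.

```ruby
# Characters that must be escaped before being placed in HTML text.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper: build a minimal HTML table with one row per code
# point, showing the U+XXXX label and the (escaped) character itself.
def charmap_table(code_points)
  rows = code_points.sort.map do |cp|
    char = [cp].pack('U').gsub(/[&<>"]/, HTML_ESCAPES)
    "  <tr><td>#{format('U+%04X', cp)}</td><td>#{char}</td></tr>"
  end
  (['<table>'] + rows + ['</table>']).join("\n")
end

puts charmap_table([0x41, 0x26])
# <table>
#   <tr><td>U+0026</td><td>&amp;</td></tr>
#   <tr><td>U+0041</td><td>A</td></tr>
# </table>
```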
The script can be run after each update through GitHub Pages. To keep the repository from bloating, we can add the Unicode character map file's extension (say, `.unc`) to the `.gitignore`.
I had some difficulties writing this due to my inexperience with Ruby and `ttfunk`'s glaring lack of documentation, but eventually it all worked out. I plan on making a PR which outputs the character map (if it exists) on the font catalog page.
Anyway, that's all for today!
Note: if you're hosting on Ubuntu, you can get the `woff2sfnt` tool through launchpad.ubuntu.org.
Edit: Subsetting fonts may actually be possible thanks to `ttfunk`, though I imagine that'll be a bit difficult to implement :)