Decompose characters using NFD normalization when performing smallcaps transformation

asciidoctor / asciidoctor-pdf

Asciidoctor PDF: A native PDF converter for AsciiDoc based on Asciidoctor and Prawn, written entirely in Ruby.

MIT License


The smallcaps transformation is inherently limited to ASCII letters. However, it's possible to extend it to Latin letters that carry a diacritical mark by first decomposing each character using NFD normalization (Normalization Form D, canonical decomposition).

str = str.unicode_normalize :nfd unless str.ascii_only?

When this is done, the diacritical mark (such as an accent) is separated from the character, resulting in an ASCII letter followed by a combining character. The ASCII letter can then be transformed and the combining character reapplied to the transformed letter.

As an example, the NFD form of é is \u0065 + \u0301. The smallcaps version is \u1d07 + \u0301, which renders as ᴇ́.
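The steps above can be sketched roughly as follows. This is a minimal illustration, not the converter's actual code: the `smallcaps` method name and the `LOWER_ALPHA` / `SMALL_CAPS` constants are hypothetical, and the mapping table is an assumption written with \u escapes so each code point is explicit.

```ruby
# Hypothetical mapping for illustration only; the project's real table may differ.
# SMALL_CAPS lists one replacement per letter a..z ('s' and 'x' map to themselves).
LOWER_ALPHA = 'a-z'
SMALL_CAPS = "\u1d00\u0299\u1d04\u1d05\u1d07\ua730\u0262\u029c\u026a\u1d0a\u1d0b\u029f\u1d0d" \
             "\u0274\u1d0f\u1d18\u01eb\u0280s\u1d1b\u1d1c\u1d20\u1d21x\u028f\u1d22"

def smallcaps(str)
  # Decompose only when needed; a pure ASCII string has nothing to decompose.
  str = str.unicode_normalize(:nfd) unless str.ascii_only?
  # Combining marks fall outside a-z, so tr leaves them in place,
  # directly after the letter that was just mapped.
  str.tr(LOWER_ALPHA, SMALL_CAPS)
end

# "\u00e9" is the precomposed form of é
puts smallcaps("caf\u00e9")
```

Because the combining character is never touched, it stays attached to whatever letter precedes it. In this sketch, uppercase letters and characters that have no canonical decomposition pass through unchanged.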

Decompose characters using NFD normalization when performing smallcaps transformation #2485