asciidoctor / asciidoctor-pdf

Asciidoctor PDF: A native PDF converter for AsciiDoc based on Asciidoctor and Prawn, written entirely in Ruby.
https://docs.asciidoctor.org/pdf-converter/latest/
MIT License

Decompose characters using NFD normalization when performing smallcaps transformation #2485

Closed: mojavelinux closed this issue 4 months ago

mojavelinux commented 5 months ago

The smallcaps transformation is inherently limited to ASCII letters. However, it's possible to extend it to Latin letters with diacritical marks (such as é) by first decomposing each character using NFD normalization (Normalization Form D, i.e., canonical decomposition).

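```ruby
# Decompose accented characters (e.g., é -> e + U+0301); pure-ASCII strings are skipped
str = str.unicode_normalize :nfd unless str.ascii_only?
```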

When this is done, the diacritical mark (such as an accent) is separated from the base character, resulting in an ASCII letter followed by a combining character. The ASCII letter can then be transformed and the combining character reapplied to the transformed letter.
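A minimal Ruby sketch of the whole transformation follows. The `smallcaps` method and the `LOWER`/`SMALL` tables are hypothetical names for illustration, not Asciidoctor PDF's actual implementation. Unicode has no small capital Q or X, so this sketch leaves those two letters unchanged.

```ruby
# Hypothetical tables: lowercase ASCII letters (minus q and x) and their
# Unicode small capital counterparts, in the same order (24 characters each).
LOWER = 'abcdefghijklmnoprstuvwyz'
SMALL = "\u1d00\u0299\u1d04\u1d05\u1d07\ua730\u0262\u029c\u026a\u1d0a\u1d0b\u029f" \
        "\u1d0d\u0274\u1d0f\u1d18\u0280\ua731\u1d1b\u1d1c\u1d20\u1d21\u028f\u1d22"

# Hypothetical smallcaps transform with NFD decomposition.
def smallcaps(str)
  # Split accented letters into base letter + combining mark (é -> e + U+0301).
  # Pure-ASCII strings are already decomposed, so normalization is skipped.
  str = str.unicode_normalize :nfd unless str.ascii_only?
  # Map base letters to small capitals. Combining marks are not in LOWER, so
  # tr leaves them in place, directly after the letter they modify.
  str.tr LOWER, SMALL
end

puts smallcaps('Café')
```

For example, `smallcaps 'Café'` yields `"C\u1d00\ua730\u1d07\u0301"`: the uppercase C is untouched, and the acute accent now follows the small capital E. Using `tr` on the decomposed string means the combining characters never need to be handled explicitly.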

As an example, the NFD form of é is \u0065 + \u0301. The smallcaps version is \u1d07 + \u0301, which renders as ᴇ́.
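The codepoints in this example can be checked with a short Ruby snippet that uses only core `String` methods (`codepoints_of` is a hypothetical helper). It also shows that NFC cannot recompose the result, because Unicode defines no precomposed small capital E with acute. This is presumably why the font itself must be able to position the combining mark.

```ruby
# Hypothetical helper: format each codepoint of a string as U+XXXX
def codepoints_of(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
end

composed = "\u00e9"                            # é as one precomposed codepoint
decomposed = composed.unicode_normalize(:nfd)  # e followed by COMBINING ACUTE ACCENT
smallcap = decomposed.sub('e', "\u1d07")       # swap in LATIN LETTER SMALL CAPITAL E

puts codepoints_of(composed)    # U+00E9
puts codepoints_of(decomposed)  # U+0065 U+0301
puts codepoints_of(smallcap)    # U+1D07 U+0301
# No precomposed form exists, so NFC leaves both codepoints as they are
puts codepoints_of(smallcap.unicode_normalize(:nfc)) # U+1D07 U+0301
```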

mojavelinux commented 5 months ago

This is potentially a breaking change (since it requires additional font support and may cause unforeseen side effects), so it should only be done in the next major release.