This PR adds a script for Normalization Form C (NFC).
In NFC, characters are composed as much as possible. For example, in Unicode, an "e" with an acute accent (é) can be represented in two ways:
- As a single precomposed character (é): U+00E9
- As a combination of the letter "e" (U+0065) and the combining acute accent (U+0301)
The NFC script will transform the second form into the first, precomposed form.
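For reviewers who want to see the transformation concretely, here is a minimal sketch using Python's standard-library `unicodedata` module. It is not the script added in this PR; the helper name `to_nfc` is hypothetical and only illustrates the behavior described above.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (hypothetical helper, for illustration)."""
    return unicodedata.normalize("NFC", text)


# Decomposed form: "e" (U+0065) followed by the combining acute accent (U+0301).
decomposed = "e\u0301"
# Precomposed form: a single code point, U+00E9.
precomposed = "\u00e9"

composed = to_nfc(decomposed)

# The two inputs render the same but differ at the code point level
# until they are normalized.
print(decomposed == precomposed)  # False
print(composed == precomposed)    # True
print(len(decomposed), len(composed))  # 2 1
```

Text that is already in NFC passes through unchanged, so running the normalization more than once is harmless.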
I'm not entirely familiar with how this repo is structured, so I'm adding a standalone script to the scripts/ directory; its dependencies are added as optional dependencies in the pyproject file.