chrissimpkins / recurse

Cross-platform recursive directory traversal file management tool
Apache License 2.0

Add support for canonically normalized Unicode form translations in matching/search sub-commands #8

Open chrissimpkins opened 4 years ago

chrissimpkins commented 4 years ago

Add optional support for translating the Unicode code points of text files to either the NFC (composed) or NFD (decomposed) normalized form in sub-commands that match on the text contents of files. Some characters can be encoded either as a single precomposed code point or as a decomposed sequence (a base character followed by one or more combining marks), and both encodings are canonically equivalent. For example, "é" can be U+00E9 or U+0065 U+0301. Normalizing to one form establishes a standard code point sequence for these characters and reduces encoding variation in the text data. This makes matches consistent across canonically equivalent forms, because the user chooses the normalized code point form that file contents are translated to before they are compared with the pattern entered on the command line.
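The idea can be sketched in Python with the standard library's `unicodedata.normalize`. The helper `normalized_contains` and its `form` parameter are hypothetical and are not part of recurse. The sketch assumes literal substring matching. It normalizes both the file text and the pattern to the same form, so canonically equivalent sequences compare as equal.

```python
import unicodedata


def normalized_contains(pattern: str, text: str, form: str = "NFC") -> bool:
    """Hypothetical helper: literal substring match after normalizing both
    the pattern and the text to the same Unicode normal form."""
    if form not in ("NFC", "NFD"):
        raise ValueError("form must be 'NFC' or 'NFD'")
    # Translate both sides to one canonical code point sequence before comparing.
    return unicodedata.normalize(form, pattern) in unicodedata.normalize(form, text)


composed = "caf\u00e9"     # 'é' as the single precomposed code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT U+0301

print(composed == decomposed)                        # False: different code points
print(composed in "un " + decomposed + " noir")      # False: raw match misses it
print(normalized_contains(composed, "un " + decomposed + " noir"))  # True
```

Regex patterns need more care than literal patterns. Normalizing a pattern to NFD can split a precomposed character inside a character class such as `[é]` into two code points, and that changes what the class matches.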

chrissimpkins commented 4 years ago

FAQ reference in Unicode documentation: https://www.unicode.org/faq/normalization.html