Different filesystems may represent the same file name with different
sequences of Unicode code points. For instance, on my Linux ext4
system, the name "ö" is stored as the single code point U+00F6 (LATIN
SMALL LETTER O WITH DIAERESIS). In contrast, on macOS, it is stored in
decomposed form: U+006F (LATIN SMALL LETTER O) followed by U+0308
(COMBINING DIAERESIS).
Without normalization, paths containing these characters are
incorrectly reported as added or deleted when moved to a different
filesystem, because the lookup in the manifest map compares raw bytes
rather than normalized strings.
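To make the failure concrete, here is a minimal sketch of a byte-wise
map lookup missing the decomposed spelling. The manifest is reduced to
a plain map[string]bool and lookupStatus is a hypothetical helper; both
are assumptions for illustration, not the actual manifest code.

```go
package main

import "fmt"

// lookupStatus reports how a path found on disk compares with a
// manifest keyed by path. The manifest shape and this function name
// are hypothetical, chosen only for this illustration.
func lookupStatus(manifest map[string]bool, path string) string {
	if manifest[path] {
		return "unchanged"
	}
	return "added"
}

func main() {
	composed := "\u00f6"    // U+00F6, UTF-8 bytes c3 b6
	decomposed := "o\u0308" // U+006F U+0308, UTF-8 bytes 6f cc 88

	// The manifest was recorded with the composed spelling.
	manifest := map[string]bool{composed: true}

	// Go map keys are compared byte by byte, so the two spellings of
	// the same visible name are different keys.
	fmt.Printf("% x -> %s\n", composed, lookupStatus(manifest, composed))
	fmt.Printf("% x -> %s\n", decomposed, lookupStatus(manifest, decomposed))
	// Output:
	// c3 b6 -> unchanged
	// 6f cc 88 -> added
}
```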
By normalizing to Unicode NFC, provided by the unicode/norm package, we
can always store the normalized form of the file path and avoid these
issues.