Some filesystems and utilities, most notably Apple's HFS+ filesystem, transparently apply Unicode normalization to filenames. This means that the comparison between the list of files present on disk and the list of files in the manifests must be performed with a Unicode-equivalence test to avoid confusing situations such as #51, where a visually identical filename is reported as non-existent because the encoded characters differ (see http://www.unicode.org/reports/tr15/).
To avoid backwards-compatibility problems, bagit-python will not alter the values stored in the manifests or saved to disk. Instead, it will compare the two lists after applying a consistent normalization form. This also allows raising an exception for attempts to bag files whose names differ only in normalization form (a situation which is both user-hostile and likely to lead to data loss).
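As a rough illustration of the approach described above (not the actual code in this patch), the comparison could be sketched as follows. The function names are hypothetical, and the choice of NFC is an assumption; any form works as long as both lists use the same one.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name, form="NFC"):
    """Return `name` in the given Unicode normalization form (NFC is an assumption)."""
    return unicodedata.normalize(form, name)


def find_normalization_collisions(names, form="NFC"):
    """Map each normalized name to the distinct raw names that collapse onto it."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name, form)].add(name)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


def compare_file_lists(on_disk, in_manifest, form="NFC"):
    """Compare two filename lists under Unicode equivalence.

    Returns (missing_from_disk, not_in_manifest) as sorted lists of
    normalized names. The stored values themselves are never rewritten.
    """
    for label, names in (("disk", on_disk), ("manifest", in_manifest)):
        collisions = find_normalization_collisions(names, form)
        if collisions:
            # Names that differ only in normalization form are refused outright.
            raise ValueError(
                "%s contains names differing only in normalization: %r"
                % (label, collisions)
            )
    disk = {normalize_name(n, form) for n in on_disk}
    manifest = {normalize_name(n, form) for n in in_manifest}
    return sorted(manifest - disk), sorted(disk - manifest)


# The same visible name, precomposed (NFC) versus decomposed (NFD):
nfc_name = "data/caf\u00e9.txt"
nfd_name = "data/cafe\u0301.txt"
print(compare_file_lists([nfd_name], [nfc_name]))
```

With a plain string comparison these two names would be reported as one missing file and one unexpected file; after normalization both result lists are empty.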
This patch is somewhat more involved than it would otherwise be, in order to maintain backwards compatibility with existing code. It might be time for another round of cleanup to the existing code structure, especially regarding `Bag.entries` and the way it's populated.
See #51 and #81.