OCFL / spec

The Oxford Common File Layout (OCFL) specifications
https://ocfl.io

unicode normalization #559

Open pwinckles opened 3 years ago

pwinckles commented 3 years ago

I was thinking about Unicode normalization again. I know that the last time this was discussed, perhaps on Slack, normalization was considered outside the scope of the spec. However, I had a couple of additional thoughts after seeing that the BagIt spec spends time describing the normalization problem and then recommends that implementations tolerate differences in normalization and warn when files differ only by normalization form.

  1. Perhaps it would make sense for OCFL validators to produce warnings when files or object IDs differ only in how they are normalized?
  2. Should the spec make any similar recommendations, perhaps in the implementation notes, about tolerating differences in normalization forms? Or is this not desirable behavior?
  3. The spec states "Each version block in each prior inventory file MUST represent the same object state as the corresponding version block in the current inventory file." In the case of logical paths, is it up to the implementation to decide whether this is a byte-for-byte comparison or a normalized comparison? (Edit: noting that digest algorithm changes are supported between versions.)
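
To make points 1 and 3 concrete, here is a rough sketch of what such a check might look like. It is not taken from any OCFL validator; the function names `normalization_collisions` and `same_logical_path` are hypothetical, and picking NFC as the comparison form is an assumption (the spec does not mandate one).

```python
import unicodedata
from collections import defaultdict


def normalization_collisions(paths, form="NFC"):
    """Return groups of paths that are distinct strings but equal once normalized.

    Hypothetical helper: a validator could emit one warning per returned group.
    """
    groups = defaultdict(set)
    for path in paths:
        # Key each path by its normalized form; distinct spellings share a key.
        groups[unicodedata.normalize(form, path)].add(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def same_logical_path(a, b, form="NFC"):
    """Normalized comparison, as opposed to a byte-for-byte one (hypothetical)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Two spellings of the same visible name: precomposed vs. combining accent.
composed = "caf\u00e9.txt"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301.txt"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

collisions = normalization_collisions([composed, decomposed, "readme.txt"])
byte_equal = composed.encode("utf-8") == decomposed.encode("utf-8")
normalized_equal = same_logical_path(composed, decomposed)
```

With these inputs the two `café.txt` spellings land in one collision group, the byte-for-byte comparison treats them as different, and the normalized comparison treats them as the same. That difference is exactly the ambiguity raised in point 3.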

rosy1280 commented 2 years ago

We think discussion of this issue would be best in the implementation notes and are deferring to 2.0 because of the complications related to it.