tomwrobel closed this issue 11 months ago.
This could be considered an additional kind of fixity check (the file should be x bytes in size), but I suspect I'm pushing the definition of the word 'fixity' here.
2023-06-01 Editors' discussion -- This could be done within the current specification by creating an extension that defines (as mentioned in https://github.com/OCFL/spec/issues/629#issuecomment-1543788276) a new fixity type, perhaps called `size`, that is simply the file size.
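A minimal sketch of what such a `size` algorithm could compute, assuming (this is not taken from any published extension) that the value is the file's length in bytes written as a decimal string, so it can serve as a JSON object key like other fixity digest values. `size_digest` is a hypothetical helper name.

```python
import os
import tempfile


def size_digest(path: str) -> str:
    """Hypothetical 'size' fixity value: the file length in bytes as a decimal string.

    Assumption: the extension would encode the size in base 10 so it can be used
    as a key in the fixity block, just like hex digests from other algorithms.
    """
    return str(os.path.getsize(path))


if __name__ == "__main__":
    # Write a small temporary file and report its 'size' value.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world\n")
        name = tmp.name
    try:
        print(size_digest(name))  # prints 12
    finally:
        os.remove(name)
```

Unlike a hash, this value needs no pass over the file's bytes; the filesystem already records it.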
@zimeon should I make a pull request against https://github.com/OCFL/extensions/blob/main/docs/0001-digest-algorithms.md ?
Yes, the process is outlined in https://github.com/OCFL/extensions/blob/main/docs/0001-digest-algorithms.md#maintenance . Because we are not versioning extensions, the PR should create a new digest algorithms extension that obsoletes 0001.
Spun out to https://github.com/OCFL/extensions/issues/64
The implication of `size` as a fixity digest algorithm is that collisions in fixity entries are no longer unlikely; they may even be expected. I'm wondering whether this represents a significant enough change in how implementers should treat the fixity block to warrant further discussion.
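To make the concern concrete: in an OCFL inventory's fixity block, each digest value maps to an array of content paths, so a `size` entry simply groups every file of the same length under one key. The sketch below builds such a block from in-memory content; the `size_fixity_block` name, the example paths, and the decimal-string encoding of `size` are illustrative assumptions.

```python
import json
from collections import defaultdict


def size_fixity_block(files: dict[str, bytes]) -> dict[str, dict[str, list[str]]]:
    """Build a fixity block for a hypothetical 'size' algorithm.

    Each key is a file length in bytes (as a decimal string); each value is the
    sorted list of content paths with that length. Files with different content
    but the same length end up under the same key.
    """
    by_size: dict[str, list[str]] = defaultdict(list)
    for path in sorted(files):
        by_size[str(len(files[path]))].append(path)
    return {"size": dict(by_size)}


if __name__ == "__main__":
    content = {
        "v1/content/a.txt": b"hello",
        "v1/content/b.txt": b"world",  # different bytes, same length as a.txt
        "v1/content/c.txt": b"hi",
    }
    print(json.dumps(size_fixity_block(content), indent=2))
```

Here `a.txt` and `b.txt` share the key `"5"` even though their contents differ, which is exactly the kind of collision an MD5 entry would make rare but a `size` entry makes routine.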
Interesting question, @srerickson. My feeling is that it doesn't represent a major change in how fixity should be used, but I'd love to hear other thoughts. I just proposed a new fixture, https://github.com/OCFL/fixtures/pull/107 , containing an object with two different files that have the same MD5 value. Implementations have to deal with this possibility even without extension digests that might be weaker than the currently specified ones.
@zimeon, that fixture is really helpful, thanks! This issue has helped me identify a problem in my own implementation, where fixity collisions are treated as an error condition instead of being handled gracefully.
I don't mean to belabor the point, but I wonder if the implementation notes could address collisions a bit better. From this discussion, a key difference between fixity and manifest digests is that manifest digests are assumed to be collision-free, whereas collisions in fixity digests should be expected and handled gracefully. This point doesn't come across very clearly in the current fixity section, which instead focuses on content addressability and tampering.
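One way to read "handled gracefully", sketched in Python under stated assumptions: a fixity check recomputes the value for every content path listed under a digest and reports only mismatches. It does not treat a digest that lists several paths as an error, and it never infers that those paths hold identical content. `check_fixity`, the `read_content` callback, and the `ALGORITHMS` registry are hypothetical names; the `size` encoding is assumed to be a decimal byte count.

```python
import hashlib
from typing import Callable

# Hypothetical registry of fixity algorithms: name -> function(bytes) -> digest string.
ALGORITHMS: dict[str, Callable[[bytes], str]] = {
    "md5": lambda data: hashlib.md5(data).hexdigest(),
    "size": lambda data: str(len(data)),  # assumed encoding: decimal byte count
}


def check_fixity(
    fixity: dict[str, dict[str, list[str]]],
    read_content: Callable[[str], bytes],
) -> list[str]:
    """Return human-readable problems found in a fixity block.

    A digest that lists several content paths is NOT an error: with weak or
    non-cryptographic algorithms, different files may share a value. Each
    listed path is verified on its own.
    """
    problems: list[str] = []
    for algorithm, entries in fixity.items():
        compute = ALGORITHMS.get(algorithm)
        if compute is None:
            # Design choice in this sketch: skip algorithms we cannot compute.
            continue
        for expected, paths in entries.items():
            for path in paths:
                actual = compute(read_content(path))
                if actual != expected:
                    problems.append(f"{algorithm} mismatch for {path}: {actual} != {expected}")
    return problems


if __name__ == "__main__":
    store = {"v1/content/a.txt": b"hello", "v1/content/b.txt": b"world"}
    block = {"size": {"5": ["v1/content/a.txt", "v1/content/b.txt"]}}
    print(check_fixity(block, store.__getitem__))  # prints []
```

By contrast, logic that deduplicates content via the manifest may reasonably assume that equal digests mean equal bytes; this check deliberately makes no such assumption.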
2023-07-06 Editors' discussion -- we agree that it would be helpful to add a note to the fixity section of the Implementation Notes pointing out that fixity algorithms may generate the same value for different file content.
The algorithm extension has a PR that has been submitted and is under review.
Following on from, but not necessarily looking to revive: https://github.com/OCFL/spec/issues/474
It would be very useful for a repository manager to know how big an OCFL object and its component binary files are on disk. This affects many of the decisions we're likely to make about how to handle the object and its component files.
Given the processing work required to generate the checksum, this seems like an opportunity to record the size of the binary file represented by a given checksum. A key akin to the 'fixity' key, containing an array of key-value pairs, might allow this, e.g.