OCFL / extensions

OCFL Community Extensions

Adding more digest algorithms #78

Open alvinsw opened 3 months ago

alvinsw commented 3 months ago

I could not find any discussion or PR proposing extension 0009, so I kind of missed it. We need to pre-calculate a CRC32 checksum for every content file, and I think the best way to do that is to add the checksums to the fixity section. In addition to the checksum, we also need the file size. Can we revise extension 0009 so that it includes both CRC32 and a CRC32+size combination?

rosy1280 commented 3 months ago

@alvinsw ext 0009 already allows users to use the file size as a fixity algorithm in the fixity block. It's labeled as size in the table in ext 0009.

You are also permitted to list multiple algorithms in the fixity block. There is an example in section 3.5.4 of the specification that shows how you would do this.

Given that file size is already an option and that you are permitted to list more than one algorithm in the fixity block, it sounds like all we would need to do is add CRC32 as a fixity algorithm to ext 0009 to solve your use case. Is that correct?
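For illustration, here is a minimal Python sketch of a fixity block that lists more than one algorithm. It assumes the layout described in the specification (algorithm name, then digest value, then a list of content paths) and assumes that `size` is recorded as the byte count written as a decimal string. The function name `build_fixity`, the content paths, and the file contents are made up for the example.

```python
import hashlib
import json


def build_fixity(files):
    """Build a fixity block: algorithm -> digest value -> [content paths].

    `files` maps a content path to the file's bytes (hypothetical input).
    """
    fixity = {"md5": {}, "size": {}}
    for path, data in sorted(files.items()):
        md5 = hashlib.md5(data).hexdigest()
        size = str(len(data))  # assumption: byte count as a decimal string
        # Files that share a digest value are grouped under the same key.
        fixity["md5"].setdefault(md5, []).append(path)
        fixity["size"].setdefault(size, []).append(path)
    return fixity


# Hypothetical content paths and contents, for illustration only.
example_files = {
    "v1/content/a.txt": b"hello",
    "v1/content/b.txt": b"world",
}
fixity = build_fixity(example_files)
print(json.dumps({"fixity": fixity}, indent=2))
```

Under these assumptions, a CRC32 entry would simply be a third key alongside `md5` and `size` once an extension defines its name and encoding.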

zimeon commented 3 months ago

History: Issue to add size https://github.com/OCFL/extensions/issues/64 that led to PR https://github.com/OCFL/extensions/pull/65

I note that if we "update" 0009 we create a new extension that obsoletes 0009, as 0009 did for 0001.

@alvinsw - Can you say a bit more about your application of CRC32? Also, since there are many versions of the CRC checks, can you link to the specification of the one you propose for CRC32?

alvinsw commented 2 months ago

@rosy1280 : Yes, that is correct. The size is already there, so we just need to add the crc32.

@zimeon : The use case I have at the moment is zipping files dynamically using the NGINX mod_zip module. The module requires the crc32 checksum to be known before the operation in order to support the Range header. I believe the best way to pre-calculate and store the checksums is to put them in the inventory.json. Yes, there are many versions of CRC, so that's kind of problematic. Do we want to support all versions of CRC? I could not find the official specification of the CRC, but the one that mod_zip needs is the same as the one used by the String::CRC32 Perl module. Do you know if there is an authoritative site that lists all the CRC variant specs?
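As a rough sketch of the pre-calculation step, the Python below streams a file once and returns both its CRC32 and its size. It assumes the variant wanted is the CRC-32 used by the ZIP format, which is what Python's `zlib.crc32` computes (check value `cbf43926` for the ASCII string `123456789`). The function name `crc32_and_size` and the 8-digit lowercase hex encoding are assumptions for illustration, not something defined by ext 0009.

```python
import zlib


def crc32_and_size(path, chunk_size=1 << 20):
    """Return (CRC32 as 8 lowercase hex digits, size in bytes) for a file."""
    crc = 0
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large content files are never held in memory.
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC over the whole file
            size += len(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), size


# Sanity check against the standard check value for this CRC-32 variant.
assert format(zlib.crc32(b"123456789"), "08x") == "cbf43926"
```

The two values could then be written under separate keys in the fixity block, one for the checksum and one for `size`, since listing multiple algorithms is already permitted.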