DILCISBoard / E-ARK-CSIP

E-ARK Common Specification for Information Packages
http://earkcsip.dilcis.eu
Creative Commons Attribution 4.0 International

Add the ability to do file format and spec version detection based on magic numbers #388

Open jmaferreira opened 5 years ago

jmaferreira commented 5 years ago

It would be really nice if we could add the SIP, AIP and DIP specs to PRONOM and specify a magic number that would help implementers, and software in general, detect which version of the spec a particular package (in zip format) complies with.

This would allow systems to auto-detect the format and version of, for example, SIPs submitted to ingest and run the appropriate parsers.

More info at: https://en.wikipedia.org/wiki/List_of_file_signatures

carlwilson commented 3 years ago

Interesting idea. If detection is to work from the zip file, it would have to be the equivalent of "container format" detection, similar to identifying MS Office documents wrapped as zips. I'm of the opinion that we shouldn't start by adding something to the package, but should first see whether we can derive working signatures for what we already have.

@carlwilson to look at devising a PRONOM container signature that works with DROID (and FIDO plus others) for the current SIP and DIP packages, using the existing specs, by the end of the current phase (31/10/2021). Removing the milestone as we are not currently proposing a change to a specification.
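
For illustration only, container-style detection without changing the package could look roughly like the Python sketch below. It is not a PRONOM or DROID signature. It assumes the package zip holds a `METS.xml` either at the top level or directly inside a single root folder, and that the root element carries a `PROFILE` attribute naming the spec profile. The function name `detect_profile` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

METS_NAME = "METS.xml"  # assumption: file name of the package's root METS


def detect_profile(zip_source) -> Optional[str]:
    """Return the PROFILE attribute of the root METS in a zip, or None."""
    with zipfile.ZipFile(zip_source) as zf:
        # Candidates: METS.xml at the top level, or directly inside one folder.
        candidates = [
            name for name in zf.namelist()
            if name == METS_NAME
            or (name.count("/") == 1 and name.endswith("/" + METS_NAME))
        ]
        if not candidates:
            return None
        # Prefer the shallowest candidate.
        candidates.sort(key=lambda name: name.count("/"))
        with zf.open(candidates[0]) as fh:
            try:
                root = ET.parse(fh).getroot()
            except ET.ParseError:
                return None
    # PROFILE is an unqualified attribute, so no namespace handling is needed.
    return root.get("PROFILE")
```

An ingest system could then map the returned profile URI to a parser. The drawback compared with a true magic number is that the zip central directory and an XML file both have to be read before anything can be decided.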

carlwilson commented 10 months ago

I'm working my way through the issues, and this caught my eye. I have since come across the way the ODF specification enforces this for package Zip archives. Section 3.3 seems relevant, @jmaferreira: https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part2-packages/OpenDocument-v1.3-os-part2-packages.html#__RefHeading__752809_826425813
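
As I understand that section, ODF asks for a file named `mimetype` to be the first entry in the zip, stored uncompressed and without an extra field, so that the media type string sits at a fixed offset (38) from the start of the file. A rough Python sketch of checking for that layout follows. The function name `leading_mimetype` is hypothetical, and no media type has been defined for E-ARK packages.

```python
import struct
from typing import Optional

# Zip local file header: signature, version, flags, method, time, date,
# crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30 bytes


def leading_mimetype(data: bytes) -> Optional[str]:
    """Return the content of a leading, stored 'mimetype' entry, or None."""
    if len(data) < LOCAL_HEADER.size:
        return None
    (sig, _ver, flags, method, _time, _date, _crc,
     csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data)
    if sig != b"PK\x03\x04":
        return None  # does not start with a zip local file header
    if method != 0 or flags & 0x08:
        return None  # compressed, or sizes deferred to a data descriptor
    if name_len != 8 or extra_len != 0:
        return None  # name is not 8 bytes long, or an extra field is present
    if data[30:38] != b"mimetype":
        return None
    value = data[38:38 + csize]
    if len(value) != csize:
        return None  # input was truncated
    try:
        return value.decode("ascii")
    except UnicodeDecodeError:
        return None
```

Only the first few dozen bytes of the file are needed, which is what makes this approach usable as a plain byte signature rather than a container signature. It would, however, mean adding a file to the package, which is a change to the specification.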