digital-preservation / droid

DROID (Digital Record and Object Identification)

Folder as a path behavior in container signatures #871

Open thorsted opened 1 year ago

thorsted commented 1 year ago

In a container signature, a folder path can be used without a file name, but only if the folder is empty and the whole path is given.

SIARD for example uses:

<ContainerSignature ContainerType="ZIP" Id="31020">
  <Description>SIARD 2.1</Description>
  <Files>
    <File>
      <Path>header/siardversion/2.1/</Path>
    </File>
  </Files>
</ContainerSignature>

Because the folder 2.1 is empty, the signature works. If the folder contained a file, the signature would not match.

As an example, the EIP format has a folder "CaptureOne/Settings120/" within the ZIP container, but that folder also contains a file whose name and extension vary, so the file itself can't be used for identification. Using "CaptureOne/Settings120/" as the path in a container signature does not produce a match. Removing the contents of the folder allows the identification to succeed.

The expected behavior is that a root folder or a full folder path can be used for identification without a file name. Attached test file: EIP-Test.zip
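To make the expected behavior concrete, here is a minimal Python sketch of a prefix-based check. It is not DROID's implementation; the function name `folder_present` and the file name `abc123.cos` are hypothetical, chosen only for illustration. The idea is that a folder path counts as present if any entry name in the container equals it or starts with it.

```python
def folder_present(entry_names, folder_path):
    """Return True if folder_path occurs as an entry or as the prefix of an entry.

    Hypothetical helper: sketches the requested behavior, not DROID's logic.
    """
    # Normalize so that "CaptureOne/Settings120" and "CaptureOne/Settings120/"
    # are treated the same way.
    prefix = folder_path if folder_path.endswith("/") else folder_path + "/"
    return any(name.startswith(prefix) for name in entry_names)


# Entry names as they might appear in an EIP container; the file name inside
# the folder is made up because the real one varies.
entries = ["manifest.xml", "CaptureOne/Settings120/abc123.cos"]

print(folder_present(entries, "CaptureOne/Settings120/"))  # full folder path
print(folder_present(entries, "CaptureOne"))               # root folder
print(folder_present(entries, "header/siardversion/2.1/")) # not in this container
```

With a check like this, both the root folder and the full folder path would match regardless of what the folder contains.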

tnafrancesca commented 1 year ago

A question I have: if this is put into other file format identification tools, does it work?

thorsted commented 1 year ago

> A question I have: if this is put into other file format identification tools, does it work?

I tested this in Siegfried: I extended the main signature with my container signature and got the same results as DROID.

siegfried   : 1.9.6
scandate    : 2023-01-20T08:33:34-07:00
signature   : default.sig
created     : 2023-01-20T08:33:20-07:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V109.xml; container-signature-20221102.xml; extensions: EIP-signature-file-v1-10-Jan-23.xml; container extensions: EIP-BYUdev1-signaturefile-20230110.xml'
---
filename : 'orig-EIP.eip'
filesize : 7374865
modified : 2023-01-11T11:21:03-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/263'
    format  : 'ZIP Format'
    version : 
    mime    : 'application/zip'
    basis   : 'byte match at [[0 4] [7374778 3] [7374843 4]] (signature 1/2)'
    warning : 'extension mismatch'
Thorsted:EIP Test thorsted$ sf fake-EIP.eip 
---
siegfried   : 1.9.6
scandate    : 2023-01-20T08:33:44-07:00
signature   : default.sig
created     : 2023-01-20T08:33:20-07:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V109.xml; container-signature-20221102.xml; extensions: EIP-signature-file-v1-10-Jan-23.xml; container extensions: EIP-BYUdev1-signaturefile-20230110.xml'
---
filename : 'fake-EIP.eip'
filesize : 6699973
modified : 2023-01-19T14:01:48-07:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'BYUdev/1'
    format  : 'Enhanced Image Package'
    version : 
    mime    : 'application/x-captureone'
    basis   : 'extension match eip; container name CaptureOne/Settings120/ with name only; name manifest.xml with byte match at 45, 9 (signature 1/2)'
    warning : 
richardlehane commented 1 year ago

I suspect this is because there is no separate entry within the ZIP file for the directory; the directory only exists as part of the path name of the file within it. If patterns for variably named container paths are added (https://github.com/digital-preservation/pronom/issues/10), a workaround would be to give a pattern like "CaptureOne/Settings120/*".
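A small Python sketch of that suspicion, using only the standard library. It builds two in-memory ZIP files: one where the folder is stored as its own empty entry, and one where the folder exists only as part of a file's path. The file name `abc123.cos` is a made-up stand-in for the variably named file, and the exact-name lookup is an assumption about how a path might be compared, not a description of DROID or Siegfried internals. The last check uses `fnmatch` to illustrate the kind of pattern proposed as a workaround.

```python
import fnmatch
import io
import zipfile

FOLDER = "CaptureOne/Settings120/"


def build_zip(names):
    """Return the bytes of an in-memory ZIP holding the given entry names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            # A name ending in "/" is written as an explicit directory entry.
            zf.writestr(name, b"" if name.endswith("/") else b"data")
    return buf.getvalue()


def entry_names(zip_bytes):
    """List the entry names recorded in the ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()


# Folder stored as its own, empty entry.
empty_folder = entry_names(build_zip([FOLDER, "manifest.xml"]))
# Folder present only as the prefix of a file name (hypothetical file name).
implicit_folder = entry_names(build_zip([FOLDER + "abc123.cos", "manifest.xml"]))

# An exact-name lookup finds the folder only when it has its own entry.
print(FOLDER in empty_folder)
print(FOLDER in implicit_folder)

# A wildcard pattern matches the file inside the folder instead.
pattern = FOLDER + "*"
print(any(fnmatch.fnmatchcase(name, pattern) for name in implicit_folder))
```

Note that some archiving tools do write explicit directory entries even for non-empty folders, so whether a given container behaves like the second case depends on how it was produced.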