digital-preservation / droid

DROID (Digital Record and Object Identification)
BSD 3-Clause "New" or "Revised" License

Skeleton for fmt/1190 *.swc not matching in DROID #684

Open ross-spencer opened 2 years ago

ross-spencer commented 2 years ago

We have (as far as we can tell) a valid fmt/1190 skeleton file being output from the container suite generator. It isn't matching in DROID as anticipated with the latest signature files (a handful of tests are now running in parallel with the latest skeleton suites).

Sample file and some discussion in this thread here

Further discussion here around sequence numbering starting from zero, instead of one. That particular issue might just affect Siegfried.

If you can verify this issue for DROID as well, it might provide a starting point for y'all too.

<BinarySignatures>
  <InternalSignatureCollection>
    <InternalSignature ID="28200">
      <ByteSequence Reference="BOFoffset">
        <!-- First position listed as zero -->
        <SubSequence Position="0" SubSeqMinOffset="0" SubSeqMaxOffset="0">
          <Sequence>3C 3F 78 6D 6C 20 76 65 72 73 69 6F 6E 3D 22 31 2E 30 22 20 3F 3E</Sequence>
        </SubSequence>
      </ByteSequence>
      <ByteSequence Reference="BOFoffset">
        <!-- Second position listed as one -->
        <SubSequence Position="1" SubSeqMinOffset="23" SubSeqMaxOffset="50">
          <Sequence>3C 73 77 63 20 78 6D 6C 6E 73 3D 22 68 74 74 70 3A 2F 2F 77 77 77 2E 61 64 6F 62 65 2E 63 6F 6D 2F 66 6C 61 73 68 2F 73 77 63 63 61 74 61 6C 6F 67 2F</Sequence>
        </SubSequence>
      </ByteSequence>
    </InternalSignature>
  </InternalSignatureCollection>
</BinarySignatures>
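
For anyone who doesn't read hex, the two `<Sequence>` values above can be decoded to see what the signature is looking for. This is a minimal Python sketch for inspection only; it is not part of DROID, Siegfried, or the skeleton generator, and the names `SEQ_1`, `SEQ_2` and `decode` are made up for illustration.

```python
# Hex strings copied verbatim from the two <Sequence> elements in the signature.
SEQ_1 = "3C 3F 78 6D 6C 20 76 65 72 73 69 6F 6E 3D 22 31 2E 30 22 20 3F 3E"
SEQ_2 = (
    "3C 73 77 63 20 78 6D 6C 6E 73 3D 22 68 74 74 70 3A 2F 2F 77 77 77 2E "
    "61 64 6F 62 65 2E 63 6F 6D 2F 66 6C 61 73 68 2F 73 77 63 63 61 74 61 "
    "6C 6F 67 2F"
)


def decode(hex_sequence: str) -> bytes:
    """Turn a space-separated hex string into raw bytes."""
    return bytes.fromhex(hex_sequence)


first = decode(SEQ_1)
second = decode(SEQ_2)

print(first.decode("ascii"), len(first))    # <?xml version="1.0" ?> 22
print(second.decode("ascii"), len(second))  # <swc xmlns="http://www.adobe.com/flash/swccatalog/ 50
```

The first sequence is 22 bytes long, so if offsets are counted from zero at the beginning of the file it occupies offsets 0 to 21. A second sequence starting at offset 23 would then leave exactly one byte (for example a newline) between the two.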

Dclipsham commented 2 years ago

Sorry Ross, I somehow missed this when it was posted. I'll have a proper look over the weekend and try to remind myself of our intent here. David

richardlehane commented 2 years ago

I just did a rebuild of Ross's signature suite and have hit some fresh issues with this signature that I posted here: https://github.com/exponential-decay/skeleton-container-test-suite-generator/issues/17

David, can you clarify if the intent is to anchor both sequences from the beginning of the file, or should the offsets for the second sequence be relative to the end of the first sequence? (in which case it is more normal to have two SubSequence elements within a single ByteSequence, rather than two ByteSequence elements, right?)
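
To make the two readings concrete, here is a rough Python sketch that applies each one to the same bytes. It is only an illustration of the question, not how DROID or Siegfried actually evaluates signatures: the helper names and the sample inputs are hypothetical, and it assumes offsets are zero-based and that "relative" means measured from the byte after the first sequence ends.

```python
XML_DECL = b'<?xml version="1.0" ?>'  # first <Sequence>, 22 bytes
SWC_ROOT = b'<swc xmlns="http://www.adobe.com/flash/swccatalog/'  # second <Sequence>

MIN_OFFSET, MAX_OFFSET = 23, 50  # SubSeqMinOffset / SubSeqMaxOffset of the second sequence


def find_in_window(data: bytes, needle: bytes, base: int) -> int:
    """Return the first start index of needle in base+MIN_OFFSET..base+MAX_OFFSET, or -1."""
    for start in range(base + MIN_OFFSET, base + MAX_OFFSET + 1):
        if data.startswith(needle, start):
            return start
    return -1


def matches_both_from_bof(data: bytes) -> bool:
    """Reading 1: both sequences are anchored to the beginning of the file."""
    return data.startswith(XML_DECL) and find_in_window(data, SWC_ROOT, 0) != -1


def matches_relative_to_first(data: bytes) -> bool:
    """Reading 2: the second offset window starts where the first sequence ends."""
    return data.startswith(XML_DECL) and find_in_window(data, SWC_ROOT, len(XML_DECL)) != -1


# Hypothetical inputs, not the real skeleton file from the linked thread.
tight = XML_DECL + b"\n" + SWC_ROOT        # second sequence starts at offset 23
padded = XML_DECL + b" " * 30 + SWC_ROOT   # second sequence starts at offset 52

print(matches_both_from_bof(tight), matches_relative_to_first(tight))    # True False
print(matches_both_from_bof(padded), matches_relative_to_first(padded))  # False True
```

Under these assumptions the two readings accept different files, so a skeleton generated for one reading would fail to match under the other.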