digital-preservation / droid

DROID (Digital Record and Object Identification)
BSD 3-Clause "New" or "Revised" License

DROID 6.x not matching PNG file using custom signature file from Signature File Generator tool #35

Open ross-spencer opened 11 years ago

ross-spencer commented 11 years ago

The following PNG file:

89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 00 00 00 D2 00 00 00 D2 08 02 00 00 00 B0 FB 09 15 00 00 00 13 74 45 58 74 54 69 74 6C 65 00 32 36 20 4D 61 72 63 68 20 32 30 31 33 F2 4A EE C0 00 00 00 13 74 45 58 74 41 75 74 68 6F 72 00 52 6F 73 73 20 53 70 65 6E 63 65 72 58 F1 CE DE 00 00 00 46 74 45 58 74 43 6F 70 79 72 69 67 68 74 00 43 72 65 61 74 69 76 65 20 43 6F 6D 6D 6F 6E 73 20 41 74 74 72 69 62 75 74 69 6F 6E 2D 53 68 61 72 65 41 6C 69 6B 65 20 33 2E 30 20 55 6E 70 6F 72 74 65 64 20 4C 69 63 65 6E 73 65 7B 01 65 F8 00 00 00 3C 74 45 58 74 53 6F 66 74 77 61 72 65 00 68 74 74 70 73 3A 2F 2F 67 69 74 68 75 62 2E 63 6F 6D 2F 65 78 70 6F 6E 65 6E 74 69 61 6C 2D 64 65 63 61 79 2F 62 69 6E 61 72 79 2D 6E 75 6D 62 65 72 73 CF B9 6A F6 00 00 01 84 49 44 41 54 78 9C ED DC B1 0D C2 50 10 05 41 4C 53 16 1D 91 D3 87 63 97 C4 AF CA 74 80 44 70 5A 09 CD 14 F0 82 D3 C6 77 BB 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30 6B 9B 9B BE AE A9 E5 E7 1A 9B 1E 73 EE 53 A7 3E 1E 6B 68 F9 F5 DE 87 96 EF 43 BB F0 85 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 8E 80 EC 08 C8 0E 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FE D0 36 37 7D 5D 53 CB CF 35 36 3D E6 DC A7 4E 7D 3C D6 D0 F2 EB BD 0F 2D FB 6F 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 47 40 76 04 64 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 F0 B3 0F C8 43 10 B3 9B 58 BC 7B 00 00 00 00 49 45 4E 44 AE 42 60 82

is not matched when using the following signature file:

    <?xml version="1.0" encoding="UTF-8"?>
    <FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile" Version="1" DateCreated="2013-08-27T22:48:58+01:00">
      <InternalSignatureCollection>
      <InternalSignature ID="1" Specificity="Specific">
      <ByteSequence Reference="BOFoffset">
         <SubSequence MinFragLength="0" Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0">
            <Sequence>89504E470D0A1A0A0000000D49484452</Sequence>
            <DefaultShift>17</DefaultShift>
            <Shift Byte="89">16</Shift>
            <Shift Byte="50">15</Shift>
            <Shift Byte="4E">14</Shift>
            <Shift Byte="47">13</Shift>
            <Shift Byte="0D">5</Shift>
            <Shift Byte="0A">9</Shift>
            <Shift Byte="1A">10</Shift>
            <Shift Byte="00">6</Shift>
            <Shift Byte="49">4</Shift>
            <Shift Byte="48">3</Shift>
            <Shift Byte="44">2</Shift>
            <Shift Byte="52">1</Shift>
         </SubSequence>
      </ByteSequence>
      <ByteSequence>
         <SubSequence MinFragLength="0" Position="1" SubSeqMinOffset="0">
            <Sequence>417574686F7200526F7373205370656E636572</Sequence>
            <DefaultShift>20</DefaultShift>
            <Shift Byte="41">19</Shift>
            <Shift Byte="75">18</Shift>
            <Shift Byte="74">17</Shift>
            <Shift Byte="68">16</Shift>
            <Shift Byte="6F">11</Shift>
            <Shift Byte="72">1</Shift>
            <Shift Byte="00">13</Shift>
            <Shift Byte="52">12</Shift>
            <Shift Byte="73">9</Shift>
            <Shift Byte="20">8</Shift>
            <Shift Byte="53">7</Shift>
            <Shift Byte="70">6</Shift>
            <Shift Byte="65">2</Shift>
            <Shift Byte="6E">4</Shift>
            <Shift Byte="63">3</Shift>
         </SubSequence>
      </ByteSequence>
      <ByteSequence Reference="EOFoffset">
         <SubSequence MinFragLength="0" Position="1" SubSeqMaxOffset="0" SubSeqMinOffset="0">
            <Sequence>0000000049454E44AE426082</Sequence>
            <DefaultShift>-13</DefaultShift>
            <Shift Byte="00">-1</Shift>
            <Shift Byte="49">-5</Shift>
            <Shift Byte="45">-6</Shift>
            <Shift Byte="4E">-7</Shift>
            <Shift Byte="44">-8</Shift>
            <Shift Byte="AE">-9</Shift>
            <Shift Byte="42">-10</Shift>
            <Shift Byte="60">-11</Shift>
            <Shift Byte="82">-12</Shift>
         </SubSequence>
      </ByteSequence>
    </InternalSignature>
    </InternalSignatureCollection>
    <FileFormatCollection>
      <FileFormat ID="1" Name="spencer-png" PUID="dev/1" Version="1.0" MIMEType="text/x-test-signature">
         <InternalSignatureID>1</InternalSignatureID>
         <Extension>ext</Extension>
      </FileFormat>
    </FileFormatCollection>
    </FFSignatureFile>
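The `Shift` values in the signature file above are consistent with a Boyer-Moore-Horspool-style skip table. For the BOF and unanchored sequences of length n, each byte's shift equals n minus the index of its last occurrence, and `DefaultShift` is n + 1. For the EOF sequence the values are negative and counted from the byte's first occurrence. The sketch below reproduces those numbers from the `Sequence` strings. It is inferred from the values in this file only; it is not DROID's or the generator's actual code, and the name `shift_table` is hypothetical.

```python
def shift_table(hex_sequence, from_eof=False):
    """Return (default_shift, {byte_hex: shift}) for a plain hex sequence.

    Assumption: no wildcards or other PRONOM syntax in the sequence.
    """
    data = bytes.fromhex(hex_sequence)
    n = len(data)
    shifts = {}
    if from_eof:
        # Walk right to left so the leftmost occurrence of a byte wins.
        for i in range(n - 1, -1, -1):
            shifts[f"{data[i]:02X}"] = -(i + 1)
        return -(n + 1), shifts
    # Walk left to right so the rightmost occurrence of a byte wins.
    for i, value in enumerate(data):
        shifts[f"{value:02X}"] = n - i
    return n + 1, shifts


default, shifts = shift_table("89504E470D0A1A0A0000000D49484452")
print(default, shifts["0D"], shifts["00"])  # 17 5 6
```

Running it against the three sequences gives the same defaults (17, 20, -13) and per-byte shifts as the file, which suggests the generated tables themselves are not the cause of the failed match.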

The difference between this signature for PNG and the standard DROID signature for PNG is that mine also looks for the Author metadata I have included in the file.

It is not clear to me what the issue is but it seems to lie with the DROID engine. The byte sequences are not complex and we're not using any of the DROID regular expression syntax to create them.

When I test the signature file in DROID 4.0, alongside a PNG that lacks the additional metadata I am trying to match, the tool correctly identifies the PNG above and does not falsely identify the standard PNG.
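The expected behaviour can be sketched outside DROID. The sketch assumes a simplified reading of the three `ByteSequence` elements: the `BOFoffset` sequence must sit at offset 0, the sequence with no `Reference` may occur anywhere, and the `EOFoffset` sequence must end the file. The helper `matches_signature` and the shortened sample byte strings are hypothetical (the samples are not valid PNGs), and this is not how DROID's engine is implemented.

```python
# Byte sequences copied from the signature file above.
BOF = bytes.fromhex("89504E470D0A1A0A0000000D49484452")
ANYWHERE = bytes.fromhex("417574686F7200526F7373205370656E636572")  # "Author\0Ross Spencer"
EOF = bytes.fromhex("0000000049454E44AE426082")


def matches_signature(data: bytes) -> bool:
    """All three sequences must match for the signature to hit."""
    return data.startswith(BOF) and ANYWHERE in data and data.endswith(EOF)


# Stand-ins for the two test files: one with the Author chunk, one without.
with_author = BOF + b"\x00" * 8 + ANYWHERE + b"\x00" * 8 + EOF
without_author = BOF + b"\x00" * 16 + EOF

print(matches_signature(with_author))     # True
print(matches_signature(without_author))  # False
```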

Let me know if you require any further information.

Tested on DROID 6.1.3.

EDIT: When I merge these byte sequences into a standard signature file, using an existing InternalSignature block there as a placeholder, DROID picks up the signature and identifies the file I am looking at. As such, the problem appears to lie in the way DROID handles signature files from the generator tool.

Dclipsham commented 11 years ago

Hi Ross, Using the signature file you have provided, the byte code of the PNG file, and DROID 6.1.3, I was able to identify the file without any difficulty or modifications. I believe, as per your edit, that the problem here is partly the output from the Signature Generation Utility, but also the way DROID is handling it.

Please try the following: in your .droid6 folder, under Profile_templates > schema 6.03, delete the file 'profile.1.template'. Next, re-upload the test signature file and start a new profile. Your file should now be identified correctly using your test signature file.

When using the Signature Generation Utility, the output always includes (EDIT:) '<InternalSignature ID="1"' and 'FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile" Version="1"'. When DROID uses a signature file generated in this way, it seems to reuse the existing profile template file (where one exists with the corresponding number) rather than overwrite it. This appears to cause a conflict that manifests in a couple of ways I have witnessed so far: one as you've experienced, and another where the identified file is described as a previous format. For example, I may generate a test signature for filetype .abc and later generate a test signature for filetype .xyz, but when scanning, DROID reports the .xyz as an .abc even though the signature file in use makes no mention of .abc.
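The workaround can be scripted. This is a minimal sketch, assuming that the template file name follows the pattern 'profile.N.template', where N is the `Version` attribute of the signature file; that pattern is inferred from 'profile.1.template' and `Version="1"` in this thread. The function names are hypothetical, and the template directory is passed in by the caller because its exact location and letter case may differ between installations.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def template_for(signature_file: Path, template_dir: Path) -> Path:
    """Path of the template assumed to correspond to the signature file."""
    # Version is an un-namespaced attribute on the root FFSignatureFile element.
    version = ET.parse(signature_file).getroot().get("Version")
    return template_dir / f"profile.{version}.template"


def remove_stale_template(signature_file: Path, template_dir: Path) -> bool:
    """Delete the matching template if present; return True if one was removed."""
    template = template_for(signature_file, template_dir)
    if template.is_file():
        template.unlink()
        return True
    return False
```

A call might look like `remove_stale_template(Path("test-signature.xml"), Path.home() / ".droid6" / "profile_templates" / "schema 6.03")`, where both paths are assumptions to adjust for the local setup. DROID should be closed before the file is removed.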

Assuming the workaround solves your immediate issue (and please let me know either way!), I expect we'll either need to tweak the output of the Signature Generation Utility, or force the profile_template file to be overwritten when a new signature file is uploaded (which presumably would mean an additional overhead on first run). Any insight you can offer would be greatly appreciated.

I hope to hear back soon.

David

ross-spencer commented 11 years ago

Thanks David. That seems to work. The workaround is acceptable for the small amounts of testing I am doing currently.

An easy change to the signature generation utility would be to adopt a UUID as the signature file ID. The problem there is that DROID isn't flexible enough to handle that approach (tested with a v4 UUID).

Maintaining state in the signature file generation utility is not something I am keen on, as its elegance at present (messiness of code aside) is that it does little more than string manipulation to output this useful file. The tool also has no awareness of the current DROID signature file numbers from PRONOM live, and certainly none of any test instances you may be running or creating files from. The utility could generate a random number instead, but this has the potential for clashes. Alternatively, it could start from an as-yet-unused value, e.g. Signature File version 1000 onward, but that requires keeping track of the value on the server.
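For anyone who wants to renumber a generated file by hand in the meantime, the sketch below rewrites the `Version` attribute using Python's standard library. The function name `set_signature_version` is hypothetical. Choosing a number that does not clash is left to the caller, which is exactly the state-keeping problem described above, and whether DROID accepts a given value has not been verified here.

```python
import xml.etree.ElementTree as ET

NS = "http://www.nationalarchives.gov.uk/pronom/SignatureFile"


def set_signature_version(xml_text: str, version: int) -> str:
    """Return the signature file XML with its Version attribute replaced."""
    # Keep the PRONOM namespace as the default (unprefixed) namespace on output.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    root.set("Version", str(version))
    # Note: encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


sample = f'<FFSignatureFile xmlns="{NS}" Version="1" DateCreated="2013-08-27T22:48:58+01:00"/>'
renumbered = set_signature_version(sample, 1000)
```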

I would prefer to see a change in DROID as I can imagine I'm not the only one who will experience this issue, e.g. organisations already developing signatures for TNA, and further, anyone who might create a similar utility in future from a different tool using your open specifications.

As for overhead on first run, I presume re-generation of the profile-template file would only need to occur when one selects 'Install Signature File' or re-selects an alternative from the drop-down.

Just some thoughts. I will work with this for now. Thanks for the help.

For your information, I do host the code for the signature development utility here: https://github.com/exponential-decay/signature-development-utility

Obviously you guys have copies too. If you have any code you'd like to contribute back then I'm happy to see if I can include it in the utility from my side. Also happy to hear any other suggestions on easy changes I can make to improve this situation.

Dclipsham commented 11 years ago

Thanks Ross, I very much agree that the change is better made in DROID, and we'll add this to the development roadmap. David

paulyoung84 commented 8 years ago

This has been reviewed again recently and added to the current DROID backlog.

Dclipsham commented 4 years ago

Should be fixed by resolving #13