archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: File Extension identification tool doesn't result in good file identification event information in the METS #1145

Open sallain opened 4 years ago

sallain commented 4 years ago

Expected behaviour

Running a file identification tool (Siegfried, Fido, or File Extension) should result in a file identification PREMIS event in the METS. Here's part of a good event, showing the date and time, tool used, outcome, and detail note.

<premis:eventType>format identification</premis:eventType>
<premis:eventDateTime>2020-03-09T21:15:44.380982+00:00</premis:eventDateTime>
<premis:eventDetailInformation>
   <premis:eventDetail>program="Siegfried"; version="1.8.0"</premis:eventDetail>
</premis:eventDetailInformation>
<premis:eventOutcomeInformation>
   <premis:eventOutcome>Positive</premis:eventOutcome>
   <premis:eventOutcomeDetail>
      <premis:eventOutcomeDetailNote>fmt/43</premis:eventOutcomeDetailNote>
   </premis:eventOutcomeDetail>
</premis:eventOutcomeInformation>

Current behaviour

If you use File Extension as your file identification tool, not all available information is being written to the PREMIS event.

<premis:eventType>format identification</premis:eventType>
<premis:eventDateTime>2020-03-11T22:58:06+00:00</premis:eventDateTime>
<premis:eventDetailInformation>
   <premis:eventDetail>program="File Extension"; version="0.1"</premis:eventDetail>
</premis:eventDetailInformation>
<premis:eventOutcomeInformation>
   <premis:eventOutcome>Positive</premis:eventOutcome>
   <premis:eventOutcomeDetail>
      <premis:eventOutcomeDetailNote>No Matching Format</premis:eventOutcomeDetailNote>
   </premis:eventOutcomeDetail>
</premis:eventOutcomeInformation>

While the eventOutcome was Positive, the detail note says No Matching Format. However, we can tell from the stdout in the UI that File Extension did find a format:

[Screenshot: Screen Shot 2020-03-11 at 4 11 09 PM]

We can also see more format information in the object's techMD:

<premis:format>
   <premis:formatDesignation>
      <premis:formatName>Bitmap</premis:formatName>
      <premis:formatVersion></premis:formatVersion>
   </premis:formatDesignation>
   <premis:formatRegistry>
      <premis:formatRegistryName>Archivematica Format Policy Registry</premis:formatRegistryName>
      <premis:formatRegistryKey>.bmp</premis:formatRegistryKey>
   </premis:formatRegistry>
</premis:format>

I'd guess that the above comes from File Extension, because Archivematica Format Policy Registry is used as the Registry Name, but I'm not sure.
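To compare what ends up in the techMD across several files, something like the following could list the format name and registry details for each object. This is only a sketch: it matches elements by local name so it does not depend on a particular PREMIS namespace URI, the `format_entries` helper and the `SAMPLE_TECHMD` wrapper element are made up for illustration, and it does not tell you which tool wrote the values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the techMD fragment from above, wrapped in a root
# element that declares an assumed PREMIS namespace URI.
SAMPLE_TECHMD = """<root xmlns:premis="http://www.loc.gov/premis/v3">
  <premis:format>
    <premis:formatDesignation>
      <premis:formatName>Bitmap</premis:formatName>
      <premis:formatVersion></premis:formatVersion>
    </premis:formatDesignation>
    <premis:formatRegistry>
      <premis:formatRegistryName>Archivematica Format Policy Registry</premis:formatRegistryName>
      <premis:formatRegistryKey>.bmp</premis:formatRegistryKey>
    </premis:formatRegistry>
  </premis:format>
</root>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def format_entries(root):
    """Return (formatName, formatRegistryName, formatRegistryKey) per format element."""
    entries = []
    for fmt in root.iter():
        if local(fmt.tag) != "format":
            continue
        # Collect the text of every descendant, keyed by local element name.
        fields = {local(node.tag): (node.text or "").strip() for node in fmt.iter()}
        # Skip unrelated elements that happen to be called "format"
        # (for example a Dublin Core one) and have no formatDesignation.
        if "formatDesignation" not in fields:
            continue
        entries.append((
            fields.get("formatName", ""),
            fields.get("formatRegistryName", ""),
            fields.get("formatRegistryKey", ""),
        ))
    return entries


if __name__ == "__main__":
    for entry in format_entries(ET.fromstring(SAMPLE_TECHMD)):
        print(entry)
```

For a real METS file, `ET.parse(path).getroot()` could be passed in place of the parsed sample string.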

Steps to reproduce

  1. Enable File Extension as the file ID command in Preservation Planning > Identification > Commands.
  2. Start a transfer using archivematica-sampledata/SampleTransfers/Images.
  3. Select Yes to the perform file identification prompt.
  4. Inspect the METS.
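For step 4, a small script can make the inconsistency easier to spot than reading the METS by hand. The sketch below flags format identification events whose eventOutcome is Positive while the detail note says No Matching Format. It is not part of Archivematica: the `suspicious_identification_events` helper and the `SAMPLE_METS` document are hypothetical, the namespace URI in the sample is an assumption, and the code assumes each event is wrapped in a `premis:event` element (the snippets above show only the children).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample built from the two events quoted in this issue.
SAMPLE_METS = """<mets xmlns:premis="http://www.loc.gov/premis/v3">
  <premis:event>
    <premis:eventType>format identification</premis:eventType>
    <premis:eventDetailInformation>
      <premis:eventDetail>program="Siegfried"; version="1.8.0"</premis:eventDetail>
    </premis:eventDetailInformation>
    <premis:eventOutcomeInformation>
      <premis:eventOutcome>Positive</premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote>fmt/43</premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>
  </premis:event>
  <premis:event>
    <premis:eventType>format identification</premis:eventType>
    <premis:eventDetailInformation>
      <premis:eventDetail>program="File Extension"; version="0.1"</premis:eventDetail>
    </premis:eventDetailInformation>
    <premis:eventOutcomeInformation>
      <premis:eventOutcome>Positive</premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote>No Matching Format</premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>
  </premis:event>
</mets>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first descendant with this exact local name, or ''."""
    for node in elem.iter():
        if local(node.tag) == name:
            return (node.text or "").strip()
    return ""


def suspicious_identification_events(root):
    """Yield (eventDetail, note) for Positive events noted as 'No Matching Format'."""
    for event in root.iter():
        if local(event.tag) != "event":
            continue
        if child_text(event, "eventType") != "format identification":
            continue
        outcome = child_text(event, "eventOutcome")
        note = child_text(event, "eventOutcomeDetailNote")
        if outcome == "Positive" and note == "No Matching Format":
            yield child_text(event, "eventDetail"), note


if __name__ == "__main__":
    for detail, note in suspicious_identification_events(ET.fromstring(SAMPLE_METS)):
        print(f"{detail}: {note}")
```

Matching on local names rather than a fixed namespace is a deliberate choice here, since the PREMIS namespace may differ between Archivematica versions. To check a METS file from a real transfer, pass `ET.parse(path).getroot()` instead of the parsed sample.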

Your environment (version of Archivematica, operating system, other relevant details)

Archivematica 1.10.1 and up, though I suspect that this has been the case for a lot longer.



sromkey commented 4 years ago

I'm marking this as low severity on the assumption that very few users, if any, are still using File Extension for file identification.

ross-spencer commented 4 years ago

Related to https://github.com/archivematica/Issues/issues/862