artefactual-labs / a3m

Lightweight Archivematica — 8 less than a11m.
https://a3m.readthedocs.io
GNU Affero General Public License v3.0

Problem: user supplied SIP name is not included in preservation metadata #49

Open sevein opened 4 years ago

sevein commented 4 years ago

The API takes a SIP name. compress_aip uses it to name the AIP file, but it is not used anywhere else. It should be included in API responses at the very least.

jhsimpson commented 7 months ago

This issue is still present. It shows up in the preservation metadata in the METS file inside the AIP that a3m produces.

Steps to reproduce: create an AIP with any content using the --name command-line parameter, something like: python -m a3m --name="test" file://local/file.zip

The resulting AIP will be named test-[UUID].7z. Open the AIP. Note that the top directory is named test-[UUID], the same as the 7z file but without the .7z extension.
Now look at the METS file. It will contain a dmdSec that describes the AIP as an intellectual entity. It will look similar to this:

  <mets:dmdSec ID="dmdSec_2">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>a1d70fc5-d772-46ab-b6f3-a705f444855d</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>a1d70fc5-d772-46ab-b6f3-a705f444855d</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>

Note that the premis:originalName value does not match the name of the AIP: it is missing the 'test-' prefix.
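A minimal sketch of how this mismatch could be checked automatically, assuming the METS file has already been read into a string. The function names and the SAMPLE document are hypothetical and are not part of a3m; the METS namespace URI is the standard one, and the PREMIS URI is taken from the snippet above.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath queries below.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "http://www.loc.gov/premis/v3",
}

# Hypothetical, trimmed-down METS document mirroring the dmdSec above.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:premis="http://www.loc.gov/premis/v3">
  <mets:dmdSec ID="dmdSec_2">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object>
          <premis:originalName>a1d70fc5-d772-46ab-b6f3-a705f444855d</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""


def dmd_original_names(mets_xml):
    """Return the text of every premis:originalName found inside a dmdSec."""
    root = ET.fromstring(mets_xml)
    names = []
    for dmd in root.iterfind(".//mets:dmdSec", NS):
        for el in dmd.iterfind(".//premis:originalName", NS):
            names.append(el.text or "")
    return names


def mismatched_original_names(mets_xml, aip_dir_name):
    """Return the dmdSec originalName values that differ from the AIP directory name."""
    return [n for n in dmd_original_names(mets_xml) if n != aip_dir_name]


if __name__ == "__main__":
    # The AIP directory carries the 'test-' prefix, the metadata does not.
    print(mismatched_original_names(SAMPLE, "test-a1d70fc5-d772-46ab-b6f3-a705f444855d"))
```

An empty list would mean the intellectual entity's original name agrees with the directory name on disk.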

Now look for a structMap near the bottom of the METS file, in particular the one with TYPE="physical". It will be similar to this:

<mets:structMap TYPE="physical" ID="structMap_1" LABEL="Archivematica default">
    <mets:div TYPE="Directory" LABEL="a1d70fc5-d772-46ab-b6f3-a705f444855d" DMDID="dmdSec_2">
      <mets:div TYPE="Directory" LABEL="objects">

The second line there is a div of TYPE="Directory". Its LABEL is supposed to match the name of a real directory in the AIP, in this case the top directory. We saw above that it was named test-a1d70fc5-d772-46ab-b6f3-a705f444855d, but in the structMap the label is just the UUID.

The metadata in the AIP METS file should be accurate and reflect the actual physical structure of the AIP.
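The structMap check described above could be sketched like this, again assuming the METS content is available as a string. The helper names and the STRUCTMAP_SAMPLE document are hypothetical illustrations, not a3m code.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, trimmed-down METS document mirroring the structMap above.
STRUCTMAP_SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical" ID="structMap_1" LABEL="Archivematica default">
    <mets:div TYPE="Directory" LABEL="a1d70fc5-d772-46ab-b6f3-a705f444855d" DMDID="dmdSec_2">
      <mets:div TYPE="Directory" LABEL="objects"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def physical_root_label(mets_xml):
    """Return the LABEL of the top-level div in the physical structMap, or None."""
    root = ET.fromstring(mets_xml)
    for struct_map in root.iterfind(".//mets:structMap", METS_NS):
        if struct_map.get("TYPE") != "physical":
            continue
        top_div = struct_map.find("mets:div", METS_NS)
        if top_div is not None:
            return top_div.get("LABEL")
    return None


def label_matches_directory(mets_xml, aip_dir_name):
    """True when the physical structMap's root label equals the AIP directory name."""
    return physical_root_label(mets_xml) == aip_dir_name


if __name__ == "__main__":
    # The directory on disk has the 'test-' prefix, the label does not.
    print(label_matches_directory(STRUCTMAP_SAMPLE, "test-a1d70fc5-d772-46ab-b6f3-a705f444855d"))
```

A check along these lines could run against an extracted AIP to confirm that the structMap reflects the directory actually present on disk.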

sallain commented 7 months ago

Just some context on how the original name looks in an Archivematica AIP. My transfer was named nametest. It appears in the premis:originalName field:

<premis:originalName>nametest-24814933-b162-4676-abf2-a86d0067fe67</premis:originalName>

And in the structMap (for both the physical and submission documentation sections, in this example):

  <mets:structMap TYPE="physical" ID="structMap_1" LABEL="Archivematica default">
    <mets:div TYPE="Directory" LABEL="nametest-24814933-b162-4676-abf2-a86d0067fe67" DMDID="dmdSec_1">
      <mets:div TYPE="Directory" LABEL="objects">
        <mets:div TYPE="Item" LABEL="small.txt">
          <mets:fptr FILEID="file-d59e40bd-52ae-4ec7-b4a6-56a7b6a3d4b4"/>
        </mets:div>
        <mets:div TYPE="Directory" LABEL="submissionDocumentation">
          <mets:div TYPE="Directory" LABEL="transfer-nametest-4b9e2e1f-4f09-4afb-b28a-4759a5d62e63">
            <mets:div TYPE="Item" LABEL="METS.xml">
              <mets:fptr FILEID="file-c8113ac5-ab3d-40c1-b773-4fa7a497fc73"/>
            </mets:div>
          </mets:div>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>

The original name also appears throughout the METS file wherever a path is indicated, e.g. in tool outputs to indicate which file is being analyzed.