IISH / archivematica

Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
http://www.archivematica.org
GNU Affero General Public License v3.0

Problem: There isn't a way to supply third-party identifiers to Archivematica as PREMIS or to result in a PREMIS entry #1

Open lwo opened 6 years ago

lwo commented 6 years ago

We want to be able to supply our own persistent identifiers (PIDs). Can you devise a method to do this? For example, maybe using a JSON file that contains the PID value, the filename and the type of the identifier.

ross-spencer commented 6 years ago

Via https://projects.artefactual.com/issues/6261

Example Dublin Core (DC) JSON:

[
  {
    "dc.description.scholarlevel": "Faculty",
    "dc.description.affiliation": "University of Illinois at Urbana",
    "dc.description.abstract": "",
    "dc.language.iso": "eng",
    "dc.date.issued": "2010-03-23T14:03",
    "dc.type": "Moving Image",
    "dc.title": "A Lecture on hypergraphs",
    "dc.identifier": "XXXXX-Identifier-XXX",
    "filename": "objects/file.mov",
    "dc.subject": [
      "Combinatorics", "Digital Preservation" 
    ],
    "dc.relation": "A Relation",
    "dc.description.reviewstatus": "Unreviewed",
    "dc.publisher": "Artefactual Systems Inc.",
    "dc.format.extent": "30 minutes",
    "dc.contributor.author": "Author, Name" 
  }
]

ross-spencer commented 6 years ago

Example METS with identifiers:

<mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd" OBJID="ARCH00152.dig354">
  <fileSec>
    <fileGrp USE="transcription alto text" ID="fileGrp-1">
      <file MIMETYPE="application/xml" ID="f1">
        <FLocat xlink:href="file:///ARCH00152.dig354/transcription transcript/ARCH00152_355_0000.xml" LOCTYPE="URL"/>
        <FLocat xlink:href="http://hdl.handle.net/10622/3BF316F5-E00B-4148-B1ED-43EA61EFA263" LOCTYPE="HANDLE"/>
      </file>
      <file MIMETYPE="application/xml" ID="f2">
        <FLocat xlink:href="file:///ARCH00152.dig354/transcription transcript/ARCH00152_355_0001.xml" LOCTYPE="URL"/>
        <FLocat xlink:href="http://hdl.handle.net/10622/1015D0BE-37A0-413B-8301-934107FA1E58" LOCTYPE="HANDLE"/>
      </file>
    </fileGrp>
    <fileGrp USE="archive image" ID="fileGrp-2">
      <file MIMETYPE="image/tiff" ID="f6">
        <FLocat xlink:href="file:///ARCH00152.dig354/archive ARCH00152_355_0000.tif" LOCTYPE="URL"/>
        <FLocat xlink:href="http://hdl.handle.net/10622/65D29B75-A556-4E8D-ADFB-82CD74E3DCBD" LOCTYPE="HANDLE"/>
      </file>
      <file MIMETYPE="image/tiff" ID="f7">
        <FLocat xlink:href="file:///ARCH00152.dig354/archive ARCH00152_355_0001.tif" LOCTYPE="URL"/>
        <FLocat xlink:href="http://hdl.handle.net/10622/D2B748FD-6A36-4A54-BB2D-CD8E27C2BF7E" LOCTYPE="HANDLE"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div>
      <div TYPE="page" ORDER="1" LABEL="Page 1">
        <fptr FILEID="f1"/>
        <fptr FILEID="f6"/>
      </div>
      <div TYPE="page" ORDER="2" LABEL="Page 2">
        <fptr FILEID="f2"/>
        <fptr FILEID="f7"/>
      </div>
    </div>
  </structMap>
</mets>

ross-spencer commented 6 years ago

How does this look @kerim1 @lwo @IISH/artefactual:

[
    {
        "file": "objects/transcription/alto-text-0001.xml",
        "identifiers": [
            {
                "identifier": "file:///ARCH00152.dig354/transcription transcript/ARCH00152_355_0000.xml",
                "identifierType": "URL"
            },
            {
                "identifier": "http://hdl.handle.net/10622/3BF316F5-E00B-4148-B1ED-43EA61EFA263",
                "identifierType": "HANDLE"
            }
        ]
    },
    {
        "file": "objects/transcription/alto-text-0002.xml",
        "identifiers": [
            {
                "identifier": "file:///ARCH00152.dig354/transcription transcript/ARCH00152_355_0001.xml",
                "identifierType": "URL"
            },
            {
                "identifier": "http://hdl.handle.net/10622/1015D0BE-37A0-413B-8301-934107FA1E58",
                "identifierType": "HANDLE"
            }
        ]
    },
    {
        "file": "objects/ARCH00152_355_0000.tif",
        "identifiers": [
            {
                "identifier": "file:///ARCH00152.dig354/archive ARCH00152_355_0000.tif",
                "identifierType": "URL"
            },
            {
                "identifier": "http://hdl.handle.net/10622/65D29B75-A556-4E8D-ADFB-82CD74E3DCBD",
                "identifierType": "HANDLE"
            }
        ]
    },
    {
        "file": "objects/ARCH00152_355_0001.tif",
        "identifiers": [
            {
                "identifier": "file:///ARCH00152.dig354/archive ARCH00152_355_0001.tif",
                "identifierType": "URL"
            },
            {
                "identifier": "http://hdl.handle.net/10622/D2B748FD-6A36-4A54-BB2D-CD8E27C2BF7E",
                "identifierType": "HANDLE"
            }
        ]
    }
]
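
The identifier values in this proposal mirror the FLocat entries in the METS example earlier in the thread. As a rough sketch only (not part of Archivematica), the pairs could be pulled out of such a METS file with the Python standard library. The function name `flocat_identifiers` is hypothetical, and the sketch assumes each identifier lives in an `xlink:href` attribute with its type in `LOCTYPE`, as in the example.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the METS example in this thread.
NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def flocat_identifiers(mets_xml):
    """Map each METS file ID to a list of (identifier, type) pairs.

    Hypothetical helper: it only reads <file>/<FLocat> elements and makes
    no attempt to work out transfer-relative paths such as 'objects/...'.
    """
    root = ET.fromstring(mets_xml)
    result = {}
    for file_el in root.iterfind(".//mets:file", NS):
        result[file_el.get("ID")] = [
            (loc.get(XLINK_HREF), loc.get("LOCTYPE"))
            for loc in file_el.findall("mets:FLocat", NS)
        ]
    return result
```

The METS alone does not say where each file sits inside the transfer, so the mapping from a file ID such as f1 to a path such as objects/transcription/alto-text-0001.xml would still have to be supplied separately.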
jhsimpson commented 6 years ago

1) When the 'bind pids?' question is answered 'yes':
2) Check for the existence of a file called 'metadata/identifiers.json'.
3) If the file does not exist, follow the existing bind pids workflow.
4) If the file does exist, read it and create identifiers from the data in the JSON file (instead of binding pids).
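
To make the branching in the steps above concrete, here is a minimal sketch in Python. It is not the actual Archivematica implementation: the function name `plan_identifiers`, its arguments, and the returned labels are all assumptions made for illustration. Only the 'metadata/identifiers.json' location comes from the steps above.

```python
import json
from pathlib import Path


def plan_identifiers(transfer_dir, bind_pids_answer):
    """Decide where identifiers come from for one transfer.

    Returns a (source, data) tuple. The labels 'skip', 'bind_pids' and
    'from_json' are hypothetical names used only in this sketch.
    """
    # Step 1: nothing happens unless 'bind pids?' was answered 'yes'.
    if not bind_pids_answer:
        return ("skip", None)

    # Step 2: look for metadata/identifiers.json inside the transfer.
    path = Path(transfer_dir) / "metadata" / "identifiers.json"

    # Step 3: no file, so fall back to the existing bind pids workflow.
    if not path.is_file():
        return ("bind_pids", None)

    # Step 4: the file exists, so its contents replace PID binding.
    with path.open(encoding="utf-8") as handle:
        return ("from_json", json.load(handle))
```

Returning a label rather than performing the work keeps the decision separate from whatever code actually writes the identifiers.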

The 'filePath' variable in the identifiers.json file can refer to an individual file or folder within the transfer, or to the entire transfer. When referring to a file, 'filePath' should be the relative path and filename for a file in the transfer (e.g. 'objects/ARCH00152_355_0001.tif'). When referring to a folder, use the path to a folder in the transfer (e.g. 'objects/folder1/'). To refer to the entire transfer, use 'objects/'.
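
A small sketch of how the three kinds of 'filePath' value might be told apart. The function name `classify_target` is hypothetical, and the rule that a trailing slash marks a folder is an assumption drawn from the examples above, not something the proposal states.

```python
def classify_target(file_path):
    """Classify a 'filePath' value as 'transfer', 'folder' or 'file'.

    Assumption: folders are written with a trailing slash, as in the
    examples 'objects/folder1/' and 'objects/'.
    """
    if file_path == "objects/":
        return "transfer"  # the whole transfer
    if file_path.endswith("/"):
        return "folder"  # a folder inside the transfer
    return "file"  # a relative path and filename
```

For example, `classify_target("objects/folder1/")` gives `"folder"`, while `classify_target("objects/ARCH00152_355_0001.tif")` gives `"file"`.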