archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Scholars Portal is hard-coded into Dataverse METS generation #59

Closed: ross-spencer closed this issue 5 years ago

ross-spencer commented 6 years ago

Expected behaviour

The DDI Codebook specification defines a distrbtr (distributor) element with the following description:

   <xs:element name="distrbtr" type="distrbtrType">
      <xs:annotation>
         <xs:documentation>
            <xhtml:div>
               <xhtml:h1 class="element_title">Distributor</xhtml:h1>
               <xhtml:div>
                  <xhtml:h2 class="section_header">Description</xhtml:h2>
                  <xhtml:div class="description">The organization designated by the author or producer to generate copies of the particular work including any necessary editions or revisions. Names and addresses may be specified and other archives may be co-distributors. A URI attribute is included to provide an URN or URL to the ordering service or download facility on a Web site. </xhtml:div>
               </xhtml:div>
               <xhtml:div>
                  <xhtml:h2 class="section_header">Example</xhtml:h2>
                  <xhtml:div class="example">
                     <xhtml:samp class="xml_sample"><![CDATA[
                        <distrbtr abbr="ICPSR" affiliation="Institute for Social Research" URI="http://www.icpsr.umich.edu">Ann Arbor, MI: Inter-university Consortium for Political and Social Research</distrbtr>
                     ]]></xhtml:samp>
                  </xhtml:div>
               </xhtml:div>
            </xhtml:div>
         </xs:documentation>
      </xs:annotation>
   </xs:element>

When this field is populated in the Dataverse METS, its value should come from the dataset's actual source, which may not be Scholars Portal.
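As a rough illustration of what "set according to the source" could look like, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It builds a distrbtr element carrying the optional abbr, affiliation and URI attributes shown in the DDI example above, writing each attribute only when a value is available. The helper name build_distrbtr is hypothetical (it is not part of Archivematica's dataverse.py); the namespace URI is the one that appears in the generated METS quoted in this issue.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the ddi: prefix in the METS output quoted in this issue.
DDI_NS = "http://www.icpsr.umich.edu/DDI"


def build_distrbtr(name, abbr=None, affiliation=None, uri=None):
    """Hypothetical helper: build a DDI <distrbtr> element.

    Optional attributes are only written when a value is supplied, so a
    dataset without, e.g., an affiliation does not get an empty attribute.
    """
    attrs = {}
    if abbr:
        attrs["abbr"] = abbr
    if affiliation:
        attrs["affiliation"] = affiliation
    if uri:
        attrs["URI"] = uri
    element = ET.Element(f"{{{DDI_NS}}}distrbtr", attrs)
    element.text = name
    return element


if __name__ == "__main__":
    ET.register_namespace("ddi", DDI_NS)
    el = build_distrbtr(
        "Ann Arbor, MI: Inter-university Consortium for Political and Social Research",
        abbr="ICPSR",
        affiliation="Institute for Social Research",
        uri="http://www.icpsr.umich.edu",
    )
    print(ET.tostring(el, encoding="unicode"))
```

Leaving attributes out entirely, rather than writing empty strings, keeps the output closer to the DDI example when Dataverse only supplies some of the details.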

Current behaviour

The following line in dataverse.py hard-codes this value: https://github.com/artefactual/archivematica/blob/08185735c03bca87a452cf43651e3112468b9a40/src/MCPClient/lib/clientScripts/dataverse.py#L37

Steps to reproduce

Start a Dataverse transfer on the current cherrypick or transfer-type branch: the distrbtr value will be set to the hard-coded string regardless of the dataset's source.

ross-spencer commented 6 years ago

A downloaded Dataverse dataset has an associated dataset.json metadata file. The distributor could be populated from the fields in that file, e.g.:

{
    "multiple": true,
    "typeClass": "compound",
    "typeName": "distributor",
    "value": [
        {
            "distributorAbbreviation": {
                "multiple": false,
                "typeClass": "primitive",
                "typeName": "distributorAbbreviation",
                "value": "DGIC"
            },
            "distributorAffiliation": {
                "multiple": false,
                "typeClass": "primitive",
                "typeName": "distributorAffiliation",
                "value": "Queen's University"
            },
            "distributorName": {
                "multiple": false,
                "typeClass": "primitive",
                "typeName": "distributorName",
                "value": "Data and Government Information Centre"
            },
            "distributorURL": {
                "multiple": false,
                "typeClass": "primitive",
                "typeName": "distributorURL",
                "value": "http://library.queensu.ca/webdoc "
            }
        }
    ]
},
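Before mapping, a compound field like this could be flattened into plain Python dicts. The sketch below assumes the field has the shape shown above (a "value" list of objects keyed by sub-field name); parse_distributors is a hypothetical helper. Values are stripped because the DGIC example's distributorURL carries a trailing space.

```python
import json


def parse_distributors(field):
    """Hypothetical helper: flatten a Dataverse 'distributor' compound field.

    Returns one dict per distributor, mapping each sub-field's key (which
    matches its typeName, e.g. 'distributorName') to its stripped value.
    Empty or non-string values are skipped.
    """
    if field.get("typeName") != "distributor":
        raise ValueError("not a distributor field: %r" % field.get("typeName"))
    distributors = []
    for entry in field.get("value", []):
        flat = {}
        for type_name, sub in entry.items():
            value = sub.get("value")
            if isinstance(value, str) and value.strip():
                flat[type_name] = value.strip()
        distributors.append(flat)
    return distributors


if __name__ == "__main__":
    field = json.loads("""
    {
        "multiple": true,
        "typeClass": "compound",
        "typeName": "distributor",
        "value": [
            {
                "distributorAbbreviation": {"value": "DGIC"},
                "distributorName": {"value": "Data and Government Information Centre"},
                "distributorURL": {"value": "http://library.queensu.ca/webdoc "}
            }
        ]
    }
    """)
    print(parse_distributors(field))
```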
ross-spencer commented 6 years ago

Example: Liquor and Gambling in Manitoba 2013 [Canada]

{
    "typeName": "distributor",
    "multiple": true,
    "typeClass": "compound",
    "value": [
        {
            "distributorName": {
                "typeName": "distributorName",
                "multiple": false,
                "typeClass": "primitive",
                "value": "Gambling Research Exchange Ontario"
            },
            "distributorAbbreviation": {
                "typeName": "distributorAbbreviation",
                "multiple": false,
                "typeClass": "primitive",
                "value": "GREO"
            },
            "distributorURL": {
                "typeName": "distributorURL",
                "multiple": false,
                "typeClass": "primitive",
                "value": "http://www.greo.ca/"
            },
            "distributorLogoURL": {
                "typeName": "distributorLogoURL",
                "multiple": false,
                "typeClass": "primitive",
                "value": "http://www.greo.ca/en/images/structure/logo.svg"
            }
        }
    ]
},

Example: Uniform Crime Reporting Incident-Based Survey (UCR): Detailed violations by age/sex of accuser/victim

{
    "typeName": "distributor",
    "multiple": true,
    "typeClass": "compound",
    "value": [
        {
            "distributorName": {
                "typeName": "distributorName",
                "multiple": false,
                "typeClass": "primitive",
                "value": "Canadian Centre for Justice Statistics"
            },
            "distributorAffiliation": {
                "typeName": "distributorAffiliation",
                "multiple": false,
                "typeClass": "primitive",
                "value": "Statistics Canada"
            },
            "distributorAbbreviation": {
                "typeName": "distributorAbbreviation",
                "multiple": false,
                "typeClass": "primitive",
                "value": "CCJS"
            },
            "distributorURL": {
                "typeName": "distributorURL",
                "multiple": false,
                "typeClass": "primitive",
                "value": "http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=3302"
            }
        }
    ]
},
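These two examples show that the sub-fields vary between datasets: GREO supplies a logo URL but no affiliation, while CCJS supplies an affiliation but no logo. A mapping therefore needs to treat every sub-field except the name as optional. The sketch below uses a hypothetical distributor_to_ddi function that takes an already-flattened {typeName: value} dict. It maps sub-fields onto the abbr, affiliation and URI attributes from the DDI example quoted in this issue, and it drops distributorLogoURL because that example shows no logo attribute.

```python
# Hypothetical mapping from Dataverse distributor sub-fields to the
# attributes used in the DDI <distrbtr> example.
SUBFIELD_TO_ATTR = {
    "distributorAbbreviation": "abbr",
    "distributorAffiliation": "affiliation",
    "distributorURL": "URI",
}


def distributor_to_ddi(distributor):
    """Return (text, attributes) for one flattened distributor dict.

    Sub-fields without a DDI attribute (e.g. distributorLogoURL) are ignored.
    Returns None if the distributorName is missing, since it becomes the
    element text.
    """
    name = distributor.get("distributorName")
    if not name:
        return None
    attrs = {
        attr: distributor[key]
        for key, attr in SUBFIELD_TO_ATTR.items()
        if distributor.get(key)
    }
    return name, attrs


if __name__ == "__main__":
    greo = {
        "distributorName": "Gambling Research Exchange Ontario",
        "distributorAbbreviation": "GREO",
        "distributorURL": "http://www.greo.ca/",
        "distributorLogoURL": "http://www.greo.ca/en/images/structure/logo.svg",
    }
    ccjs = {
        "distributorName": "Canadian Centre for Justice Statistics",
        "distributorAffiliation": "Statistics Canada",
        "distributorAbbreviation": "CCJS",
    }
    for d in (greo, ccjs):
        print(distributor_to_ddi(d))
```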
ross-spencer commented 6 years ago

The resulting METS looks as follows (note the hard-coded distrbtr value, SP Dataverse Network):

  <mets:dmdSec ID="dmdSec_25113" CREATED="2018-08-21T02:29:07" STATUS="original">
    <mets:mdWrap MDTYPE="DDI">
      <mets:xmlData>
        <ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5" xsi:schemaLocation="http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
          <ddi:stdyDscr>
            <ddi:citation>
              <ddi:titlStmt>
                <ddi:titl>A study of my afternoon drinks </ddi:titl>
                <ddi:IDNo agency="doi">10.5072/FK2/6PPJ6Y</ddi:IDNo>
              </ddi:titlStmt>
              <ddi:rspStmt>
                <ddi:AuthEnty>Tester, Archivematica</ddi:AuthEnty>
              </ddi:rspStmt>
              <ddi:distStmt>
                <ddi:distrbtr>SP Dataverse Network</ddi:distrbtr>
              </ddi:distStmt>
              <ddi:verStmt>
                <ddi:version date="2018-05-09T20:45:27Z" type="RELEASED">1.0</ddi:version>
              </ddi:verStmt>
            </ddi:citation>
            <ddi:dataAccs>
              <ddi:useStmt>
                <ddi:restrctn>CC0 Waiver</ddi:restrctn>
              </ddi:useStmt>
            </ddi:dataAccs>
          </ddi:stdyDscr>
        </ddi:codebook>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
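To confirm a fix, one could check the generated METS for the hard-coded string. Below is a minimal sketch with ElementTree, assuming the ddi namespace shown above. The sample is a trimmed version of the output quoted here with the mets: and xsi: parts removed so that it parses on its own; distributors_in is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DDI_NS = {"ddi": "http://www.icpsr.umich.edu/DDI"}
HARD_CODED = "SP Dataverse Network"


def distributors_in(xml_text):
    """Return the text of every ddi:distrbtr element in an XML string."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.iterfind(".//ddi:distrbtr", DDI_NS)]


SAMPLE = """
<ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5">
  <ddi:stdyDscr>
    <ddi:citation>
      <ddi:distStmt>
        <ddi:distrbtr>SP Dataverse Network</ddi:distrbtr>
      </ddi:distStmt>
    </ddi:citation>
  </ddi:stdyDscr>
</ddi:codebook>
"""

if __name__ == "__main__":
    # With the behaviour described in this issue this prints True.
    print(HARD_CODED in distributors_in(SAMPLE))
```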
ross-spencer commented 6 years ago

Another example, from a sample uploaded to Dataverse:

  <mets:dmdSec ID="dmdSec_376986" CREATED="2018-08-21T21:52:35" STATUS="original">
    <mets:mdWrap MDTYPE="DDI">
      <mets:xmlData>
        <ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5" xsi:schemaLocation="http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
          <ddi:stdyDscr>
            <ddi:citation>
              <ddi:titlStmt>
                <ddi:titl>Botanical Test</ddi:titl>
                <ddi:IDNo agency="doi">10.5072/FK2/8KDUHM</ddi:IDNo>
              </ddi:titlStmt>
              <ddi:rspStmt>
                <ddi:AuthEnty>Admin, Dataverse</ddi:AuthEnty>
              </ddi:rspStmt>
              <ddi:distStmt>
                <ddi:distrbtr>SP Dataverse Network</ddi:distrbtr>
              </ddi:distStmt>
              <ddi:verStmt>
                <ddi:version date="2017-01-04T21:32:02Z" type="RELEASED">1.1</ddi:version>
              </ddi:verStmt>
            </ddi:citation>
            <ddi:dataAccs>
              <ddi:useStmt>
                <ddi:restrctn>CC0 Waiver</ddi:restrctn>
              </ddi:useStmt>
            </ddi:dataAccs>
          </ddi:stdyDscr>
        </ddi:codebook>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
ross-spencer commented 6 years ago

distrbtr is described here: https://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/codebook_xsd/elements/distrbtr.html

The organization designated by the author or producer to generate copies of the particular work including any necessary editions or revisions. Names and addresses may be specified and other archives may be co-distributors. A URI attribute is included to provide an URN or URL to the ordering service or download facility on a Web site.

The Dataverse DDI export appears to use the same value as the 'publisher' field in the dataset.json file, as in this export: https://dataverse.scholarsportal.info/api/datasets/export?exporter=ddi&persistentId=hdl%3A10864/10402

For now, we can map the publisher into this field so that it appears correctly in the METS rather than being hard-coded.
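The interim mapping is small. Here is a sketch, assuming dataset.json exposes the publisher as a top-level "publisher" key (matching the "publisher": "Root Dataverse" value reported in testing in this thread). distrbtr_from_publisher is a hypothetical name. Returning None instead of a default string is a deliberate choice in this sketch, so that no value ends up hard-coded again.

```python
import json


def distrbtr_from_publisher(dataset_json_text):
    """Hypothetical helper: pick the distrbtr text from dataset.json.

    Assumes the publisher is a top-level "publisher" key. Returns None
    rather than a hard-coded default when it is missing or empty, so the
    caller can decide whether to omit the element.
    """
    dataset = json.loads(dataset_json_text)
    publisher = dataset.get("publisher")
    if isinstance(publisher, str) and publisher.strip():
        return publisher.strip()
    return None


if __name__ == "__main__":
    print(distrbtr_from_publisher('{"publisher": "Root Dataverse"}'))
```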

A longer-term solution, discussed with @joel-simpson, would be to ask the Storage Service to download the DDI for every dataset using a URL such as this: https://demodv.scholarsportal.info/api/datasets/export?exporter=ddi&persistentId=doi%3A10.5072/FK2/8KDUHM

(Note that this URL is constructed using the Dataverse API syntax.)
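Here is a sketch of building that export URL from a server base URL and a persistent identifier, using only urllib.parse.quote. The colon in the identifier is percent-encoded and the slashes are left as-is, which matches both example URLs in this issue. ddi_export_url is a hypothetical helper. The actual download, which would be the Storage Service's job, is not shown.

```python
from urllib.parse import quote


def ddi_export_url(base_url, persistent_id):
    """Hypothetical helper: DDI export URL for a dataset.

    Follows the pattern of the example URLs in this issue:
    <base>/api/datasets/export?exporter=ddi&persistentId=<encoded id>
    """
    encoded = quote(persistent_id, safe="/")  # 'doi:' -> 'doi%3A', '/' kept
    return f"{base_url.rstrip('/')}/api/datasets/export?exporter=ddi&persistentId={encoded}"


if __name__ == "__main__":
    print(ddi_export_url("https://demodv.scholarsportal.info", "doi:10.5072/FK2/8KDUHM"))
```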

Future METS would then reference the DDI file itself, as we do for Bundles:

  <mets:dmdSec ID="dmdSec_708134" CREATED="2018-08-21T02:33:55" STATUS="original">
    <mets:mdRef LABEL="Drinks-ddi.xml" xlink:href="Drinks/Drinks-ddi.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
  </mets:dmdSec>
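For illustration, a referencing dmdSec like the one above could be generated with ElementTree. This sketch assumes the standard METS (http://www.loc.gov/METS/) and XLink (http://www.w3.org/1999/xlink) namespace URIs. The function name and its dmd_id and created parameters are hypothetical, and generating IDs and timestamps is left to the caller.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def ddi_dmdsec(dmd_id, created, label, href):
    """Hypothetical helper: a dmdSec that points at an external DDI file."""
    dmdsec = ET.Element(
        f"{{{METS_NS}}}dmdSec",
        {"ID": dmd_id, "CREATED": created, "STATUS": "original"},
    )
    # mdRef points at the DDI file shipped alongside the dataset.
    ET.SubElement(
        dmdsec,
        f"{{{METS_NS}}}mdRef",
        {
            "LABEL": label,
            f"{{{XLINK_NS}}}href": href,
            "MDTYPE": "DDI",
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "SYSTEM",
        },
    )
    return dmdsec


if __name__ == "__main__":
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    sec = ddi_dmdsec("dmdSec_708134", "2018-08-21T02:33:55",
                     "Drinks-ddi.xml", "Drinks/Drinks-ddi.xml")
    print(ET.tostring(sec, encoding="unicode"))
```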

This is out of scope for this work right now, so we will just correct the hard-coded string.

joel-simpson commented 6 years ago

I tested this and can confirm that we no longer see the hard-coded value "SP Dataverse Network". Instead, the METS contains "Root Dataverse" (i.e. <ddi:distrbtr>Root Dataverse</ddi:distrbtr>), which is the same value as the publisher field in dataset.json ("publisher": "Root Dataverse"). This aligns with the Dataverse Crosswalk, which also maps Dataset Publisher to DDI Codebook 2.5 field 2.1.4.1, distrbtr.

(For future reference, it might be worth adding: distributor details are optional metadata fields in Dataverse, and they map natively to DDI. If a Dataverse user fills them in today, they won't see them in their Archivematica METS. It might be worth including them in our METS mapping in future.)
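One way to fold the optional distributor fields into the mapping in future is sketched below. It assumes the caller has already pulled the citation fields list and the publisher string out of dataset.json (exactly where they live in that file is not covered in this issue). choose_distributors is hypothetical. It prefers named distributor entries and falls back to the publisher, which is the behaviour confirmed above. Each result is a (text, attributes) pair that could be written as one distrbtr element.

```python
# Hypothetical mapping from Dataverse distributor sub-fields to DDI
# <distrbtr> attributes (as in the DDI example quoted in this issue).
SUBFIELD_TO_ATTR = {
    "distributorAbbreviation": "abbr",
    "distributorAffiliation": "affiliation",
    "distributorURL": "URI",
}


def choose_distributors(citation_fields, publisher=None):
    """Return a list of (text, attrs) pairs for ddi:distrbtr elements.

    Uses the optional 'distributor' compound field when it has at least one
    named entry; otherwise falls back to the dataset publisher. Returns an
    empty list if neither is available (no hard-coded default).
    """
    results = []
    for field in citation_fields:
        if field.get("typeName") != "distributor":
            continue
        for entry in field.get("value", []):
            # Flatten sub-fields to stripped strings, skipping empty ones.
            values = {
                key: sub["value"].strip()
                for key, sub in entry.items()
                if isinstance(sub.get("value"), str) and sub["value"].strip()
            }
            name = values.get("distributorName")
            if not name:
                continue
            attrs = {attr: values[key] for key, attr in SUBFIELD_TO_ATTR.items() if key in values}
            results.append((name, attrs))
    if not results and publisher:
        results.append((publisher, {}))
    return results


if __name__ == "__main__":
    fields = [{
        "typeName": "distributor",
        "value": [{
            "distributorName": {"value": "Data and Government Information Centre"},
            "distributorAbbreviation": {"value": "DGIC"},
            "distributorURL": {"value": "http://library.queensu.ca/webdoc "},
        }],
    }]
    print(choose_distributors(fields, publisher="Root Dataverse"))
    print(choose_distributors([], publisher="Root Dataverse"))
```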