Closed ross-spencer closed 5 years ago
A downloaded Dataverse dataset has a dataset.json metadata file associated with it. The distributor could be set from the fields in that file, e.g.:
{
  "multiple": true,
  "typeClass": "compound",
  "typeName": "distributor",
  "value": [
    {
      "distributorAbbreviation": {
        "multiple": false,
        "typeClass": "primitive",
        "typeName": "distributorAbbreviation",
        "value": "DGIC"
      },
      "distributorAffiliation": {
        "multiple": false,
        "typeClass": "primitive",
        "typeName": "distributorAffiliation",
        "value": "Queen's University"
      },
      "distributorName": {
        "multiple": false,
        "typeClass": "primitive",
        "typeName": "distributorName",
        "value": "Data and Government Information Centre"
      },
      "distributorURL": {
        "multiple": false,
        "typeClass": "primitive",
        "typeName": "distributorURL",
        "value": "http://library.queensu.ca/webdoc "
      }
    }
  ]
},
Example: Liquor and Gambling in Manitoba 2013 [Canada]
{
  "typeName": "distributor",
  "multiple": true,
  "typeClass": "compound",
  "value": [
    {
      "distributorName": {
        "typeName": "distributorName",
        "multiple": false,
        "typeClass": "primitive",
        "value": "Gambling Research Exchange Ontario"
      },
      "distributorAbbreviation": {
        "typeName": "distributorAbbreviation",
        "multiple": false,
        "typeClass": "primitive",
        "value": "GREO"
      },
      "distributorURL": {
        "typeName": "distributorURL",
        "multiple": false,
        "typeClass": "primitive",
        "value": "http://www.greo.ca/"
      },
      "distributorLogoURL": {
        "typeName": "distributorLogoURL",
        "multiple": false,
        "typeClass": "primitive",
        "value": "http://www.greo.ca/en/images/structure/logo.svg"
      }
    }
  ]
},
{
  "typeName": "distributor",
  "multiple": true,
  "typeClass": "compound",
  "value": [
    {
      "distributorName": {
        "typeName": "distributorName",
        "multiple": false,
        "typeClass": "primitive",
        "value": "Canadian Centre for Justice Statistics"
      },
      "distributorAffiliation": {
        "typeName": "distributorAffiliation",
        "multiple": false,
        "typeClass": "primitive",
        "value": "Statistics Canada"
      },
      "distributorAbbreviation": {
        "typeName": "distributorAbbreviation",
        "multiple": false,
        "typeClass": "primitive",
        "value": "CCJS"
      },
      "distributorURL": {
        "typeName": "distributorURL",
        "multiple": false,
        "typeClass": "primitive",
        "value": "http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=3302"
      }
    }
  ]
},
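As a rough illustration of how the compound distributor fields above could be read, here is a minimal Python sketch. It assumes the field object has already been located inside dataset.json (its position in the wider file is not shown here), and the function name parse_distributors is hypothetical, not part of Archivematica.

```python
import json


def parse_distributors(field):
    """Flatten a Dataverse compound 'distributor' field into plain dicts.

    Hypothetical helper: expects the field object itself, as in the
    examples above, not the whole dataset.json document.
    """
    if field.get("typeName") != "distributor":
        raise ValueError("not a distributor field")
    distributors = []
    for entry in field.get("value", []):
        # Each entry maps a subfield name to a primitive field object;
        # keep only the subfield's value and strip stray whitespace.
        distributors.append(
            {name: sub["value"].strip() for name, sub in entry.items()}
        )
    return distributors


# Trimmed copy of the GREO example above.
sample = json.loads("""
{
  "typeName": "distributor",
  "multiple": true,
  "typeClass": "compound",
  "value": [
    {
      "distributorName": {
        "typeName": "distributorName",
        "multiple": false,
        "typeClass": "primitive",
        "value": "Gambling Research Exchange Ontario"
      },
      "distributorAbbreviation": {
        "typeName": "distributorAbbreviation",
        "multiple": false,
        "typeClass": "primitive",
        "value": "GREO"
      }
    }
  ]
}
""")

result = parse_distributors(sample)
print(result)
```

Because the field is marked "multiple": true, the sketch returns a list, with one dict per distributor.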
The resulting METS looks as follows:
<mets:dmdSec ID="dmdSec_25113" CREATED="2018-08-21T02:29:07" STATUS="original">
  <mets:mdWrap MDTYPE="DDI">
    <mets:xmlData>
      <ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5" xsi:schemaLocation="http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
        <ddi:stdyDscr>
          <ddi:citation>
            <ddi:titlStmt>
              <ddi:titl>A study of my afternoon drinks </ddi:titl>
              <ddi:IDNo agency="doi">10.5072/FK2/6PPJ6Y</ddi:IDNo>
            </ddi:titlStmt>
            <ddi:rspStmt>
              <ddi:AuthEnty>Tester, Archivematica</ddi:AuthEnty>
            </ddi:rspStmt>
            <ddi:distStmt>
              <ddi:distrbtr>SP Dataverse Network</ddi:distrbtr>
            </ddi:distStmt>
            <ddi:verStmt>
              <ddi:version date="2018-05-09T20:45:27Z" type="RELEASED">1.0</ddi:version>
            </ddi:verStmt>
          </ddi:citation>
          <ddi:dataAccs>
            <ddi:useStmt>
              <ddi:restrctn>CC0 Waiver</ddi:restrctn>
            </ddi:useStmt>
          </ddi:dataAccs>
        </ddi:stdyDscr>
      </ddi:codebook>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
Another example, from a sample uploaded to Dataverse:
<mets:dmdSec ID="dmdSec_376986" CREATED="2018-08-21T21:52:35" STATUS="original">
  <mets:mdWrap MDTYPE="DDI">
    <mets:xmlData>
      <ddi:codebook xmlns:ddi="http://www.icpsr.umich.edu/DDI" version="2.5" xsi:schemaLocation="http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
        <ddi:stdyDscr>
          <ddi:citation>
            <ddi:titlStmt>
              <ddi:titl>Botanical Test</ddi:titl>
              <ddi:IDNo agency="doi">10.5072/FK2/8KDUHM</ddi:IDNo>
            </ddi:titlStmt>
            <ddi:rspStmt>
              <ddi:AuthEnty>Admin, Dataverse</ddi:AuthEnty>
            </ddi:rspStmt>
            <ddi:distStmt>
              <ddi:distrbtr>SP Dataverse Network</ddi:distrbtr>
            </ddi:distStmt>
            <ddi:verStmt>
              <ddi:version date="2017-01-04T21:32:02Z" type="RELEASED">1.1</ddi:version>
            </ddi:verStmt>
          </ddi:citation>
          <ddi:dataAccs>
            <ddi:useStmt>
              <ddi:restrctn>CC0 Waiver</ddi:restrctn>
            </ddi:useStmt>
          </ddi:dataAccs>
        </ddi:stdyDscr>
      </ddi:codebook>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
distrbtr is described here: https://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/codebook_xsd/elements/distrbtr.html
The organization designated by the author or producer to generate copies of the particular work including any necessary editions or revisions. Names and addresses may be specified and other archives may be co-distributors. A URI attribute is included to provide an URN or URL to the ordering service or download facility on a Web site.
The DDI seems to use the same data as the 'publisher' field of the dataset.json file: https://dataverse.scholarsportal.info/api/datasets/export?exporter=ddi&persistentId=hdl%3A10864/10402
For now, we can map the publisher into this field so it appears correctly (and not hard-coded) in the METS.
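A minimal sketch of that mapping, using only the Python standard library, might look like the following. It assumes "publisher" is a top-level key of the parsed dataset.json, and the helper name build_distrbtr is hypothetical; the actual change in dataverse.py may be structured differently.

```python
import xml.etree.ElementTree as ET

# DDI namespace as it appears in the METS examples in this thread.
DDI_NS = "http://www.icpsr.umich.edu/DDI"
ET.register_namespace("ddi", DDI_NS)


def build_distrbtr(dataset):
    """Return a ddi:distrbtr element from the dataset's publisher.

    Hypothetical helper. Assumes 'publisher' is a top-level key of the
    parsed dataset.json; returns None when it is missing or empty so the
    caller can decide whether to omit the element.
    """
    publisher = dataset.get("publisher")
    if not publisher:
        return None
    distrbtr = ET.Element("{%s}distrbtr" % DDI_NS)
    distrbtr.text = publisher
    return distrbtr


element = build_distrbtr({"publisher": "Root Dataverse"})
serialized = ET.tostring(element, encoding="unicode")
print(serialized)
```

Returning None instead of falling back to a default string keeps any hard-coded distributor name out of the helper.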
A longer-term solution, discussed with @joel-simpson, would be to ask the Storage Service to download the DDI for all datasets using a URL such as this: https://demodv.scholarsportal.info/api/datasets/export?exporter=ddi&persistentId=doi%3A10.5072/FK2/8KDUHM
(Note it is constructed using the API syntax)
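For illustration, the export URL could be assembled from a base URL and a persistent identifier roughly as below. The function name ddi_export_url is hypothetical; the path and query parameters are taken from the URLs quoted in this thread.

```python
from urllib.parse import urlencode


def ddi_export_url(base_url, persistent_id):
    """Build a Dataverse DDI export URL for a persistent identifier.

    Hypothetical helper. The colon in the identifier is percent-encoded
    while slashes are left alone, matching the URLs quoted above.
    """
    query = urlencode(
        {"exporter": "ddi", "persistentId": persistent_id}, safe="/"
    )
    return "%s/api/datasets/export?%s" % (base_url.rstrip("/"), query)


url = ddi_export_url(
    "https://demodv.scholarsportal.info", "doi:10.5072/FK2/8KDUHM"
)
print(url)
```

The same helper works for handle identifiers such as hdl:10864/10402.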
And then for future METS we will reference the DDI itself (like we do for Bundles):
<mets:dmdSec ID="dmdSec_708134" CREATED="2018-08-21T02:33:55" STATUS="original">
  <mets:mdRef LABEL="Drinks-ddi.xml" xlink:href="Drinks/Drinks-ddi.xml" MDTYPE="DDI" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
</mets:dmdSec>
This is out of scope for this work right now, so we will just correct the hard-coded string.
I tested this and can confirm that we no longer see the hard-coded value SP Dataverse Network. Instead we see "Root Dataverse" (or <ddi:distrbtr>Root Dataverse</ddi:distrbtr>) in the METS, which is the same value that one sees in the dataset.json: "publisher": "Root Dataverse"
This aligns with the Dataverse Crosswalk, which also maps Dataset Publisher to DDI Codebook 2.5 field 2.1.4.1 distrbtr.
(For future reference, it might be worth adding: distributor details are optional metadata fields in Dataverse. They map directly to DDI. If a Dataverse user fills them in today, they won't see them in their Archivematica METS. It might be worth including them in our mapping to METS in future.)
Expected behaviour
The DDI spec specifies a distrbtr field, whose description is quoted above. When this field is set in the Dataverse METS, it should be set according to the source, which may not be Scholars Portal.
Current behaviour
This line hard-codes this information: https://github.com/artefactual/archivematica/blob/08185735c03bca87a452cf43651e3112468b9a40/src/MCPClient/lib/clientScripts/dataverse.py#L37
Steps to reproduce
Begin a Dataverse transfer in the current cherrypick or transfer type branch, and the distrbtr value in the resulting METS will be the hard-coded string SP Dataverse Network.