archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Metadata Only Datasets from Dataverse fail #129

Open joel-simpson opened 6 years ago

joel-simpson commented 6 years ago

Expected behaviour

When retrieving a dataset from Dataverse that contains only metadata (no files), it should still be possible to process it in Archivematica and create an AIP. A metadata-only dataset contains a dataset.json file and an agents.json file, so it should be possible to carry out preservation actions on those files.
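The expected behaviour can be sketched in Python, the language Archivematica is written in. This is a minimal illustration under stated assumptions, not the Convert Dataverse Structure code: it assumes dataset.json and agents.json sit in a metadata/ subdirectory of the transfer (suggested by the "Metadata directory exists" log line), and the names describe_transfer and is_metadata_only are hypothetical. It treats an empty "files" list as a valid metadata-only dataset rather than an error.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: Dataverse metadata lives in a "metadata" subdirectory of the transfer.
METADATA_DIR = "metadata"
METADATA_FILES = ("dataset.json", "agents.json")


def load_dataset(transfer_dir):
    """Parse dataset.json from the transfer's metadata directory."""
    path = Path(transfer_dir) / METADATA_DIR / "dataset.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def is_metadata_only(dataset):
    """True when the dataset version lists no files, e.g. "files": []."""
    return not dataset.get("datasetVersion", {}).get("files")


def describe_transfer(transfer_dir):
    """Hypothetical summary of a Dataverse transfer that tolerates an empty file list."""
    metadata_dir = Path(transfer_dir) / METADATA_DIR
    dataset = load_dataset(transfer_dir)
    files = dataset.get("datasetVersion", {}).get("files") or []
    return {
        "metadata_only": is_metadata_only(dataset),
        "file_count": len(files),
        # The metadata files that preservation actions can still run on.
        "metadata_files": [
            name for name in METADATA_FILES if (metadata_dir / name).is_file()
        ],
    }


if __name__ == "__main__":
    # Build a throwaway metadata-only transfer (placeholder file contents) and summarise it.
    with tempfile.TemporaryDirectory() as tmp:
        md = Path(tmp) / METADATA_DIR
        md.mkdir()
        (md / "dataset.json").write_text(json.dumps({"datasetVersion": {"files": []}}))
        (md / "agents.json").write_text("[]")
        print(describe_transfer(tmp))
        # → {'metadata_only': True, 'file_count': 0, 'metadata_files': ['dataset.json', 'agents.json']}
```

The design choice is simply to branch on the empty list instead of assuming at least one file, so a metadata-only transfer yields its two JSON files as the AIP's content.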

Current behaviour

A dataset (e.g. this metadata-only dataset) can be selected from a Dataverse Transfer Source, but it fails during the "Verify Transfer Compliance" microservice, at the "Convert Dataverse Structure" job. The task output is:

/var/archivematica/sharedDirectory/currentlyProcessing/29AugMetadataOnly/ fe339733-ce23-40e8-8e95-f378efdf251c
Standard streams
Standard output (stdout)
Fields retrieved from Dataverse:
Title: Metadata only [test]
Author: Goodchild, Meghan
PID Type: doi
IDNO: https://doi.org/10.5072/FK2/MDQHYY
Version Date: 2018-05-23T17:26:15Z
Version Type: RELEASED
Version Number: 1.0
Restriction Text: CC0 Waiver
Distributor Text: Root Dataverse

Standard error (stderr)
convertdataversestructure_v0.0: INFO      2018-08-30 00:29:18,006  archivematica.mcp.client.convert_dataverse_struct.map_dataverse:462  Convert Dataverse Structure with dir args: '/var/archivematica/sharedDirectory/currentlyProcessing/29AugMetadataOnly/' and transfer uuid: fe339733-ce23-40e8-8e95-f378efdf251c
convertdataversestructure_v0.0: INFO      2018-08-30 00:29:18,007  archivematica.mcp.client.convert_dataverse_struct.map_:393  Convert Dataverse structure called with '/var/archivematica/sharedDirectory/currentlyProcessing/29AugMetadataOnly/' unit directory and 'fe339733-ce23-40e8-8e95-f378efdf251c' unit uuid
convertdataversestructure_v0.0: INFO      2018-08-30 00:29:18,007  archivematica.mcp.client.convert_dataverse_struct.load_md_and_return_json:379  Metadata directory exists True

Steps to reproduce

Log in and attempt to process the dataset linked above.

Your environment (version of Archivematica, OS version, etc.)

Testing the latest Dataverse changes, which are up to date with qa/1.x (dev branch for SS PR 398 (artefactual/archivematica-storage-service#398) and AM PR 1242). Tested on my local Docker dev environment and in our QA environment: http://ocul-am-dv.dev.archivematica.org:62080

ross-spencer commented 6 years ago

JSON for reference:

{
    "id": 1110,
    "identifier": "MDQHYY",
    "persistentUrl": "https://doi.org/10.5072/FK2/MDQHYY",
    "protocol": "doi",
    "authority": "10.5072/FK2",
    "publisher": "Root Dataverse",
    "publicationDate": "2018-05-23",
    "datasetVersion": {
        "id": 289,
        "versionNumber": 1,
        "versionMinorNumber": 0,
        "versionState": "RELEASED",
        "productionDate": "Production Date",
        "lastUpdateTime": "2018-05-23T17:26:15Z",
        "releaseTime": "2018-05-23T17:26:15Z",
        "createTime": "2018-05-23T17:20:19Z",
        "license": "CC0",
        "termsOfUse": "CC0 Waiver",
        "metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata",
                "fields": [{
                        "typeName": "title",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "Metadata only [test]"
                    },
                    {
                        "typeName": "author",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "authorName": {
                                "typeName": "authorName",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Goodchild, Meghan"
                            },
                            "authorAffiliation": {
                                "typeName": "authorAffiliation",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Queen's University"
                            }
                        }]
                    },
                    {
                        "typeName": "datasetContact",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "datasetContactName": {
                                "typeName": "datasetContactName",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Goodchild, Meghan"
                            },
                            "datasetContactAffiliation": {
                                "typeName": "datasetContactAffiliation",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Queen's University"
                            },
                            "datasetContactEmail": {
                                "typeName": "datasetContactEmail",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "meghan.goodchild@queensu.ca"
                            }
                        }]
                    },
                    {
                        "typeName": "dsDescription",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "dsDescriptionValue": {
                                "typeName": "dsDescriptionValue",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Test dataset with only metadata"
                            },
                            "dsDescriptionDate": {
                                "typeName": "dsDescriptionDate",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "2018-05-23"
                            }
                        }]
                    },
                    {
                        "typeName": "subject",
                        "multiple": true,
                        "typeClass": "controlledVocabulary",
                        "value": [
                            "Social Sciences"
                        ]
                    },
                    {
                        "typeName": "keyword",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "keywordValue": {
                                "typeName": "keywordValue",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "metadata"
                            }
                        }]
                    },
                    {
                        "typeName": "publication",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "publicationCitation": {
                                "typeName": "publicationCitation",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Goodchild, Meghan, and Jennifer Zhao. “Sustainability Engineering Collection Assessment: A Mixed-Method Analysis.” Science & Technology Libraries 2.36 (2017): 153–169."
                            }
                        }]
                    },
                    {
                        "typeName": "language",
                        "multiple": true,
                        "typeClass": "controlledVocabulary",
                        "value": [
                            "English"
                        ]
                    },
                    {
                        "typeName": "producer",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "producerName": {
                                "typeName": "producerName",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Goodchild, Meghan"
                            },
                            "producerAffiliation": {
                                "typeName": "producerAffiliation",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "Queen's University"
                            },
                            "producerURL": {
                                "typeName": "producerURL",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "http://library.queensu.ca/"
                            }
                        }]
                    },
                    {
                        "typeName": "productionDate",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "2018-05-23"
                    },
                    {
                        "typeName": "productionPlace",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "Kingston, ON"
                    },
                    {
                        "typeName": "grantNumber",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [{
                            "grantNumberAgency": {
                                "typeName": "grantNumberAgency",
                                "multiple": false,
                                "typeClass": "primitive",
                                "value": "SSHRC"
                            }
                        }]
                    },
                    {
                        "typeName": "depositor",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "Goodchild, Meghan"
                    },
                    {
                        "typeName": "dateOfDeposit",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "2018-05-23"
                    }
                ]
            },
            "geospatial": {
                "displayName": "Geospatial Metadata",
                "fields": []
            },
            "journal": {
                "displayName": "Journal Metadata",
                "fields": []
            }
        },
        "files": [],
        "citation": "Goodchild, Meghan, 2018, \"Metadata only [test]\", https://doi.org/10.5072/FK2/MDQHYY, Root Dataverse, V1"
    }
}
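Matching this JSON against the stdout in the original report shows where each "Fields retrieved from Dataverse" value appears to come from, and confirms that "files" is an empty list. The Python sketch below reconstructs that summary from a trimmed excerpt of the JSON. The label-to-key mapping is inferred from the matching values, not taken from Archivematica's source, and read_dataset_json, citation_value and retrieved_fields are hypothetical names.

```python
import json

# Hypothetical helpers. The mapping from stdout labels to JSON keys is inferred by
# matching values in this issue; it is not the convert_dataverse_structure code.


def read_dataset_json(path):
    """Parse a Dataverse dataset.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def citation_value(dataset, type_name):
    """Return the value of a citation-block field by typeName, or None."""
    fields = dataset["datasetVersion"]["metadataBlocks"]["citation"]["fields"]
    for field in fields:
        if field["typeName"] == type_name:
            return field["value"]
    return None


def retrieved_fields(dataset):
    """Rebuild the "Fields retrieved from Dataverse" summary from the JSON."""
    version = dataset["datasetVersion"]
    authors = citation_value(dataset, "author") or []
    return {
        "Title": citation_value(dataset, "title"),
        "Author": "; ".join(a["authorName"]["value"] for a in authors),
        "PID Type": dataset["protocol"],
        "IDNO": dataset["persistentUrl"],
        # releaseTime and lastUpdateTime are identical in this dataset.
        "Version Date": version["releaseTime"],
        "Version Type": version["versionState"],
        "Version Number": f"{version['versionNumber']}.{version['versionMinorNumber']}",
        "Restriction Text": version["termsOfUse"],
        "Distributor Text": dataset["publisher"],
        # Not in the original stdout: the value that makes this dataset metadata-only.
        "File Count": len(version.get("files") or []),
    }


# Trimmed excerpt of the JSON above (values copied verbatim, other keys omitted).
EXCERPT = {
    "protocol": "doi",
    "persistentUrl": "https://doi.org/10.5072/FK2/MDQHYY",
    "publisher": "Root Dataverse",
    "datasetVersion": {
        "versionNumber": 1,
        "versionMinorNumber": 0,
        "versionState": "RELEASED",
        "releaseTime": "2018-05-23T17:26:15Z",
        "termsOfUse": "CC0 Waiver",
        "metadataBlocks": {"citation": {"fields": [
            {"typeName": "title", "value": "Metadata only [test]"},
            {"typeName": "author",
             "value": [{"authorName": {"value": "Goodchild, Meghan"}}]},
        ]}},
        "files": [],
    },
}

if __name__ == "__main__":
    for label, value in retrieved_fields(EXCERPT).items():
        print(f"{label}: {value}")
```

Run against the excerpt, it prints the same nine lines as the stdout in the original report, followed by "File Count: 0". For a real transfer, retrieved_fields(read_dataset_json("dataset.json")) would be the equivalent call.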