IQSS / dataverse

Open source research data repository software
http://dataverse.org

"Compact"/Make complete the native API output #3068

Closed raprasad closed 11 months ago

raprasad commented 8 years ago

This is probably worth a separate FRD and includes (certainly not limited to) items such as:

This is not a "massive" project, just modifications to the existing code.

raprasad commented 8 years ago

Example of metadata blocks for this dataset: http://hdl.handle.net/1902.29/10220

Current version (12,172 bytes)

"metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata", 
                "fields": [
                    {
                        "typeName": "title", 
                        "multiple": false, 
                        "typeClass": "primitive", 
                        "value": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976"
                    }, 
                    {
                        "typeName": "author", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "authorName": {
                                    "typeName": "authorName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "State Center for Health Statistics"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "datasetContact", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "datasetContactName": {
                                    "typeName": "datasetContactName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "David Sheaves"
                                }, 
                                "datasetContactEmail": {
                                    "typeName": "datasetContactEmail", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "david_sheaves@unc.edu"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "dsDescription", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "dsDescriptionValue": {
                                    "typeName": "dsDescriptionValue", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "<p>The North Carolina State Center for Health Services (SCHS) collects yearly vital statistics. The Odum Institute holds vital statistics beginning in 1968 for births, fetal deaths, deaths, birth/infant deaths, marriages and divorce. Public marriage and divorce data are available through 1999 only.</p><p>We have created a consolidated birth/infant death file that contains records of deaths occurring during the first year of life. Each such death record has been matched with a corresponding birth\nrecord creating a composite record containing information about both events. Users of these consolidated files should be aware that the file year of these data sets refers to the year of birth, not the year of death. For example, the 1970 consolidated birth/infant death file contains records of births occurring during 1970 that ended in an infant death either in 1970 or 1971. For this reason, the number of infant deaths for a particular year as obtained from the consolidated file will not be the same as the number obtained\nfrom the death file for that same year. This difference should especially be kept in mind when using this file in conjunction with the publication Vital Statistics, volume 1. This study focuses on North Carolina birth/infant deaths for 1976. It includes data on the age, education level and marital status of the parents; sex and race of the child; prenatal medical care received; county and hospital of birth; information on the mother's reproductive history including number of previous pregnancies and live births; as well as statistics on the newborn and autopsy information.\n</p> <p>The data is strictly numerical, there is no identifying information given about the parents or child.</p>"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "keyword", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "keywordValue": {
                                    "typeName": "keywordValue", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "Births"
                                }, 
                                "keywordVocabulary": {
                                    "typeName": "keywordVocabulary", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "ODUM:INDEX.TERMS"
                                }
                            }, 
                            {
                                "keywordValue": {
                                    "typeName": "keywordValue", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "Infant death"
                                }, 
                                "keywordVocabulary": {
                                    "typeName": "keywordVocabulary", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "ODUM:INDEX.TERMS"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "notesText", 
                        "multiple": false, 
                        "typeClass": "primitive", 
                        "value": "Version Date: 1976Version Text: Birth/Infant Death"
                    }, 
                    {
                        "typeName": "producer", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "producerName": {
                                    "typeName": "producerName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "State Center for Health Statistics"
                                }, 
                                "producerAbbreviation": {
                                    "typeName": "producerAbbreviation", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "SCHS"
                                }, 
                                "producerURL": {
                                    "typeName": "producerURL", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "http://www.schs.state.nc.us/SCHS/"
                                }, 
                                "producerLogoURL": {
                                    "typeName": "producerLogoURL", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "http://www.schs.state.nc.us/SCHS/images/schslogo2.gif"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "productionDate", 
                        "multiple": false, 
                        "typeClass": "primitive", 
                        "value": "1977"
                    }, 
                    {
                        "typeName": "distributor", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "distributorName": {
                                    "typeName": "distributorName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "Odum Institute for Research in Social Science"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "timePeriodCovered", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "timePeriodCoveredStart": {
                                    "typeName": "timePeriodCoveredStart", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "1976-01-01"
                                }, 
                                "timePeriodCoveredEnd": {
                                    "typeName": "timePeriodCoveredEnd", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "1976-12-31"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "kindOfData", 
                        "multiple": true, 
                        "typeClass": "primitive", 
                        "value": [
                            "Numeric"
                        ]
                    }, 
                    {
                        "typeName": "series", 
                        "multiple": false, 
                        "typeClass": "compound", 
                        "value": {
                            "seriesName": {
                                "typeName": "seriesName", 
                                "multiple": false, 
                                "typeClass": "primitive", 
                                "value": "North Carolina Vital Statistics"
                            }
                        }
                    }
                ]
            }, 
            "geospatial": {
                "displayName": "Geospatial Metadata", 
                "fields": [
                    {
                        "typeName": "geographicCoverage", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "country": {
                                    "typeName": "country", 
                                    "multiple": false, 
                                    "typeClass": "controlledVocabulary", 
                                    "value": "United States"
                                }
                            }, 
                            {
                                "otherGeographicCoverage": {
                                    "typeName": "otherGeographicCoverage", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "North Carolina"
                                }
                            }
                        ]
                    }
                ]
            }
        }

Clean version (3,446 bytes)

{
    "citation": {
        "title": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976", 
        "author": [
            {
                "authorName": "State Center for Health Statistics"
            }
        ], 
        "datasetContact": [
            {
                "datasetContactName": "David Sheaves", 
                "datasetContactEmail": "david_sheaves@unc.edu"
            }
        ], 
        "dsDescription": [
            {
                "dsDescriptionValue": "<p>The North Carolina State Center for Health Services (SCHS) collects yearly vital statistics. The Odum Institute holds vital statistics beginning in 1968 for births, fetal deaths, deaths, birth/infant deaths, marriages and divorce. Public marriage and divorce data are available through 1999 only.</p><p>We have created a consolidated birth/infant death file that contains records of deaths occurring during the first year of life. Each such death record has been matched with a corresponding birth\nrecord creating a composite record containing information about both events. Users of these consolidated files should be aware that the file year of these data sets refers to the year of birth, not the year of death. For example, the 1970 consolidated birth/infant death file contains records of births occurring during 1970 that ended in an infant death either in 1970 or 1971. For this reason, the number of infant deaths for a particular year as obtained from the consolidated file will not be the same as the number obtained\nfrom the death file for that same year. This difference should especially be kept in mind when using this file in conjunction with the publication Vital Statistics, volume 1. This study focuses on North Carolina birth/infant deaths for 1976. It includes data on the age, education level and marital status of the parents; sex and race of the child; prenatal medical care received; county and hospital of birth; information on the mother's reproductive history including number of previous pregnancies and live births; as well as statistics on the newborn and autopsy information.\n</p> <p>The data is strictly numerical, there is no identifying information given about the parents or child.</p>"
            }
        ], 
        "keyword": [
            {
                "keywordValue": "Births", 
                "keywordVocabulary": "ODUM:INDEX.TERMS"
            }, 
            {
                "keywordValue": "Infant death", 
                "keywordVocabulary": "ODUM:INDEX.TERMS"
            }
        ], 
        "notesText": "Version Date: 1976Version Text: Birth/Infant Death", 
        "producer": [
            {
                "producerName": "State Center for Health Statistics", 
                "producerAbbreviation": "SCHS", 
                "producerURL": "http://www.schs.state.nc.us/SCHS/", 
                "producerLogoURL": "http://www.schs.state.nc.us/SCHS/images/schslogo2.gif"
            }
        ], 
        "productionDate": "1977", 
        "distributor": [
            {
                "distributorName": "Odum Institute for Research in Social Science"
            }
        ], 
        "timePeriodCovered": [
            {
                "timePeriodCoveredStart": "1976-01-01", 
                "timePeriodCoveredEnd": "1976-12-31"
            }
        ], 
        "kindOfData": [
            "Numeric"
        ]
    }, 
    "geospatial": {
        "geographicCoverage": [
            {
                "country": "United States"
            }, 
            {
                "otherGeographicCoverage": "North Carolina"
            }
        ]
    }
}
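
The conversion from the current version to the clean version can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the output above: the function names `compact_field` and `compact_blocks` are hypothetical, and it assumes every field carries the `typeName`, `multiple`, `typeClass` and `value` keys shown in the current version. Note that this sketch also keeps non-multiple compound fields such as `series` as plain objects.

```python
def compact_field(field):
    """Reduce one verbose field (hypothetical helper) to its bare value."""
    value = field["value"]
    if field["typeClass"] != "compound":
        # primitive and controlledVocabulary fields already hold plain values
        return value
    if field["multiple"]:
        # a list of entries, each a dict of child fields keyed by typeName
        return [
            {name: compact_field(child) for name, child in entry.items()}
            for entry in value
        ]
    # a single compound entry, e.g. "series"
    return {name: compact_field(child) for name, child in value.items()}


def compact_blocks(metadata_blocks):
    """Drop displayName and per-field definitions, keeping only values."""
    return {
        block_name: {f["typeName"]: compact_field(f) for f in block["fields"]}
        for block_name, block in metadata_blocks.items()
    }


def _prim(name, value, multiple=False):
    # small builder for the sample data below (illustration only)
    return {"typeName": name, "multiple": multiple,
            "typeClass": "primitive", "value": value}


sample = {
    "citation": {
        "displayName": "Citation Metadata",
        "fields": [
            _prim("title", "North Carolina Vital Statistics"),
            {
                "typeName": "keyword", "multiple": True, "typeClass": "compound",
                "value": [{
                    "keywordValue": _prim("keywordValue", "Births"),
                    "keywordVocabulary": _prim("keywordVocabulary", "ODUM:INDEX.TERMS"),
                }],
            },
            _prim("kindOfData", ["Numeric"], multiple=True),
            {
                "typeName": "series", "multiple": False, "typeClass": "compound",
                "value": {"seriesName": _prim("seriesName", "North Carolina Vital Statistics")},
            },
        ],
    }
}

compacted = compact_blocks(sample)
```

Calling `compact_blocks` on the `metadataBlocks` object of a dataset version should give a structure shaped like the clean version, with one key per metadata block and one key per field.
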

bencomp commented 8 years ago

To me this looks related to #2357, as separating the field definitions from the API output would allow the API output to contain only values.

pdurbin commented 8 years ago

Making sure the API is complete.

@evelynPM pointed out in #2794 that license and terms of access information is not appearing in the JSON.

pdurbin commented 6 years ago

I just mentioned this issue at https://groups.google.com/d/msg/dataverse-community/4XsA0Px2H8Q/CgO9OmkMAgAJ and now I realize that it seems to be about JSON output rather than input. Presumably we want to support the same format in and out.

#3599 is related, having to do with simple edits.

#3859 is also related because people struggle so much with the current complex JSON needed to create a dataset with rich metadata. That issue is about at least providing a full example in the API Guide.
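
On supporting the same format in and out: the compact form drops `typeClass` and `multiple`, so turning it back into the current verbose input needs the field definitions from somewhere else. A rough sketch, assuming a hypothetical lookup table `FIELD_DEFS` (hand-written here for illustration, not an existing Dataverse API) that maps each `typeName` to its `typeClass` and `multiple` flag:

```python
# Hypothetical field definitions; a real table would have to come from the
# metadata block definitions rather than being written by hand.
FIELD_DEFS = {
    "title": ("primitive", False),
    "author": ("compound", True),
    "authorName": ("primitive", False),
    "kindOfData": ("primitive", True),
    "geographicCoverage": ("compound", True),
    "country": ("controlledVocabulary", False),
}


def expand_field(name, value, defs):
    """Rebuild the verbose field object for one compact name/value pair."""
    type_class, multiple = defs[name]
    if type_class == "compound":
        if multiple:
            # each entry is a dict of child values keyed by typeName
            value = [
                {k: expand_field(k, v, defs) for k, v in entry.items()}
                for entry in value
            ]
        else:
            value = {k: expand_field(k, v, defs) for k, v in value.items()}
    return {
        "typeName": name,
        "multiple": multiple,
        "typeClass": type_class,
        "value": value,
    }


def expand_fields(compact_block, defs):
    """Turn one compact block into the verbose "fields" list."""
    return [expand_field(n, v, defs) for n, v in compact_block.items()]


author = expand_field(
    "author", [{"authorName": "State Center for Health Statistics"}], FIELD_DEFS
)
geo = expand_fields({"geographicCoverage": [{"country": "United States"}]}, FIELD_DEFS)
```

The block `displayName` is not recoverable this way either, so a full round trip would need the block definitions as well as the field definitions.
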

raprasad commented 6 years ago

@pdurbin : tangentially related

A simplified JSON output is available through miniverse and is fairly fast. The original goal of that experiment was to also have the same format go back in as input. (Queries are minimized and results are cached.)

Example

The following API endpoints give JSON for the dataset here:

If you have a DOI or dataset id, the JSON should be available for any published dataset.

JSON for that dataset

swagger info

pdurbin commented 6 years ago

@raprasad cool. Did you ever figure out how to validate your JSON format with JSON Schema or similar in either Python or Java?

raprasad commented 6 years ago

I never did it for dataset JSON format in particular but there are many tools around to do it: http://json-schema.org/implementations.html#validators

The repo for converting Dataverse TSV metadata into JSON schemas with validation is here:

The Jeremy Dorn links at the top are useful for getting started in creating a schema of your choice.

pdurbin commented 6 years ago

@raprasad thanks, I just opened this issue: https://github.com/IQSS/json-schema-test/issues/1

pdurbin commented 6 years ago

Great idea. This issue doesn't have a champion. Closing.

scolapasta commented 6 years ago

This is something that we still want and, in my opinion, we should find a way to prioritize it - it would make our APIs significantly more useful and we have the related new work in #3083 for importing datasets.

raprasad commented 5 years ago

Note: a cleaned up version of the sample code at the top of this ticket* could be dropped into pyDataverse as an option. Ticket added here: https://github.com/AUSSDA/pyDataverse/issues/23

This could be especially helpful for programmatic data discovery, Jupyter notebook users, etc.

pdurbin commented 1 year ago

Now that we have the semantic API, should we close this?

https://guides.dataverse.org/en/5.11.1/developers/dataset-semantic-metadata-api.html