Example of metadata blocks for this dataset: http://hdl.handle.net/1902.29/10220
"metadataBlocks": {
"citation": {
"displayName": "Citation Metadata",
"fields": [
{
"typeName": "title",
"multiple": false,
"typeClass": "primitive",
"value": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976"
},
{
"typeName": "author",
"multiple": true,
"typeClass": "compound",
"value": [
{
"authorName": {
"typeName": "authorName",
"multiple": false,
"typeClass": "primitive",
"value": "State Center for Health Statistics"
}
}
]
},
{
"typeName": "datasetContact",
"multiple": true,
"typeClass": "compound",
"value": [
{
"datasetContactName": {
"typeName": "datasetContactName",
"multiple": false,
"typeClass": "primitive",
"value": "David Sheaves"
},
"datasetContactEmail": {
"typeName": "datasetContactEmail",
"multiple": false,
"typeClass": "primitive",
"value": "david_sheaves@unc.edu"
}
}
]
},
{
"typeName": "dsDescription",
"multiple": true,
"typeClass": "compound",
"value": [
{
"dsDescriptionValue": {
"typeName": "dsDescriptionValue",
"multiple": false,
"typeClass": "primitive",
"value": "<p>The North Carolina State Center for Health Services (SCHS) collects yearly vital statistics. The Odum Institute holds vital statistics beginning in 1968 for births, fetal deaths, deaths, birth/infant deaths, marriages and divorce. Public marriage and divorce data are available through 1999 only.</p><p>We have created a consolidated birth/infant death file that contains records of deaths occurring during the first year of life. Each such death record has been matched with a corresponding birth\nrecord creating a composite record containing information about both events. Users of these consolidated files should be aware that the file year of these data sets refers to the year of birth, not the year of death. For example, the 1970 consolidated birth/infant death file contains records of births occurring during 1970 that ended in an infant death either in 1970 or 1971. For this reason, the number of infant deaths for a particular year as obtained from the consolidated file will not be the same as the number obtained\nfrom the death file for that same year. This difference should especially be kept in mind when using this file in conjunction with the publication Vital Statistics, volume 1. This study focuses on North Carolina birth/infant deaths for 1976. It includes data on the age, education level and marital status of the parents; sex and race of the child; prenatal medical care received; county and hospital of birth; information on the mother's reproductive history including number of previous pregnancies and live births; as well as statistics on the newborn and autopsy information.\n</p> <p>The data is strictly numerical, there is no identifying information given about the parents or child.</p>"
}
}
]
},
{
"typeName": "keyword",
"multiple": true,
"typeClass": "compound",
"value": [
{
"keywordValue": {
"typeName": "keywordValue",
"multiple": false,
"typeClass": "primitive",
"value": "Births"
},
"keywordVocabulary": {
"typeName": "keywordVocabulary",
"multiple": false,
"typeClass": "primitive",
"value": "ODUM:INDEX.TERMS"
}
},
{
"keywordValue": {
"typeName": "keywordValue",
"multiple": false,
"typeClass": "primitive",
"value": "Infant death"
},
"keywordVocabulary": {
"typeName": "keywordVocabulary",
"multiple": false,
"typeClass": "primitive",
"value": "ODUM:INDEX.TERMS"
}
}
]
},
{
"typeName": "notesText",
"multiple": false,
"typeClass": "primitive",
"value": "Version Date: 1976Version Text: Birth/Infant Death"
},
{
"typeName": "producer",
"multiple": true,
"typeClass": "compound",
"value": [
{
"producerName": {
"typeName": "producerName",
"multiple": false,
"typeClass": "primitive",
"value": "State Center for Health Statistics"
},
"producerAbbreviation": {
"typeName": "producerAbbreviation",
"multiple": false,
"typeClass": "primitive",
"value": "SCHS"
},
"producerURL": {
"typeName": "producerURL",
"multiple": false,
"typeClass": "primitive",
"value": "http://www.schs.state.nc.us/SCHS/"
},
"producerLogoURL": {
"typeName": "producerLogoURL",
"multiple": false,
"typeClass": "primitive",
"value": "http://www.schs.state.nc.us/SCHS/images/schslogo2.gif"
}
}
]
},
{
"typeName": "productionDate",
"multiple": false,
"typeClass": "primitive",
"value": "1977"
},
{
"typeName": "distributor",
"multiple": true,
"typeClass": "compound",
"value": [
{
"distributorName": {
"typeName": "distributorName",
"multiple": false,
"typeClass": "primitive",
"value": "Odum Institute for Research in Social Science"
}
}
]
},
{
"typeName": "timePeriodCovered",
"multiple": true,
"typeClass": "compound",
"value": [
{
"timePeriodCoveredStart": {
"typeName": "timePeriodCoveredStart",
"multiple": false,
"typeClass": "primitive",
"value": "1976-01-01"
},
"timePeriodCoveredEnd": {
"typeName": "timePeriodCoveredEnd",
"multiple": false,
"typeClass": "primitive",
"value": "1976-12-31"
}
}
]
},
{
"typeName": "kindOfData",
"multiple": true,
"typeClass": "primitive",
"value": [
"Numeric"
]
},
{
"typeName": "series",
"multiple": false,
"typeClass": "compound",
"value": {
"seriesName": {
"typeName": "seriesName",
"multiple": false,
"typeClass": "primitive",
"value": "North Carolina Vital Statistics"
}
}
}
]
},
"geospatial": {
"displayName": "Geospatial Metadata",
"fields": [
{
"typeName": "geographicCoverage",
"multiple": true,
"typeClass": "compound",
"value": [
{
"country": {
"typeName": "country",
"multiple": false,
"typeClass": "controlledVocabulary",
"value": "United States"
}
},
{
"otherGeographicCoverage": {
"typeName": "otherGeographicCoverage",
"multiple": false,
"typeClass": "primitive",
"value": "North Carolina"
}
}
]
}
]
}
}
The same metadata with only the values kept:
{
"citation": {
"title": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976",
"author": [
{
"authorName": "State Center for Health Statistics"
}
],
"datasetContact": [
{
"datasetContactName": "David Sheaves",
"datasetContactEmail": "david_sheaves@unc.edu"
}
],
"dsDescription": [
{
"dsDescriptionValue": "<p>The North Carolina State Center for Health Services (SCHS) collects yearly vital statistics. The Odum Institute holds vital statistics beginning in 1968 for births, fetal deaths, deaths, birth/infant deaths, marriages and divorce. Public marriage and divorce data are available through 1999 only.</p><p>We have created a consolidated birth/infant death file that contains records of deaths occurring during the first year of life. Each such death record has been matched with a corresponding birth\nrecord creating a composite record containing information about both events. Users of these consolidated files should be aware that the file year of these data sets refers to the year of birth, not the year of death. For example, the 1970 consolidated birth/infant death file contains records of births occurring during 1970 that ended in an infant death either in 1970 or 1971. For this reason, the number of infant deaths for a particular year as obtained from the consolidated file will not be the same as the number obtained\nfrom the death file for that same year. This difference should especially be kept in mind when using this file in conjunction with the publication Vital Statistics, volume 1. This study focuses on North Carolina birth/infant deaths for 1976. It includes data on the age, education level and marital status of the parents; sex and race of the child; prenatal medical care received; county and hospital of birth; information on the mother's reproductive history including number of previous pregnancies and live births; as well as statistics on the newborn and autopsy information.\n</p> <p>The data is strictly numerical, there is no identifying information given about the parents or child.</p>"
}
],
"keyword": [
{
"keywordValue": "Births",
"keywordVocabulary": "ODUM:INDEX.TERMS"
},
{
"keywordValue": "Infant death",
"keywordVocabulary": "ODUM:INDEX.TERMS"
}
],
"notesText": "Version Date: 1976Version Text: Birth/Infant Death",
"producer": [
{
"producerName": "State Center for Health Statistics",
"producerAbbreviation": "SCHS",
"producerURL": "http://www.schs.state.nc.us/SCHS/",
"producerLogoURL": "http://www.schs.state.nc.us/SCHS/images/schslogo2.gif"
}
],
"productionDate": "1977",
"distributor": [
{
"distributorName": "Odum Institute for Research in Social Science"
}
],
"timePeriodCovered": [
{
"timePeriodCoveredStart": "1976-01-01",
"timePeriodCoveredEnd": "1976-12-31"
}
],
"kindOfData": [
"Numeric"
],
"series": {
"seriesName": "North Carolina Vital Statistics"
}
},
"geospatial": {
"geographicCoverage": [
{
"country": "United States"
},
{
"otherGeographicCoverage": "North Carolina"
}
]
}
}
To me this looks related to #2357, as separating the field definitions from the API output would allow the API output to contain only values.
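Here is a minimal Python sketch of keeping only the values. It assumes the verbose structure shown at the top of this ticket: each field has `typeName`, `typeClass`, `multiple` and `value`, and a compound value is a dict of subfield entries keyed by name, or a list of such dicts when `multiple` is true. It also handles non-repeating compound fields such as `series`, whose `value` is a single dict rather than a list. The function names are hypothetical and do not come from Dataverse or pyDataverse.

```python
import json
from typing import Any


def flatten_value(field: dict[str, Any]) -> Any:
    """Return only the value(s) of one verbose Dataverse field entry."""
    value = field["value"]
    if field["typeClass"] != "compound":
        # primitive / controlledVocabulary: a string, or a list of strings if multiple
        return value

    def flatten_compound(sub: dict[str, Any]) -> dict[str, Any]:
        # each subfield is itself a verbose field entry, so recurse
        return {name: flatten_value(subfield) for name, subfield in sub.items()}

    if field["multiple"]:
        return [flatten_compound(item) for item in value]
    return flatten_compound(value)


def flatten_metadata_blocks(blocks: dict[str, Any]) -> dict[str, Any]:
    """Convert a verbose "metadataBlocks" object into a values-only object."""
    return {
        block_name: {f["typeName"]: flatten_value(f) for f in block["fields"]}
        for block_name, block in blocks.items()
    }


verbose = {
    "citation": {
        "displayName": "Citation Metadata",
        "fields": [
            {"typeName": "title", "multiple": False, "typeClass": "primitive",
             "value": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976"},
            {"typeName": "series", "multiple": False, "typeClass": "compound",
             "value": {"seriesName": {"typeName": "seriesName", "multiple": False,
                                      "typeClass": "primitive",
                                      "value": "North Carolina Vital Statistics"}}},
        ],
    }
}
print(json.dumps(flatten_metadata_blocks(verbose), indent=2))
```

The `displayName` of each block is dropped along with the per-field `typeName`/`typeClass`/`multiple`. That information belongs to the field definitions rather than to the dataset.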
Making sure the API is complete.
@evelynPM pointed out that license and terms of access information are not appearing in the JSON (#2794).
I just mentioned this issue at https://groups.google.com/d/msg/dataverse-community/4XsA0Px2H8Q/CgO9OmkMAgAJ and now I realize that it seems to be about JSON output rather than input. Presumably we'd want to support the same format in and out.
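Supporting the same format in and out means the verbose form must be rebuildable from values alone. Here is a minimal Python sketch, assuming the field properties are looked up in a separate definitions table. `FIELD_DEFS` is hypothetical and lists only a few citation fields. In Dataverse this information lives in the metadata block definitions (the TSV files).

```python
import json
from typing import Any

# Hypothetical field-definition table (only a few citation fields shown).
FIELD_DEFS: dict[str, dict[str, Any]] = {
    "title": {"typeClass": "primitive", "multiple": False},
    "author": {"typeClass": "compound", "multiple": True},
    "authorName": {"typeClass": "primitive", "multiple": False},
    "kindOfData": {"typeClass": "primitive", "multiple": True},
}


def expand_field(name: str, value: Any) -> dict[str, Any]:
    """Rebuild one verbose field entry from its name and its plain value."""
    spec = FIELD_DEFS[name]  # KeyError for fields missing from the table
    if spec["typeClass"] == "compound":
        def expand_compound(sub: dict[str, Any]) -> dict[str, Any]:
            return {k: expand_field(k, v) for k, v in sub.items()}

        if spec["multiple"]:
            value = [expand_compound(item) for item in value]
        else:
            value = expand_compound(value)
    return {"typeName": name, "multiple": spec["multiple"],
            "typeClass": spec["typeClass"], "value": value}


def expand_block(fields: dict[str, Any]) -> dict[str, Any]:
    """Turn a values-only block back into the verbose {"fields": [...]} shape."""
    return {"fields": [expand_field(name, value) for name, value in fields.items()]}


simple = {
    "title": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976",
    "author": [{"authorName": "State Center for Health Statistics"}],
}
print(json.dumps({"citation": expand_block(simple)}, indent=2))
```

The block-level `displayName` is not reconstructed here. It would also have to come from the definitions.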
@pdurbin: tangentially related:
A simplified JSON output is available through miniverse and is fairly fast (queries are minimized and results are cached). The original goal of that experiment was for the same format to be accepted as input as well.
The following API endpoints give JSON for the dataset here:
If you have a DOI or dataset ID, the JSON should be available for any published dataset.
Swagger documentation for those endpoints is here:
(Caveat: the code is 2+ years old, so it is somewhat incomplete.)
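For comparison with the miniverse output, here is a minimal Python sketch of fetching the verbose form straight from the Dataverse native API. It assumes the `GET /api/datasets/:persistentId/?persistentId=...` and `GET /api/datasets/{id}` endpoints. It also assumes a response whose `data.latestVersion.metadataBlocks` holds the structure shown at the top of this ticket. The server URL is a placeholder and the helper names are hypothetical.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def dataset_url(server: str, pid: Optional[str] = None,
                dataset_id: Optional[int] = None) -> str:
    """Build a native API URL for a dataset, by persistent ID or by database ID."""
    base = server.rstrip("/")
    if pid is not None:
        # urlencode percent-encodes ':' and '/' inside the persistent identifier
        return f"{base}/api/datasets/:persistentId/?" + urlencode({"persistentId": pid})
    if dataset_id is not None:
        return f"{base}/api/datasets/{dataset_id}"
    raise ValueError("pass either pid or dataset_id")


def fetch_metadata_blocks(url: str) -> dict[str, Any]:
    """Fetch a published dataset and return its verbose metadataBlocks (network call)."""
    with urlopen(url) as resp:
        payload = json.load(resp)
    # Assumed shape: {"status": "OK", "data": {"latestVersion": {"metadataBlocks": {...}}}}
    return payload["data"]["latestVersion"]["metadataBlocks"]


# Placeholder server; the identifier is the handle of the example dataset above.
print(dataset_url("https://dataverse.example.edu", pid="hdl:1902.29/10220"))
```

`fetch_metadata_blocks` is not called above because it needs network access to a real installation.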
@raprasad cool. Did you ever figure out how to validate your JSON format with JSON Schema or similar in either Python or Java?
I never did it for the dataset JSON format in particular, but there are many tools for it: http://json-schema.org/implementations.html#validators
The repo for converting Dataverse TSV metadata into JSON schemas with validation is here:
The Jeremy Dorn links at the top are useful for getting started with creating a schema of your choice.
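To show what validating the values-only format could look like, here is a stdlib-only Python sketch. It supports just four JSON Schema keywords (`type`, `required`, `properties`, `items`) and the types `object`, `array` and `string`. A real project would use one of the validators listed at json-schema.org, for example the Python `jsonschema` package. `CITATION_SCHEMA` is a hypothetical schema for illustration, not the official list of required Dataverse fields.

```python
from typing import Any

# Hypothetical schema for part of a values-only "citation" block.
CITATION_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["title", "author"],
    "properties": {
        "title": {"type": "string"},
        "author": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["authorName"],
                "properties": {"authorName": {"type": "string"}},
            },
        },
        "kindOfData": {"type": "array", "items": {"type": "string"}},
    },
}

# Only these JSON Schema types are supported by this sketch.
_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of error messages (empty if valid) for a schema subset."""
    expected = schema.get("type")
    if expected and not isinstance(instance, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors: list[str] = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


good = {"title": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976",
        "author": [{"authorName": "State Center for Health Statistics"}]}
bad = {"author": [{}], "kindOfData": "Numeric"}
print(validate(good, CITATION_SCHEMA))  # []
print(validate(bad, CITATION_SCHEMA))
```

Collecting all errors with a path, rather than stopping at the first failure, makes it easier to report every problem in a submitted dataset at once.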
@raprasad thanks, I just opened this issue: https://github.com/IQSS/json-schema-test/issues/1
Great idea. This issue doesn't have a champion. Closing.
This is something that we still want, and in my opinion we should find a way to prioritize it: it would make our APIs significantly more useful, and we have related new work in #3083 for importing datasets.
Note: a cleaned-up version of the sample code at the top of this ticket could be dropped into pyDataverse as an option. Ticket added here: https://github.com/AUSSDA/pyDataverse/issues/23
This could be especially helpful for programmatic data discovery, Jupyter notebook users, etc.
Now that we have the semantic API, should we close this?
https://guides.dataverse.org/en/5.11.1/developers/dataset-semantic-metadata-api.html
There's a new issue about this:
This is probably worth a separate FRD and includes (but is certainly not limited to) items such as:
This is not a "massive" project, just modifications to the existing code.