IQSS / dataverse

Open source research data repository software
http://dataverse.org

Feature Request: Simplify dataset metadata JSON files for dataset creation or import #10957

Status: Open. DS-INRAE opened this issue 1 month ago

DS-INRAE commented 1 month ago

Overview of the Feature Request: Remove superfluous elements from the dataset creation JSON file.

What kind of user is the feature intended for? API User

What inspired the request? JSON files are long, complex and intimidating for new users.

What existing behavior do you want changed? Remove the need for the following attributes in the dataset JSON files: typeClass, multiple and displayName.

JSON file comparison. Current Darwin's Finches JSON for the fields title, author, datasetContact, dsDescription and subject:

{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Darwin's Finches",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Finch, Fiona",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "Birds Inc.",
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactEmail",
                  "value": "finch@mailinator.com"
                },
                "datasetContactName": {
                  "typeClass": "primitive",
                  "multiple": false,
                  "typeName": "datasetContactName",
                  "value": "Finch, Fiona"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
                  "multiple": false,
                  "typeClass": "primitive",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}
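
The proposed simplification amounts to dropping three keys wherever they appear. As a rough sketch, assuming the native JSON has already been parsed into Python dicts and lists (`simplify` and `SUPERFLUOUS_KEYS` are hypothetical names, not part of Dataverse), the current file could be reduced to the simplified one like this:

```python
import json

# Keys this feature request proposes to make optional in the native dataset JSON.
SUPERFLUOUS_KEYS = {"typeClass", "multiple", "displayName"}


def simplify(node):
    """Return a copy of `node` with the superfluous keys removed at every nesting level."""
    if isinstance(node, dict):
        return {k: simplify(v) for k, v in node.items() if k not in SUPERFLUOUS_KEYS}
    if isinstance(node, list):
        return [simplify(item) for item in node]
    return node  # strings, numbers, booleans and null are kept unchanged


if __name__ == "__main__":
    current = {
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {
                    "fields": [
                        {
                            "value": "Darwin's Finches",
                            "typeClass": "primitive",
                            "multiple": False,
                            "typeName": "title",
                        }
                    ],
                    "displayName": "Citation Metadata",
                }
            }
        }
    }
    print(json.dumps(simplify(current), indent=2))
```

Because only dictionary keys are filtered, field values and the license block pass through untouched, and the input is never mutated.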

Simplified JSON file:

{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Darwin's Finches",
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {
                  "value": "Finch, Fiona",
                  "typeName": "authorName"
                },
                "authorAffiliation": {
                  "value": "Birds Inc.",
                  "typeName": "authorAffiliation"
                }
              }
            ],
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {
                  "typeName": "datasetContactEmail",
                  "value": "finch@mailinator.com"
                },
                "datasetContactName": {
                  "typeName": "datasetContactName",
                  "value": "Finch, Fiona"
                }
              }
            ],
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {
                  "value": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
                  "typeName": "dsDescriptionValue"
                }
              }
            ],
            "typeName": "dsDescription"
          },
          {
            "value": [
              "Medicine, Health and Life Sciences"
            ],
            "typeName": "subject"
          }
        ]
      }
    }
  }
}
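
To accept the simplified form, the server would have to fill in typeClass and multiple itself. Presumably it can, since both follow from the metadata block definitions it already loads. The sketch below shows the idea; `FIELD_SPECS` and `expand_field` are hypothetical stand-ins for those definitions (values copied from the Darwin's Finches example) and do not reflect how Dataverse is actually implemented.

```python
import json

# Hypothetical stand-in for the citation block's field definitions:
# typeName -> (typeClass, multiple). Values copied from the Darwin's Finches example.
FIELD_SPECS = {
    "title": ("primitive", False),
    "author": ("compound", True),
    "authorName": ("primitive", False),
    "authorAffiliation": ("primitive", False),
    "datasetContact": ("compound", True),
    "datasetContactName": ("primitive", False),
    "datasetContactEmail": ("primitive", False),
    "dsDescription": ("compound", True),
    "dsDescriptionValue": ("primitive", False),
    "subject": ("controlledVocabulary", True),
}


def expand_entry(entry):
    """Expand every child field of one compound value entry."""
    return {name: expand_field(child) for name, child in entry.items()}


def expand_field(field):
    """Return a copy of a simplified field with typeClass and multiple filled in."""
    # An unknown typeName raises KeyError; a real server would report a validation error.
    type_class, multiple = FIELD_SPECS[field["typeName"]]
    value = field["value"]
    if type_class == "compound":
        # A multiple compound field holds a list of entries; a single one holds one entry.
        value = [expand_entry(e) for e in value] if multiple else expand_entry(value)
    return {**field, "value": value, "typeClass": type_class, "multiple": multiple}


if __name__ == "__main__":
    simplified_subject = {
        "value": ["Medicine, Health and Life Sciences"],
        "typeName": "subject",
    }
    print(json.dumps(expand_field(simplified_subject), indent=2))
```

The point of the sketch is that the removed attributes carry no information the server lacks, which is why dropping them from the input looks like a low-risk change.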

Are you thinking about creating a pull request for this feature?
Even though this would help increase API adoption, we have other priorities at the moment.

DS-INRAE commented 1 month ago

Note: a more radical simplification would be very interesting, but hopefully this one would be an easy quick win.

qqmyers commented 1 month ago

Note that the metadata input for the semantic API would look like this (using a roughly standard @context for readability):

{
  "title":"Darwin's Finches",
  "author": {
    "citation:authorName": "Finch, Fiona",
    "citation:authorAffiliation": "Birds Inc."
  },   
  "citation:datasetContact": {
    "citation:datasetContactName": "Finch, Fiona",
    "citation:datasetContactEmail": "finch@mailinator.com"
  },
  "citation:dsDescription": {
    "citation:dsDescriptionValue": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
  },
  "subject": "Medicine, Health and Life Sciences",
  "@context": {
    "author": "http://purl.org/dc/terms/creator",
    "citation": "https://dataverse.org/schema/citation/",
    "subject": "http://purl.org/dc/terms/subject",
    "termName": "https://schema.org/name",
    "title": "http://purl.org/dc/terms/title"
  }
}

or, even shorter,

{
  "http://purl.org/dc/terms/title":"Darwin's Finches",
  "http://purl.org/dc/terms/creator": {
    "https://dataverse.org/schema/citation/authorName": "Finch, Fiona",
    "https://dataverse.org/schema/citation/authorAffiliation": "Birds Inc."
  },   
  "https://dataverse.org/schema/citation/datasetContact": {
    "https://dataverse.org/schema/citation/datasetContactName": "Finch, Fiona",
    "https://dataverse.org/schema/citation/datasetContactEmail": "finch@mailinator.com"
  },
  "https://dataverse.org/schema/citation/dsDescription": {
    "https://dataverse.org/schema/citation/dsDescriptionValue": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
  },
  "http://purl.org/dc/terms/subject": "Medicine, Health and Life Sciences"
}
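
The two forms above differ only in how keys are written: a plain term such as "title" is looked up in the @context, and a compact IRI such as "citation:authorName" becomes the "citation" prefix IRI plus "authorName". The minimal sketch below performs just that key rewriting (the function names are hypothetical). It is not full JSON-LD expansion, which also rewrites values into @value/@id objects; a library such as pyld (`jsonld.expand`) is the safer choice for real processing.

```python
import json


def expand_key(key, context):
    """Expand a term or compact IRI (prefix:suffix) using a flat @context."""
    if key in context:
        return context[key]  # plain term, e.g. "title"
    prefix, sep, suffix = key.partition(":")
    # A suffix starting with "//" means the key is already an absolute IRI (http://...).
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix  # compact IRI, e.g. "citation:authorName"
    return key  # absolute IRI or unknown key: left as-is


def expand(node, context):
    """Rewrite every key in nested dicts and lists; values are not touched."""
    if isinstance(node, dict):
        return {expand_key(k, context): expand(v, context) for k, v in node.items()}
    if isinstance(node, list):
        return [expand(v, context) for v in node]
    return node


def compact_to_iris(doc):
    """Drop @context and replace each key with its full IRI."""
    context = doc.get("@context", {})
    body = {k: v for k, v in doc.items() if k != "@context"}
    return expand(body, context)


if __name__ == "__main__":
    doc = {
        "title": "Darwin's Finches",
        "citation:datasetContact": {"citation:datasetContactName": "Finch, Fiona"},
        "@context": {
            "citation": "https://dataverse.org/schema/citation/",
            "title": "http://purl.org/dc/terms/title",
        },
    }
    print(json.dumps(compact_to_iris(doc), indent=2))
```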

pdurbin commented 1 month ago

This is what I've suggested to @JR-1991, who has slides ready about the gnarly, complicated native format: try the semantic API. 😄

See also discussion here:

JR-1991 commented 1 month ago

@pdurbin, it is on my bucket list 😁 Can this also be passed to the dataset creation/edit endpoint?

pdurbin commented 1 month ago

@JR-1991 well, you have to pass 'Content-Type: application/ld+json'. Please see the guides: https://guides.dataverse.org/en/6.4/developers/dataset-semantic-metadata-api.html
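
Following that answer, creating a dataset through the semantic API means sending the JSON-LD body to the dataset-creation endpoint with the application/ld+json content type. The sketch below only builds such a request with Python's standard `urllib.request`. The server URL, collection alias and token are placeholders, `build_create_request` is a hypothetical helper, and the path `/api/dataverses/{alias}/datasets` is my reading of the guides, so check it against the Dataverse version you run.

```python
import json
import urllib.request


def build_create_request(server_url, collection_alias, api_token, metadata):
    """Build (but do not send) a POST that creates a dataset from JSON-LD metadata."""
    url = f"{server_url.rstrip('/')}/api/dataverses/{collection_alias}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(metadata).encode("utf-8"),
        headers={
            "X-Dataverse-key": api_token,
            # Per the comment above, this header is what selects JSON-LD input.
            "Content-Type": "application/ld+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_request(
        "https://dataverse.example.org",  # placeholder server
        "my-collection",                  # placeholder collection alias
        "xxxxxxxx-xxxx",                  # placeholder API token
        {"http://purl.org/dc/terms/title": "Darwin's Finches"},
    )
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would actually send it. Editing an existing dataset's metadata uses a different endpoint, described in the same guide.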