Coleridge-Initiative / RCDatasets

Creative Commons Zero v1.0 Universal

dataset metadata #100

Closed srand525 closed 4 years ago

srand525 commented 4 years ago

We have two types of dataset metadata that we'd like to start populating into datasets.json.

We can start to account for these in an original subdict, as we've done with RCPublications, as shown below:

    {
      "id": "dataset-428",
      "provider": "National Science Foundation",
      "title": "Higher Education Research and Development Survey",
      "alt_title": [
          "HERD",
          "Federal RePORTER"
      ],
      "url": "https://www.nsf.gov/statistics/srvyherd/",
      "description": "The survey collects information on R&D expenditures by field of research and source of funds and also gathers information on types of research, expenses, and headcounts of R&D personnel.",
      "original": {
          "joins_to": ["dataset-493"]
      }
    }

We should also decide on the canonical set of metadata fields that we want to incorporate into the subdict, and perhaps settle on standard field names here. @ceteri, do we want to include these joins_to or related_to fields at this stage? This was prompted in part by documents we obtained from data providers, for example: https://github.com/NYU-CI/RCCustomers/blob/master/customers/NCSES/NCSES%20Database%20Diagram_With_Coleridge.pdf
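If we do adopt joins_to, one thing we would probably want is a check that every id it names actually exists in datasets.json. Below is a rough sketch of that check, not an existing script in this repo. It assumes datasets.json is a list of entries shaped like the example in this comment; the function name `find_dangling_joins` and the sample entries are made up for illustration.

```python
import json


def find_dangling_joins(datasets):
    """Return (dataset_id, missing_id) pairs where joins_to names an unknown id.

    Hypothetical helper: assumes each entry has an "id" and an optional
    "original" subdict holding an optional "joins_to" list of dataset ids.
    """
    known_ids = {entry["id"] for entry in datasets}
    dangling = []
    for entry in datasets:
        original = entry.get("original", {})
        for target in original.get("joins_to", []):
            if target not in known_ids:
                dangling.append((entry["id"], target))
    return dangling


# Illustrative entries only; real entries carry more fields (provider, url, ...).
sample = json.loads("""
[
  {"id": "dataset-428", "title": "Higher Education Research and Development Survey",
   "original": {"joins_to": ["dataset-493"]}},
  {"id": "dataset-493", "title": "placeholder title"}
]
""")

if __name__ == "__main__":
    for source_id, missing_id in find_dangling_joins(sample):
        print(f"{source_id} joins to unknown dataset {missing_id}")
```

The same loop would work for a related_to field if we decide to add one; only the key name changes.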

ceteri commented 4 years ago

Good points, although the kinds of relations described here are directly part of the KG representation. That comes later in the workflow.