GetDKAN / dkan

DKAN Open Data Portal
https://dkan.readthedocs.io/en/latest/index.html
GNU General Public License v2.0

Schemas for data resources (column-level metadata) #3169

Open dafeder opened 4 years ago

dafeder commented 4 years ago

There were a lot of false starts for this in DKAN 1, but we want to get this right in DKAN 2. Schemas for datasets provide value in several areas:

  1. Describe to end users what kind of data to expect in each column
  2. Validate column headers and order for incoming datasets
  3. Improve datastore tables by mapping schema data types to MySQL column types
  4. Allow for third-party row-level data validation and data quality analysis, via tools such as Goodtables.
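
As a rough illustration of points 2 and 3, here is a minimal Python sketch. It is not DKAN code: the example fields, the `MYSQL_TYPES` mapping, and the helpers `header_errors` and `create_table_sql` are all assumptions made for illustration. Only the `fields` / `name` / `type` layout follows the Frictionless Table Schema.

```python
import csv
import io

# Hypothetical Table Schema descriptor for a small CSV resource.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "description": "Row identifier"},
        {"name": "city", "type": "string", "description": "City name"},
        {"name": "recorded", "type": "date", "description": "Observation date"},
        {"name": "value", "type": "number", "description": "Measured value"},
    ]
}

# Assumed mapping from Table Schema types to MySQL column types.
MYSQL_TYPES = {
    "integer": "BIGINT",
    "number": "DOUBLE",
    "string": "TEXT",
    "boolean": "TINYINT(1)",
    "date": "DATE",
    "datetime": "DATETIME",
}


def header_errors(csv_text, schema):
    """Compare the CSV header row with the schema's field names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    expected = [f["name"] for f in schema["fields"]]
    errors = []
    if len(header) != len(expected):
        errors.append(f"expected {len(expected)} columns, found {len(header)}")
    for pos, (got, want) in enumerate(zip(header, expected), start=1):
        if got != want:
            errors.append(f"column {pos}: expected '{want}', found '{got}'")
    return errors


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement; unknown types fall back to TEXT."""
    cols = [
        f"`{f['name']}` {MYSQL_TYPES.get(f.get('type', 'string'), 'TEXT')}"
        for f in schema["fields"]
    ]
    return f"CREATE TABLE `{table}` ({', '.join(cols)})"
```

An import step could call `header_errors` before loading a file and reject it when the list is non-empty, then use `create_table_sql` to create a typed datastore table instead of all-text columns.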

Implementation details

Based on conversations so far, a rough spec might look like:

  1. Schemas are a child metadata object to a dataset, just like distributions.
  2. The default schema for these metadata objects will be the Frictionless Table Schema. (Uncertain; see question below)
  3. Develop a front-end component to display a schema on a dataset page
  4. Develop a UI for schema creation, possibly using components of the data package creator app
  5. We would need to add schemas to our API design in an intuitive way
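
To make items 1 and 3 more concrete, here is a small Python sketch of what a dataset with a child schema might look like, and how a display component could turn it into data dictionary rows. The `schema` key name, the example values, and both helper functions are hypothetical; nothing here reflects an agreed API design.

```python
# Hypothetical dataset metadata: "schema" sits beside "distribution"
# as a child object. The key name "schema" is an assumption.
DATASET = {
    "identifier": "example-dataset",
    "title": "Example dataset",
    "distribution": [
        {"title": "CSV download", "downloadURL": "https://example.com/data.csv"}
    ],
    "schema": {
        "fields": [
            {"name": "id", "type": "integer", "description": "Row identifier"},
            {"name": "city", "type": "string", "description": "City name"},
        ]
    },
}


def data_dictionary_rows(dataset):
    """Return (name, type, description) tuples for display on a dataset page."""
    fields = dataset.get("schema", {}).get("fields", [])
    return [
        (f["name"], f.get("type", "string"), f.get("description", ""))
        for f in fields
    ]


def render_text_table(rows):
    """Render the rows as a fixed-width text table with a header line."""
    table = [("Column", "Type", "Description")] + rows
    widths = [max(len(r[i]) for r in table) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )
```

A real front-end component would render HTML rather than text, but the same three columns (name, type, description) are what the end user would see.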

Questions

  1. Do we need Table Schema? Can we just have people define their data schema/dictionary with JSON Schema given extremely broad constraints (the schema defines a simple array of objects)?
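
For comparison, here is a sketch of what the JSON Schema alternative could look like: a Python function that expresses a list of columns as a JSON Schema for an array of row objects. The `JSON_TYPES` mapping and the `to_json_schema` helper are assumptions for illustration, not an existing DKAN or Frictionless API.

```python
# Assumed mapping from Table Schema field types to JSON Schema types.
JSON_TYPES = {
    "integer": "integer",
    "number": "number",
    "boolean": "boolean",
    "string": "string",
}


def to_json_schema(table_schema):
    """Express a list of columns as a JSON Schema for an array of row objects."""
    properties = {}
    for field in table_schema["fields"]:
        # Types without a direct JSON Schema equivalent fall back to string.
        prop = {"type": JSON_TYPES.get(field.get("type", "string"), "string")}
        if field.get("type") == "date":
            prop["format"] = "date"
        if "description" in field:
            prop["description"] = field["description"]
        properties[field["name"]] = prop
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "array",
        "items": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
            "additionalProperties": False,
        },
    }
```

One trade-off this makes visible: the properties of a JSON object carry no defined order, so a schema in this shape would not by itself describe column order.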
dafeder commented 4 years ago

Some examples of the data dictionary web display from other data products:

[image: CKAN data dictionary display]

[image: Socrata data dictionary display]

janette commented 4 years ago

@dafeder to draft implementation ticket and share

kimwdavidson commented 4 years ago

Revisit after we complete switch to custom entities.