DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Provide business context (provenance) for datasets to ensure authenticity, integrity, and appropriate interpretation of the data #117

Closed johnd-im closed 7 months ago

johnd-im commented 1 year ago

Creator Name: John Davidson

Creator Affiliation: Contractor, Department of the Interior/OCIO/CDO

Requirement(s)

  1. Specify the provenance of the dataset as links or citations to entities, agents, or activities (e.g., systems, services, applications, organizations, persons, and/or activities, including algorithms) that “generated” (collected, processed, analyzed, synthesized, transformed, or otherwise packaged) the dataset.
  2. Provide additional context with a human-readable statement "of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation."
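
To make the two requirements concrete, here is a minimal sketch of a dataset record, written as a JSON-LD-style Python dictionary, that carries both a machine-readable link (`prov:wasGeneratedBy`) and a human-readable statement (`dct:provenance`). This is an illustration only, not the encoding prescribed by the DCAT-US profile. Every `https://example.gov/...` identifier, every label, and the function name `build_dataset_record` are hypothetical.

```python
import json


def build_dataset_record():
    """Return a JSON-LD-style dict for a dataset with provenance (illustrative only)."""
    # Standard namespace IRIs for the vocabularies used below.
    context = {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "prov": "http://www.w3.org/ns/prov#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    }
    # Hypothetical agent responsible for the generating activity.
    agent = {
        "@id": "https://example.gov/org/survey-office",
        "@type": "prov:Agent",
        "rdfs:label": "Example survey office (hypothetical)",
    }
    # Requirement 1: link the dataset to the activity that generated it.
    activity = {
        "@id": "https://example.gov/activity/gauge-processing-2023",
        "@type": "prov:Activity",
        "rdfs:label": "Quality control of raw gauge readings (hypothetical)",
        "prov:wasAssociatedWith": agent,
        "prov:used": {"@id": "https://example.gov/dataset/raw-gauge-readings"},
    }
    # Requirement 2: human-readable statement about ownership and custody.
    statement = {
        "@type": "dct:ProvenanceStatement",
        "rdfs:label": "Custody moved from the field office to the survey office in 2023 (hypothetical).",
    }
    return {
        "@context": context,
        "@id": "https://example.gov/dataset/gauge-readings-qc",
        "@type": "dcat:Dataset",
        "dct:title": "Quality-controlled gauge readings (hypothetical)",
        "prov:wasGeneratedBy": activity,
        "dct:provenance": statement,
    }


record = build_dataset_record()
print(json.dumps(record, indent=2))
```

The nested form keeps the activity and agent inline for readability; a real catalog might instead reference them by IRI only.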

Problem Statement

There is currently no way to associate a dataset resource with its business context (provenance), specifically the links to the sources of the data, how the data was collected and processed, and how the dataset was produced. Having this contextual information directly supports the FAIR principles of findability and reusability.

Target Audience / Stakeholders

User1: Data producer/owner/steward

User2: Data consumer/SME/analyst

Intended Uses / Use Cases

UseCase1: As a data steward, I need to document who owned or had custody of the data and how the data was collected and processed, to establish trust in the data and to maximize its "downstream" value for analysis and decision-making.

UseCase2: As a data SME, I want to know how the data was collected and processed in order to know if its use is appropriate for my needs.

UseCase3: As a data consumer, I want to be reassured that I can trust the data for authenticity and integrity by knowing which systems and algorithms were used to process it, and by whom (individuals or organizations).
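
From the consumer side, these use cases amount to asking three questions of a dataset record: which activity produced it, which agent was responsible, and what the custody statement says. The sketch below shows one way such a check might look. It assumes the record is a JSON-LD-style dictionary in which `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, and `dct:provenance` are nested objects carrying an `rdfs:label`; that layout, the sample record, and the function names are all hypothetical.

```python
def summarize_provenance(record):
    """Collect the labels a consumer would look at when judging trust."""
    # Assumption: related resources are nested dicts, not bare IRIs or lists.
    activity = record.get("prov:wasGeneratedBy") or {}
    agent = activity.get("prov:wasAssociatedWith") or {}
    statement = record.get("dct:provenance") or {}
    return {
        "activity": activity.get("rdfs:label"),
        "agent": agent.get("rdfs:label"),
        "statement": statement.get("rdfs:label"),
    }


def missing_provenance(record):
    """List which of the three provenance facts the record does not supply."""
    summary = summarize_provenance(record)
    return [name for name, value in summary.items() if value is None]


# Hypothetical record that names the activity and agent but has no custody statement.
sample = {
    "@type": "dcat:Dataset",
    "prov:wasGeneratedBy": {
        "rdfs:label": "Quality control of raw readings",
        "prov:wasAssociatedWith": {"rdfs:label": "Example survey office"},
    },
}

print(summarize_provenance(sample))
print(missing_provenance(sample))
```

A steward could run the same check before publishing to see which provenance facts are still undocumented.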

Existing Approaches - Optional

  1. GeoDCAT-AP 2.0.0: https://semiceu.github.io/GeoDCAT-AP/releases/2.0.0/#properties-for-provenance-statement
  2. DCAT-AP: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/provenance
  3. DCAT3: https://www.w3.org/TR/vocab-dcat-3/#examples-dataset-provenance

Additional context, comments, or links - Optional

A requirement of the DOI's "Application Profile of DCAT-US 1.1" metadata specification.

fellahst commented 8 months ago

This requirement is addressed by the use of the PROV-O ontology in the DCAT-US profile. See the provenance metadata usage guideline section of the specification.