DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Collect both quantitative and qualitative information to assess fitness for use #12

Closed mahunter14 closed 10 months ago

mahunter14 commented 1 year ago

Creator Name: Marc Hunter Creator Affiliation: USGS

Requirement(s)

Data providers must present the measurable errors and limitations of a dataset along with a brief narrative regarding its intended application and potential shortcomings. An enumerated domain of general usability terms would broadly standardize the description.

Problem Statement

The ultimate goal of FAIR data is to promote reuse of data, and this can only be accomplished if a downstream consumer can assess the fitness for use of that dataset and trust its provenance. Quantitative and qualitative assessments of all appropriate applications are impossible, and the information has historically been captured in many different areas, when described at all.

Target Audience / Stakeholders

User 1. Downstream consumers of data outside of the scientific discipline of the provider.
User 2. Data providers who have not had consistent guidance on how to measure and describe scientific data beyond its use in a given investigation.
User 3. Metadata managers and systems developers cross-walking multiple fields from existing catalogs.

Intended Uses / Use Cases

Present both quantitative and qualitative measures of a dataset to provide the most comprehensive assessment possible, in a form that is both human- and machine-readable.

Existing Approaches - Optional

Aspects of fitness for use have historically been captured in many different fields (in FGDC).

Additional context, comments, or links - Optional

Analysis Ready Data of Mars is a current example of how this information can be presented to a new user.

fellahst commented 10 months ago

Marc,

In response to the requirement for data providers to present measurable errors and limitations of a dataset, along with a narrative on its intended application and potential shortcomings, the DCAT-US framework offers usage guidelines for capturing data quality. The guidelines, as outlined in the Data Quality section of the DCAT-US documentation, provide a comprehensive approach to standardize the description of data quality.

Solution Overview:

The DCAT-US framework, leveraging the Data Quality Vocabulary (DQV), enables data providers to document both quantitative and qualitative aspects of data quality. It allows for the representation of measurable errors and limitations, aiding downstream consumers in assessing the fitness for use. The guidelines include how to utilize specific classes like dqv:QualityMeasurement, dqv:Metric, and dqv:Dimension to detail the quality of a dataset comprehensively. This structured representation facilitates a clear understanding of the dataset's applicability, potential shortcomings, and overall reliability.
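
As a rough illustration of the approach described above, the following Python sketch builds a JSON-LD description that pairs one quantitative dqv:QualityMeasurement with one qualitative narrative. It is only a sketch under stated assumptions: the `ex:` namespace, the dataset, metric, and dimension identifiers, the value, the narrative text, and the helper name `build_quality_description` are all hypothetical. Using dqv:QualityAnnotation with `oa:bodyValue` for the narrative is one plausible pattern from DQV and the Web Annotation vocabulary, not necessarily the form the DCAT-US guidelines prescribe.

```python
import json

# Prefixes for the JSON-LD context. The dcat, dqv, oa, and xsd IRIs are the
# published W3C namespaces; "ex" is a hypothetical placeholder namespace.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dqv": "http://www.w3.org/ns/dqv#",
    "oa": "http://www.w3.org/ns/oa#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "https://example.org/",
}


def build_quality_description(dataset_iri, metric_iri, dimension_iri, value, narrative):
    """Return a JSON-LD document with one measurement and one narrative annotation."""
    # The dimension groups related metrics (for example, accuracy).
    dimension = {"@id": dimension_iri, "@type": "dqv:Dimension"}
    # The metric defines what is measured and the datatype of its values.
    metric = {
        "@id": metric_iri,
        "@type": "dqv:Metric",
        "dqv:inDimension": {"@id": dimension_iri},
        "dqv:expectedDataType": {"@id": "xsd:decimal"},
    }
    # Quantitative part: a single observed value of the metric for the dataset.
    measurement = {
        "@type": "dqv:QualityMeasurement",
        "dqv:computedOn": {"@id": dataset_iri},
        "dqv:isMeasurementOf": {"@id": metric_iri},
        "dqv:value": {"@value": str(value), "@type": "xsd:decimal"},
    }
    # Qualitative part: free-text narrative on intended use and shortcomings.
    annotation = {
        "@type": "dqv:QualityAnnotation",
        "oa:hasTarget": {"@id": dataset_iri},
        "oa:motivatedBy": {"@id": "dqv:qualityAssessment"},
        "oa:bodyValue": narrative,
    }
    dataset = {
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dqv:hasQualityMeasurement": measurement,
        "dqv:hasQualityAnnotation": annotation,
    }
    return {"@context": CONTEXT, "@graph": [dimension, metric, dataset]}


# Hypothetical example values, for illustration only.
doc = build_quality_description(
    dataset_iri="ex:dataset/elevation-grid",
    metric_iri="ex:metric/horizontal-positional-error",
    dimension_iri="ex:dimension/accuracy",
    value=2.5,
    narrative="Suitable for regional mapping; not intended for site-level engineering.",
)
print(json.dumps(doc, indent=2))
```

Keeping the measurement and the narrative as separate nodes attached to the same dataset lets a catalog index the numeric value while still displaying the narrative to human readers.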