DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Incorporate elements of the Federal Committee on Statistical Methodology's Data Quality Framework #100

Closed mrratcliffe closed 9 months ago

mrratcliffe commented 1 year ago

Michael Ratcliffe, US Census Bureau

Requirement(s)

Incorporate Domains and Dimensions from the Federal Committee on Statistical Methodology's Data Quality Framework into the DCAT metadata standard. Provide fields for Data Quality Framework dimensions that can be filled by dataset producers, as appropriate.

Domains and component dimensions are:

  1. Utility

    • Relevance: whether the data product is targeted to meet current or prospective user needs
    • Accessibility: how to obtain access to the dataset and dataset documentation
    • Timeliness: reference date of data; date of data collection; date of dataset publication
    • Granularity: Amount of disaggregation available for key data elements: 1) units of time, 2) level of geographic detail, and/or 3) amount of detail available on any number of demographic or economic characteristics
  2. Objectivity

    • Accuracy and Reliability: measures of statistical accuracy, such as standard deviation, coefficient of variation, etc.; measures of consistency of results
    • Coherence: ability of the data product to maintain common definitions, classifications, methodological processes, comparability with other relevant data
  3. Integrity

    • Scientific Integrity: adherence to scientific standards, use of established methods
    • Confidentiality: information related to disclosure avoidance methods applied to dataset
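
As a rough illustration, the domains and dimensions listed above could be held in a small lookup structure before being formalized as a controlled vocabulary. This is only a sketch: the names `FCSM_DOMAINS` and `domain_of` are hypothetical, and the structure covers only the dimensions named in this issue.

```python
# Domains and dimensions exactly as listed in this issue.
# FCSM_DOMAINS and domain_of are hypothetical names used for illustration.
FCSM_DOMAINS = {
    "Utility": ["Relevance", "Accessibility", "Timeliness", "Granularity"],
    "Objectivity": ["Accuracy and Reliability", "Coherence"],
    "Integrity": ["Scientific Integrity", "Confidentiality"],
}


def domain_of(dimension: str) -> str:
    """Return the domain that contains the given dimension name."""
    for domain, dimensions in FCSM_DOMAINS.items():
        if dimension in dimensions:
            return domain
    raise KeyError(f"unknown dimension: {dimension}")


print(domain_of("Coherence"))
```

A producer-facing form could use such a lookup to group quality fields by domain and to reject dimension names that are not in the list.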

Problem Statement

From the FCSM Data Quality Framework: "Effective understanding of data quality is essential for public officials, private businesses, and the public to make data-driven decisions. Data users who understand the fitness-for-use of data are more likely to use them appropriately, whether for secondary use in developing other data products, for conducting data analysis, or when using data outputs for decision making. The Interagency Council on Statistical Policy (ICSP) has indicated that “agencies should work to adopt a common language and framework for reporting on the quality of data sets and derivative information they disseminate.”"

In line with OMB and ICSP guidance, dataset producers and stakeholders are expected to report on the aspects of quality detailed in the Data Quality Framework, as appropriate, so that data consumers can adequately assess a dataset's fitness for use, its accessibility, and how to access it.

References:

Federal Committee on Statistical Methodology (FCSM). 2020. "A Framework for Data Quality," FCSM-20-04. Available at https://www.fcsm.gov/assets/files/docs/FCSM.20.04_A_Framework_for_Data_Quality.pdf

ICSP. 2018. Principles for Modernizing Production of Federal Statistics. Available at https://nces.ed.gov/fcsm/pdf/Principles.pdf

OMB. 2019. M-19-15: Improving Implementation of the Information Quality Act. April 2019. Available at https://www.whitehouse.gov/wp-content/uploads/2019/04/M-19-15.pdf

Target Audience / Stakeholders

User 1: Dataset producers
User 2: Dataset consumers
User 3: Data consumers/producers in the statistical domain

Intended Uses / Use Cases

Use 1: Dataset producers providing information related to Data Quality Framework domains in compliance with OMB, ICSP, and FCSM guidelines
Use 2: Data consumers assessing fitness for use
Use 3: Search engines reading metadata in response to user search

fellahst commented 9 months ago

Recommendation: To address the requirement of incorporating the Data Quality Framework dimensions into the DCAT metadata standard, it is recommended to utilize dqv:hasQualityMeasurement and dqv:QualityMeasurement from the Data Quality Vocabulary (DQV). This approach involves:

  1. Utilizing dqv:hasQualityMeasurement:

    • This property links a dataset or data distribution to a quality measurement.
    • It enables the representation of various quality dimensions as specified in the Data Quality Framework.
  2. Implementing dqv:QualityMeasurement:

    • This class is used to describe specific quality measurements.
    • For each domain and dimension of the Data Quality Framework (Utility, Objectivity, Integrity), corresponding instances of dqv:QualityMeasurement can be created.
  3. Employing Controlled Vocabularies:

    • Define controlled vocabularies for metrics and dimensions as outlined in the Data Quality Framework.
    • These vocabularies will standardize the representation of quality measurements, ensuring consistency and clarity in the metadata.
  4. Examples of Usage:

    • For "Relevance" under the Utility domain, create a dqv:QualityMeasurement instance with a specific controlled vocabulary term that quantifies relevance.
    • Similarly, for "Accuracy and Reliability" under Objectivity, another dqv:QualityMeasurement instance can be created with appropriate metrics like standard deviation or coefficient of variation.

This approach aligns with the guidelines from OMB, ICSP, and FCSM, facilitating dataset producers in providing comprehensive quality-related information and enabling dataset consumers to adequately assess the fitness for use of the data.
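
A minimal sketch of how this recommendation might look in practice, written as Python that emits JSON-LD. The DQV terms (`dqv:hasQualityMeasurement`, `dqv:QualityMeasurement`, `dqv:isMeasurementOf`, `dqv:computedOn`, `dqv:value`) come from the W3C Data Quality Vocabulary; the helper names, the `example.gov` IRIs, and the value 0.042 are hypothetical and only illustrate the shape of the metadata.

```python
import json

# Namespace prefixes. The dcat and dqv IRIs are the published W3C namespaces;
# every example.gov IRI below is a hypothetical placeholder.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dqv": "http://www.w3.org/ns/dqv#",
}


def quality_measurement(dataset_iri, metric_iri, value):
    """Build one dqv:QualityMeasurement node for a dataset and a metric."""
    return {
        "@type": "dqv:QualityMeasurement",
        "dqv:computedOn": {"@id": dataset_iri},
        "dqv:isMeasurementOf": {"@id": metric_iri},
        "dqv:value": value,
    }


def dataset_with_quality(dataset_iri, metric_values):
    """Attach one measurement per (metric IRI, value) pair to a dataset."""
    return {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dqv:hasQualityMeasurement": [
            quality_measurement(dataset_iri, metric_iri, value)
            for metric_iri, value in metric_values.items()
        ],
    }


# Hypothetical dataset and metric; the value is made up for illustration.
doc = dataset_with_quality(
    "https://example.gov/dataset/survey-2023",
    {"https://example.gov/quality/metric/coefficient-of-variation": 0.042},
)
print(json.dumps(doc, indent=2))
```

Each measurement points at a metric IRI rather than embedding the metric definition, so the controlled vocabulary of metrics and dimensions can be maintained separately from individual dataset records.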

TDabolt commented 9 months ago

Approach meets the requirement. @mrratcliffe, it would be good to walk through this and the documentation with the statistical community to make sure the documentation is clear from their perspective. I would like to hear the community's feedback before closing this issue.

fellahst commented 9 months ago

I have completed the Data Quality usage guideline section.

mrratcliffe commented 9 months ago

@TDabolt I'll share this with FCSM colleagues.

mrratcliffe commented 9 months ago

@fellahst I received the following comment and question from Jennifer Parker (Centers for Disease Control):

This [the recommended solution] looks good. I have one comment on Accuracy. Is there a way to incorporate statistical bias from, say, coverage or nonprobability data etc.? I realize that indicating this is difficult, more like relevance than variance measures. As we bring in more types of data, the bias issues can be bigger than the variance.

fellahst commented 9 months ago

That should not be a problem, as long as you create a Metric for it that has a numeric value.
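
One way such a metric might be declared, sketched as Python that emits JSON-LD. `dqv:Metric`, `dqv:inDimension`, and `dqv:expectedDataType` are terms from the W3C Data Quality Vocabulary; the helper name `numeric_metric`, the `example.gov` IRIs, and the label and definition text are assumptions made for illustration, not agreed vocabulary terms.

```python
import json

# Published W3C namespaces; all example.gov IRIs below are hypothetical.
CONTEXT = {
    "dqv": "http://www.w3.org/ns/dqv#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def numeric_metric(metric_iri, label, definition, dimension_iri):
    """Describe a dqv:Metric whose measurements are expected to be numeric."""
    return {
        "@context": CONTEXT,
        "@id": metric_iri,
        "@type": "dqv:Metric",
        "skos:prefLabel": label,
        "skos:definition": definition,
        "dqv:inDimension": {"@id": dimension_iri},
        "dqv:expectedDataType": {"@id": "xsd:double"},
    }


# Hypothetical metric for statistical bias, placed under the
# "Accuracy and Reliability" dimension from the framework.
coverage_bias = numeric_metric(
    "https://example.gov/quality/metric/coverage-bias",
    "Coverage bias",
    "Estimated relative bias due to undercoverage of the target population.",
    "https://example.gov/quality/dimension/accuracy-and-reliability",
)
print(json.dumps(coverage_bias, indent=2))
```

A dataset record would then report the bias estimate as a `dqv:QualityMeasurement` whose `dqv:isMeasurementOf` points at this metric.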