mrratcliffe closed this issue 9 months ago.
Recommendation: To incorporate the Data Quality Framework dimensions into the DCAT metadata standard, we recommend using dqv:hasQualityMeasurement and dqv:QualityMeasurement from the W3C Data Quality Vocabulary (DQV). This approach involves:
Utilizing dqv:hasQualityMeasurement
Implementing dqv:QualityMeasurement: … dqv:QualityMeasurement can be created.
Employing Controlled Vocabularies
Examples of Usage:
… dqv:QualityMeasurement instance with a specific controlled vocabulary term that quantifies relevance.
… dqv:QualityMeasurement instance can be created with appropriate metrics such as standard deviation or coefficient of variation.
This approach aligns with the guidelines from OMB, ICSP, and FCSM. It helps dataset producers provide comprehensive quality-related information and enables dataset consumers to assess whether the data are fit for their intended use.
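For illustration only, the pattern described above can be sketched in Python with just the standard library. The snippet emits Turtle in which a dcat:Dataset points to a dqv:QualityMeasurement through dqv:hasQualityMeasurement; the measurement names its metric with dqv:isMeasurementOf, names the measured resource with dqv:computedOn, and carries the number in dqv:value. The ex: namespace, the IRIs and the coefficient-of-variation metric are hypothetical placeholders, not terms defined by DCAT or the FCSM framework.

```python
# Minimal sketch, assuming Turtle output and hypothetical ex: IRIs.
# Builds DCAT + DQV Turtle that attaches one numeric quality measurement
# (for example, a coefficient of variation) to a dataset.

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dqv:  <http://www.w3.org/ns/dqv#> .\n"
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex:   <https://example.org/> .\n"  # hypothetical namespace
)


def quality_measurement_turtle(dataset: str, measurement: str,
                               metric: str, value: float) -> str:
    """Return Turtle linking `dataset` to one dqv:QualityMeasurement.

    All three identifiers are prefixed names such as "ex:dataset-001".
    """
    return PREFIXES + (
        f"\n{dataset} a dcat:Dataset ;\n"
        f"    dqv:hasQualityMeasurement {measurement} .\n"
        f"\n{measurement} a dqv:QualityMeasurement ;\n"
        f"    dqv:computedOn {dataset} ;\n"
        f"    dqv:isMeasurementOf {metric} ;\n"
        f'    dqv:value "{value!r}"^^xsd:double .\n'
    )


if __name__ == "__main__":
    # Hypothetical dataset, measurement and metric IRIs; the value is illustrative.
    print(quality_measurement_turtle(
        "ex:dataset-001", "ex:cv-measurement-001",
        "ex:coefficientOfVariation", 0.042))
```

Building the string by hand keeps the sketch dependency-free; a real pipeline would more likely use an RDF library so that IRIs and literals are validated.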
The approach meets the requirement. @mrratcliffe, it would be good to walk this approach and the documentation through with the statistical community to make sure the documentation is clear from their perspective. I would like to hear the community's feedback before closing this issue.
I have completed the Data Quality usage guideline section.
@TDabolt I'll share this with FCSM colleagues.
@fellahst I received the following comment and question from Jennifer Parker (Centers for Disease Control):
This [the recommended solution] looks good. I have one comment on Accuracy. Is there a way to incorporate statistical bias from, say, coverage or nonprobability data etc.? I realize that indicating this is difficult, more like relevance than variance measures. As we bring in more types of data, the bias issues can be bigger than the variance.
This should not be a problem, as long as you create a metric (dqv:Metric) for it that has a numeric value.
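Following this reply, a bias estimate can be modeled like any other numeric metric. The sketch below (Python, standard library only) declares a hypothetical ex:coverageBias metric as a dqv:Metric whose dqv:expectedDataType is xsd:double, assigns it to an assumed "accuracy and reliability" dimension via dqv:inDimension, and records one measurement against a dataset. All ex: IRIs, the definition text and the value are illustrative, not published terms or real results.

```python
# Sketch, assuming hypothetical ex: IRIs: declares a numeric bias metric in
# DQV and records one measurement of it against a dataset.

PREFIXES = (
    "@prefix dqv:  <http://www.w3.org/ns/dqv#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex:   <https://example.org/> .\n"  # hypothetical namespace
)


def turtle_string(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def bias_metric_turtle(metric: str, dimension: str, definition: str) -> str:
    """Declare `metric` as a dqv:Metric whose values are xsd:double."""
    return (
        f"{metric} a dqv:Metric ;\n"
        f'    skos:definition "{turtle_string(definition)}"@en ;\n'
        f"    dqv:inDimension {dimension} ;\n"
        f"    dqv:expectedDataType xsd:double .\n"
    )


def measurement_turtle(measurement: str, dataset: str, metric: str,
                       value: float) -> str:
    """Record one numeric value of `metric` computed on `dataset`."""
    return (
        f"{measurement} a dqv:QualityMeasurement ;\n"
        f"    dqv:computedOn {dataset} ;\n"
        f"    dqv:isMeasurementOf {metric} ;\n"
        f'    dqv:value "{value!r}"^^xsd:double .\n'
    )


if __name__ == "__main__":
    # Hypothetical IRIs and an illustrative value, not real survey results.
    print(PREFIXES)
    print(bias_metric_turtle(
        "ex:coverageBias", "ex:accuracy-and-reliability",
        "Estimated bias in a statistic due to frame undercoverage"))
    print(measurement_turtle(
        "ex:bias-measurement-001", "ex:dataset-001", "ex:coverageBias", -0.015))
```

Because the metric declares its expected data type, a consumer can treat a bias value the same way as a variance-based value when comparing datasets.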
Michael Ratcliffe, US Census Bureau:
Requirement(s)
Incorporate Domains and Dimensions from the Federal Committee on Statistical Methodology's Data Quality Framework into the DCAT metadata standard. Provide fields for Data Quality Framework dimensions that can be filled by dataset producers, as appropriate.
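Domains and component dimensions are:
Utility: relevance, accessibility, timeliness, punctuality, granularity
Objectivity: accuracy and reliability, coherence
Integrity: scientific integrity, credibility, computer and physical security, confidentiality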
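One possible way to publish these domains and dimensions as a controlled vocabulary is to model each domain as a dqv:Category and each dimension as a dqv:Dimension linked to its domain with dqv:inCategory. This mapping is an assumption of the sketch, not something prescribed by FCSM or DQV; the dimension names follow FCSM-20-04, and the ex: namespace and the generated local names are hypothetical.

```python
# Sketch, assuming the FCSM domains map to dqv:Category and their component
# dimensions map to dqv:Dimension under a hypothetical ex: namespace.
import re

# Domains and dimensions as listed in FCSM-20-04, "A Framework for Data Quality".
FCSM_FRAMEWORK = {
    "Utility": ["Relevance", "Accessibility", "Timeliness",
                "Punctuality", "Granularity"],
    "Objectivity": ["Accuracy and reliability", "Coherence"],
    "Integrity": ["Scientific integrity", "Credibility",
                  "Computer and physical security", "Confidentiality"],
}

PREFIXES = [
    "@prefix dqv:  <http://www.w3.org/ns/dqv#> .",
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
    "@prefix ex:   <https://example.org/fcsm/> .",  # hypothetical namespace
]


def slug(label: str) -> str:
    """Turn a label into a lowercase, hyphen-separated local name."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def framework_turtle(framework: dict[str, list[str]]) -> str:
    """Emit one dqv:Category per domain and one dqv:Dimension per dimension."""
    lines = list(PREFIXES)
    for domain, dimensions in framework.items():
        category = f"ex:{slug(domain)}"
        lines.append(f'{category} a dqv:Category ; skos:prefLabel "{domain}"@en .')
        for dimension in dimensions:
            lines.append(
                f'ex:{slug(dimension)} a dqv:Dimension ; '
                f'skos:prefLabel "{dimension}"@en ; dqv:inCategory {category} .'
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(framework_turtle(FCSM_FRAMEWORK))
```

With a shared vocabulary like this, a dqv:Metric declared by any producer can point at the same dimension IRI, which is what lets consumers and search engines group quality information by FCSM dimension.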
Problem Statement
From the FCSM Data Quality Framework: “Effective understanding of data quality is essential for public officials, private businesses, and the public to make data-driven decisions. Data users who understand the fitness-for-use of data are more likely to use them appropriately, whether for secondary use in developing other data products, for conducting data analysis, or when using data outputs for decision making. The Interagency Council on Statistical Policy (ICSP) has indicated that ‘agencies should work to adopt a common language and framework for reporting on the quality of data sets and derivative information they disseminate.’”
In line with OMB and ICSP guidance, dataset producers and stakeholders are expected to provide information on the various aspects of quality detailed in the Data Quality Framework, as appropriate, so that data consumers can adequately assess the dataset's fitness for use and its accessibility, including how to access it.
References:
Federal Committee on Statistical Methodology (FCSM). 2020. "A Framework for Data Quality." FCSM-20-04. Available at: https://www.fcsm.gov/assets/files/docs/FCSM.20.04_A_Framework_for_Data_Quality.pdf
Interagency Council on Statistical Policy (ICSP). 2018. "Principles for Modernizing Production of Federal Statistics." Available at: https://nces.ed.gov/fcsm/pdf/Principles.pdf
Office of Management and Budget (OMB). 2019. "M-19-15: Improving Implementation of the Information Quality Act." April 2019. Available at: https://www.whitehouse.gov/wp-content/uploads/2019/04/M-19-15.pdf
Target Audience / Stakeholders
User 1. Dataset producers
User 2. Dataset consumers
User 3. Data consumers/producers in the statistical domain*
Intended Uses / Use Cases
Use 1: Dataset producers providing information related to Data Quality Framework domains in compliance with OMB, ICSP, and FCSM guidelines
Use 2: Data consumers assessing fitness for use
Use 3: Search engines reading metadata in response to user searches